implement aho corasick in swift

To implement Aho-Corasick algorithm in Swift, we can follow these steps:

  1. Build a Trie data structure from the patterns/text that need to be searched.

  2. Add a failure function to each node such that it points to the largest possible suffix that is present in the Trie.

  3. Add a success function that points to the last node of the pattern string.

  4. Traverse the input string and follow the failure and success functions of each node in the Trie until we find the match.

Below is the Swift implementation of Aho-Corasick algorithm:

main.swift
class TrieNode {
    var value: Character?
    weak var parent: TrieNode?
    var children: [Character: TrieNode] = [:]
    var isTerminating = false
    var patterns: [String] = []

    var failure: TrieNode?
    var output: [String] = []

    init(value: Character?, parent: TrieNode?) {
        self.value = value
        self.parent = parent
    }

    func add(pattern: String) {
        var currentNode = self
        for character in pattern {
            if let childNode = currentNode.children[character] {
                currentNode = childNode
            } else {
                let newNode = TrieNode(value: character, parent: currentNode)
                currentNode.children[character] = newNode
                currentNode = newNode
            }
        }
        currentNode.isTerminating = true
        currentNode.patterns.append(pattern)
    }
}

class AhoCorasick {
    var rootNode = TrieNode(value: nil, parent: nil)

    func build() {
        var nodeQueue = [rootNode]

        while !nodeQueue.isEmpty {
            let currentNode = nodeQueue.removeFirst()

            for (_, childNode) in currentNode.children {
                var parentNode = currentNode.failure

                while parentNode != nil && parentNode!.children[childNode.value!] == nil {
                    parentNode = parentNode!.failure
                }

                childNode.failure = parentNode!.children[childNode.value!] ?? rootNode
                childNode.output = childNode.failure!.isTerminating
                    ? childNode.failure!.patterns + childNode.failure!.output
                    : childNode.failure!.output

                nodeQueue.append(childNode)
            }
        }
    }

    func search(text: String) -> [String: [Int]] {
        var currentNode = rootNode
        var result: [String: [Int]] = [:]

        for (j, character) in text.enumerated() {
            while currentNode != nil && currentNode!.children[character] == nil {
                currentNode = currentNode!.failure
            }

            if currentNode == nil {
                currentNode = rootNode
                continue
            }

            currentNode = currentNode!.children[character]!
            for pattern in currentNode!.output {
                if result[pattern] == nil {
                    result[pattern] = []
                }
                result[pattern]?.append(j - pattern.count + 1)
            }
        }

        return result
    }
}
2492 chars
84 lines

Here, the TrieNode class represents a node in the Trie, which has additional properties failure and output. The failure property represents the largest possible suffix that is present in the Trie, and the output property represents all the patterns that can be rooted at this node.

The AhoCorasick class represents the Aho-Corasick algorithm implementation, which builds Trie data structure from the patterns, adds failure and output functions to each node, and searches the input text using the Trie.

To use the above implementation, we can create a new instance of AhoCorasick, add patterns using add method, build the Trie using build method, and finally search for the patterns in the input text using search method. For example:

main.swift
let ahoCorasick = AhoCorasick()
ahoCorasick.rootNode.add(pattern: "he")
ahoCorasick.rootNode.add(pattern: "she")
ahoCorasick.rootNode.add(pattern: "hers")
ahoCorasick.rootNode.add(pattern: "his")

ahoCorasick.build()

let text = "ushers"
let result = ahoCorasick.search(text: text)

print(result) // prints ["he": [1], "she": [1], "hers": [2], "his": [2]]
356 chars
13 lines

gistlibby LogSnag