Pagini recente » Diferente pentru trie-forest-based-sort intre reviziile 18 si 19 | Interact | Diferente pentru trie-forest-based-sort intre reviziile 22 si 6 | Diferente pentru blog/deep-learning-what-was-wrong-in-1986 intre reviziile 47 si 45 | Diferente pentru blog/rolling-hash intre reviziile 22 si 24

Diferente pentru blog/rolling-hash intre reviziile #22 si #24

Diferente intre #21 si #22

Nu exista diferente intre titluri.

Diferente intre continut:

bq. Given a string P of length n and a string S of length m find out all the occurrences of P within S

The algorithm works by building a fingerprint for each subsequence of S of length n. It checks if any of them match the fingerprint of P. Implementing this directly leads to a $O(n*m)$ solution which isn't faster than the naive matching algorithm (in fact it may be slower).

The algorithm works by building a fingerprint for each substring of S of length n. It checks if any of them match the fingerprint of P. Implementing this directly leads to a $O(n*m)$ solution which isn't faster than the naive matching algorithm (in fact it may be slower).

*The insight* is that you can efficiently compute the hash value of a subsequence using the hash value of previous subsequence if you use the right hash function. A polynomial with the string characters as coefficients works well. Let <tex>H(c)=c[0]a^{n-1} + c[1]a^{n-2}+...+c[n-1]</tex> be the hash function for the string c.

*The insight* is that you can efficiently compute the hash value of a substring using the hash value of previous substring if you use the right hash function. A polynomial with the string characters as coefficients works well. Let <tex>H(c)=c[0]a^{n-1} + c[1]a^{n-2}+...+c[n-1]</tex> be the hash function for the string c.

All the operations are done modulo a prime number so that we don’t have to deal with large integers.
Let’s see what happens when we go from $H(S[i..i+n-1])$ to $H(S[i+1..i+n])$:

Let’s choose <tex>H(S[i..j]) = S[i]+S[i+1]a+S[i+2]a{2}+...+S[j]a^{j-i} </tex> Then we compute the array h which contains cummulative hashes: <tex>h[k]=S[0]+S[1]a+S[2]a^{2}+...+S[k]a^{k}</tex>. It’s easy to see that <tex>H(S[i..j])=(h[j]-h[i-1])/a^i</tex>

After spending O(n) time on preprocessing h, we have an efficient way to compute the hash code for any subsequence of S.

After spending O(n) time on preprocessing h, we have an efficient way to compute the hash code for any substring of S.

The division operation modulo a prime needs some number theory insight. You can do it using Fermat's little theorem or using the Extended Euclidean Algorithm. If you need to process a lot of sub strings it's useful to pre process all the values for <tex>a^{-i} \bmod{p}</tex>.

Diferente intre securitate:

public

protected

infoarena informatica de performanta

Diferente pentru blog/rolling-hash intre reviziile #22 si #24

Nu exista diferente intre titluri.

Diferente intre continut:

Diferente intre securitate:

Topicul de forum nu a fost schimbat.