Pagini recente » s2c | Atasamentele paginii Bitmap | Atasamentele paginii Basequery | infoarena - Blog | Diferente pentru blog/rolling-hash intre reviziile 4 si 5

Diferente pentru blog/rolling-hash intre reviziile #4 si #5

Diferente intre #5 si #6

Diferente intre #3 si #4

Nu exista diferente intre titluri.

Diferente intre continut:

h2. The Rabin-Karp algorithm

This algorithm is a textbook example of the rolling hash trick. Rabin-Karp solves the following problem:

bq. Given a string P of length n and a string S of length m find out all the occurrences of P within S

The algorithm works by building a fingerprint for each subsequence of S of length n. It checks if any of them match the fingerprint of P. Implementing this directly leads to a O(n*m) solution which isn't faster than the naive matching algorithm (in fact it might be even slower).
 
*The insight* is that you can efficiently compute the hash value of a subsequence using the hash value of previous subsequence. One can use the first character of the previous sequence, the new character of the current sequence and the previous fingerprint without needing to go through the other characters.

The algorithm works by building a fingerprint for each subsequence of S of length n. It checks if any of them match the fingerprint of P. Implementing this directly leads to a $O(n*m)$ solution which isn't faster than the naive matching algorithm (in fact it might be even slower).

A polynomial with the string characters as coefficients works well.
Lets chose <tex>H(c)=c[0]a^{n-1} + c[1]a^{n-2}+...+c[n-1]</tex> as the hash function for the string c.

*The insight* is that you can efficiently compute the hash value of a subsequence using the hash value of previous subsequence if you use the right hash function. A polynomial with the string characters as coefficients works well. Let <tex>H(c)=c[0]a^{n-1} + c[1]a^{n-2}+...+c[n-1]</tex> be the hash function for the string c.

All the operations are done modulo a prime number so that we don’t have to deal with large integers.
Let’s see what happens when we go from H(S[i..i+n-1]) to H(S[i+1..i+n]):

infoarena informatica de performanta

Diferente pentru blog/rolling-hash intre reviziile #4 si #5

Nu exista diferente intre titluri.

Diferente intre continut:

Nu exista diferente intre securitate.

Topicul de forum nu a fost schimbat.