. : match arbitrary 1 character
*: match any length of strings.
I spent some time in this algorithm, and extended the KMP to support them without increase the time complexity: it's still O(n)
vector<int> next_ext(const string &str)
{
int leng = str.size();
if (leng < 0 || str.empty())
return vector<int>();
vector<int> next(leng + 1);
int i = -1, j, off; //treated like str[-1]
is a '*'
off = i, j = off, i ++, next[i] = off;
while (i < leng)
{
if (str[i] == '*' ) //consistent with the above comment.
off = i, j = off, i ++, next[i] = off;
else if (j == off || str[i] == str[j] || str[j] == '.')
++i, ++j, next[i] = j;
else
j = next[j];
}
return next;
}
//if matched, return the last position of the first matched
substring of the string, else return -1.
int kmp_ext(string str, string pattern)
{
int str_len = str.size(), pattern_len = pattern.size();
vector<int> next = next_ext(pattern);
int i = 0, j = -1, off; // treated like
pattern[-1] is a '*'
off = j, j ++;
while(i < str_len && j < pattern_len)
{
if (j != off && pattern[j] == '*' )
off = j, j ++; //consistent with the
above comment
else if( j == off || str[i] == pattern[j] || pattern[j] == '.')
j++, i++;
else
j = next[j];
}
if(j == pattern_len)
return i;
else
return -1;
}
int main()
{
string s = "abdcxdefabc";
cout<<kmp_ext(s, "ab..cx")<<endl;
cout<<kmp_ext(s, "ab*x.*fa*c")<<endl;
cout<<kmp_ext(s, "cxd*")<<endl;
cout<<kmp_ext(s, "*b*x.*fa*c")<<endl;
cout<<kmp_ext(".ab", ".a.")<<endl;
cout<<kmp_ext("a*b", "a*b")<<endl;
}
1. Add one more condition in 'if' statement indicating pattern[j] == '.' is considered as a vaild match for any character.
2. Add one more 'if' statement to generalize the standard KMP to handle '*'.
3. Introduced an additional variable 'off', whcih is initialized with -1, but changes when a '*' is met. In standard KMP, it is set to be a constant: -1. Here I generalized it to a variable.
These 3 differences occur both in next() and kmp() function, symmetrically.
Except for these 3 differences (in each function), all other part is standard KMP.
How to extend it to solve a matching problem?
KMP is a "find" method. Generally speaking, "find" is usually harder than "match".
So, with slight modification, this method can be used to solve the "match" too-- just also return the beginning position of the first matched substring of the string. If the found substring turns out to cover the whole string, it is a match.
I think your program is incorrect. When the input is:
ReplyDeletes=abaab
p=*?a?
The expected answer is 4 but your ouput is -1.
You have to replace "?" to ".", bitch. :)
ReplyDeleteThis comment has been removed by the author.
ReplyDeletetry this case:
ReplyDelete`cout<<kmp_ext(".ab", "baabbaa")<<endl;`
the result is `-1` on my computer, but apparently it's not correct