This article is rated Stub-class on Wikipedia's
content assessment scale. It is of interest to the following WikiProjects: | |||||||||||||||||||||||||||||||
|
I'm not convinced that this article addresses the "what" question. The last sentence of the section "FM-index data structure" has:
This really needs expansion. It seems strang to talk about the LF mapping in so much detail and then not describe how the index is structured. gringer ( talk) 06:14, 16 June 2013 (UTC)
The article is confusing. For example it says: "For any row i of the matrix, the character in the last column L[i] precedes the character in the first column F[i] also in T." What does it mean? It is clearly not true: If we let i be the first row of the matrix, then L[i] is the last letter in the string T, so it must come after everything else in T. Also if i represents the first row, F[i] must be the letter in T which comes first in alphabetic ordering ($ in the example). So no other letter can come before it.
Perhaps after "the matrix" we should insert "M". Perhaps before "T" we should insert "the original string". 128.16.7.220 ( talk) 15:47, 7 June 2014 (UTC) Bill
T = "abracadabra$"
, that L[i] is 'a' ans F[i] is '$'.
ExplodingCabbage (
talk)
13:06, 16 March 2022 (UTC)Surely suffix "bra" ends with a. It starts with b. 128.16.7.220 ( talk) 16:05, 7 June 2014 (UTC) Bill
Could someone else check the example in the "Count" section. In most places in the article the indexes start at one. Assuming that, the start and end values given in the example appear to be out by one in most cases. 128.16.7.220 ( talk) 16:58, 7 June 2014 (UTC) Bill
Existing applications of FM-index use it for looking up strings, rather than counting how many times the string occurs. Hence it is more important that the "Locate" section makes sense than the "Count" section. Therefore would it be possible to add more explaination and/or examples to the "Locate" section.
The existing text says "For instance locate(7) = 8" appears to be wrong. "locate(7)" appears to mean the 7th character in L, which is "c". "c" occurs only once in "abracadabra" and that is position 5 not position 7.
128.16.7.220 ( talk) 17:29, 7 June 2014 (UTC) Bill
"locate(7)" appears to mean the 7th character in L, which is "c"
The "Locate" section uses the formulae O(p + occ logε u) and without ever defining ε.
ExplodingCabbage ( talk) 14:28, 16 March 2022 (UTC)
The "locate" section uses the formula without ever defining , which as far as I can spot isn't defined anywhere else in the article either.
ExplodingCabbage ( talk) 14:28, 16 March 2022 (UTC)
The function Occ(c, k) is a necessary part of both the "count" and "locate" algorithms yet is skipped over with no explanation. Indeed, the article makes it sound rather magical - apparently "it is possible to compute Occ(c, k) in constant time" despite the fact that the obvious dumb approach to implementing Occ (iterate over the first k characters and count the ones lexically smaller than c) of course takes O(k) time. Nowhere are we told what algorithm is used to achieve this magic, nor what data structures within the FM-index it uses.
This article is rated Stub-class on Wikipedia's
content assessment scale. It is of interest to the following WikiProjects: | |||||||||||||||||||||||||||||||
|
I'm not convinced that this article addresses the "what" question. The last sentence of the section "FM-index data structure" has:
This really needs expansion. It seems strang to talk about the LF mapping in so much detail and then not describe how the index is structured. gringer ( talk) 06:14, 16 June 2013 (UTC)
The article is confusing. For example it says: "For any row i of the matrix, the character in the last column L[i] precedes the character in the first column F[i] also in T." What does it mean? It is clearly not true: If we let i be the first row of the matrix, then L[i] is the last letter in the string T, so it must come after everything else in T. Also if i represents the first row, F[i] must be the letter in T which comes first in alphabetic ordering ($ in the example). So no other letter can come before it.
Perhaps after "the matrix" we should insert "M". Perhaps before "T" we should insert "the original string". 128.16.7.220 ( talk) 15:47, 7 June 2014 (UTC) Bill
T = "abracadabra$"
, that L[i] is 'a' ans F[i] is '$'.
ExplodingCabbage (
talk)
13:06, 16 March 2022 (UTC)Surely suffix "bra" ends with a. It starts with b. 128.16.7.220 ( talk) 16:05, 7 June 2014 (UTC) Bill
Could someone else check the example in the "Count" section. In most places in the article the indexes start at one. Assuming that, the start and end values given in the example appear to be out by one in most cases. 128.16.7.220 ( talk) 16:58, 7 June 2014 (UTC) Bill
Existing applications of FM-index use it for looking up strings, rather than counting how many times the string occurs. Hence it is more important that the "Locate" section makes sense than the "Count" section. Therefore would it be possible to add more explaination and/or examples to the "Locate" section.
The existing text says "For instance locate(7) = 8" appears to be wrong. "locate(7)" appears to mean the 7th character in L, which is "c". "c" occurs only once in "abracadabra" and that is position 5 not position 7.
128.16.7.220 ( talk) 17:29, 7 June 2014 (UTC) Bill
"locate(7)" appears to mean the 7th character in L, which is "c"
The "Locate" section uses the formulae O(p + occ logε u) and without ever defining ε.
ExplodingCabbage ( talk) 14:28, 16 March 2022 (UTC)
The "locate" section uses the formula without ever defining , which as far as I can spot isn't defined anywhere else in the article either.
ExplodingCabbage ( talk) 14:28, 16 March 2022 (UTC)
The function Occ(c, k) is a necessary part of both the "count" and "locate" algorithms yet is skipped over with no explanation. Indeed, the article makes it sound rather magical - apparently "it is possible to compute Occ(c, k) in constant time" despite the fact that the obvious dumb approach to implementing Occ (iterate over the first k characters and count the ones lexically smaller than c) of course takes O(k) time. Nowhere are we told what algorithm is used to achieve this magic, nor what data structures within the FM-index it uses.