Questions tagged [suffix-array]

A suffix array is a data structure that represents the lexicographically sorted list of all suffixes of a string (in the computer-science, not the linguistics, sense of the word suffix). It is the basis for many high-performance algorithms performed on very large strings, for example full-text search or compression.

A suffix array is a data structure that represents the lexicographically sorted list of all suffixes of a string. It is the basis for many high-performance algorithms performed on very large strings, for example full-text search or compression.

Formal definitions

String: A string is an ordered sequence of symbols, each taken from a pre-defined, finite set. That set is called alphabet, or character set. The symbols are often referred to as characters.

Suffix: Given a string T of length n, a suffix of T is defined as a substring that starts at any position of T and ends at position n (the end of T).

Example: Let T:=abc, then abc,bc and c are suffixes of T, but a and ab are not.

Remark: Any string T of length n has exactly n distinct suffixes (as many as there are characters in it), because any character is the beginning of exactly one suffix.

Suffix array: Given a string T of length n, and a linear ordering on the alphabet, the suffix array of T is the lexicographically sorted list of all suffixes of T.

Example: Let T:=abcabx and assume the 'natural' alphabetic ordering, i.e. a < b < c < d... < x < y < z. Then the suffix array of T is as follows.

abcabx
abx
bcabx
bx
cabx
x

Implementation

The suffix array is usually not explicitly stored in memory. Instead it is represented as a list of integers, each representing the starting position of a suffix.

abcabx 012345

Example: Given T as defined above, and assume a numbering of its positions from 0 to 5, the suffix array is represented as the list [0,3,1,4,2,5].

The `suffix-array` tag

Many of the questions tagged suffix-array are related to one of the topics below.

How to construct suffix arrays efficiently
How to store, and possibly compress, them efficiently
How to make use of them for various purposes, such as full-text search, detection of regularities in strings and text-compression
How they are used in various fields of application, in particular bioinformatics, genetics and natural language processing
What existing and/or ready-to-use implementations of any of the above are known
Worst-case, average-case and empirical comparisons of time and space requirements of existing algorithms and implementation

145 questions

votes

1 answer

Suffix Array Algorithm

After quite a bit of reading, I have figured out what a suffix array and LCP array represents. Suffix array: Represents the _lexicographic rank of each suffix of an array. LCP array : Contains the maximum length prefix match between two consecutive…

asked Jul 20 '13 at 11:21

Spandan

2,058
4
21
35

votes

3 answers

What's the current state-of-the-art suffix array construction algorithm?

I'm looking for a fast suffix-array construction algorithm. I'm more interested in ease of implementation and raw speed than asymptotic complexity (I know that a suffix array can be constructed by means of a suffix tree in O(n) time, but that takes…

suffix-array

asked Oct 22 '11 at 05:43

Cameron

86,330
19
177
216

votes

2 answers

How does LCP help in finding the number of occurrences of a pattern?

I have read that the Longest Common Prefix (LCP) could be used to find the number of occurrences of a pattern in a string. Specifically, you just need to create the suffix array of the text, sort it, and then instead of doing binary search to find…

java algorithm data-structures pattern-matching suffix-array

asked Jul 07 '12 at 08:18

Cratylus

49,824
60
195
327

votes

2 answers

Suffix Arrays vs Suffix Trees

I just wanna know, when a suffix tree is superior to an enhanced suffix array. After reading Replacing sufﬁx trees with enhanced sufﬁx arrays i don't see a reason to use suffix trees anymore. Some methods can get complicated, but you can do…

algorithm time-complexity suffix-tree space-complexity suffix-array

asked Jun 25 '12 at 17:14

Nicolas

votes

7 answers

Finding the longest repeated substring

What would be the best approach (performance-wise) in solving this problem? I was recommended to use suffix trees. Is this the best approach?

algorithm pattern-recognition suffix-tree suffix-array

asked Apr 27 '12 at 17:20

kukit

votes

8 answers

Longest Non-Overlapping Repeated Substring using Suffix Tree/Array (Algorithm Only)

I need to find the longest non-overlapping repeated substring in a String. I have the suffix tree and suffix array of the string available. When overlapping is allowed, the answer is trivial (deepest parent node in suffix tree). For example for…

algorithm substring suffix-tree suffix-array

asked Sep 30 '12 at 03:57

Genuine Programmer

votes

2 answers

How to sort array suffixes in block sorting

I'm reading the block sort algorithm from the Burrows and Wheeler paper. This a step of the algorithm: Suppose S= abracadabra Initialize an array W of N words W[0, ... , N - 1], such that W[i] contains the characters S'[i, ... , i + k - 1] arranged…

algorithm sorting suffix-array burrows-wheeler-transform

asked Jun 14 '11 at 23:45

erandros

1,645
3
17
42

votes

3 answers

Suffix array nlogn creation

I have been learning suffix arrays creation, & i understand that We first sort all suffixes according to first character, then according to first 2 characters, then first 4 characters and so on while the number of characters to be considered is…

java algorithm suffix-array suffix

asked Jul 17 '16 at 07:48

user2565192

votes

1 answer

Errata in the original paper on suffix arrays?

I'm looking at the pseudo-code given in figure 3 of the original paper introducing suffix arrays "SUFFIX ARRAYS: A NEW METHOD FOR ON-LINE STRING SEARCHES". I can't figure out the logic for lines 4 and 5 (indexing from 0). The lines reads: else if…

algorithm pseudocode suffix-array

asked Oct 09 '14 at 16:04

nomad

1,800
1
17
33

votes

4 answers

Effcient way to find longest duplicate string for Python (From Programming Pearls)

From Section 15.2 of Programming Pearls The C codes can be viewed here: http://www.cs.bell-labs.com/cm/cs/pearls/longdup.c When I implement it in Python using suffix-array: example = open("iliad10.txt").read() def comlen(p, q): i = 0 for x…

python c suffix-tree suffix-array programming-pearls

asked Nov 26 '12 at 06:57

Hanfei Sun

39,245
33
107
208

votes

1 answer

Shortest uncommon substring: shortest substring of one string, that is not a substring of another string

We need to find shortest uncommon substring between two strings i.e. if we have two strings a and b so we need to find the length of shortest substring of a that is not a substring of b. How to solve this problem using suffix array ? To be solved…

string algorithm suffix-array

asked Sep 26 '12 at 17:49

Aman Singhal

votes

4 answers

strcmp for python or how to sort substrings efficiently (without copy) when building a suffix array

Here's a very simple way to build an suffix array from a string in python: def sort_offsets(a, b): return cmp(content[a:], content[b:]) content = "foobar baz foo" suffix_array.sort(cmp=sort_offsets) print suffix_array [6, 10, 4, 8, 3, 7, 11, 0,…

python string sorting suffix-array

asked Feb 17 '10 at 16:47

ephes

1,411
1
12
17

votes

3 answers

Constructing a Good Suffix Table - Understanding an example

I'm really trying to understand an example on how to construct a good suffix table for a given pattern. The problem is, I'm unable to wrap my head around it. I've looked at numerous examples, but do not know where the numbers come from. So here…

suffix-array boyer-moore

asked Dec 11 '14 at 17:13

northerner

votes

9 answers

Efficient String/Pattern Matching in C++ (suffixarray, trie, suffixtree?)

I'm looking for an efficient data structure to do String/Pattern Matching on an really huge set of strings. I've found out about tries, suffix-trees and suffix-arrays. But I couldn't find an ready-to-use implementation in C/C++ so far (and…

c++ pattern-matching trie suffix-tree suffix-array

asked Nov 13 '12 at 16:41

Constantin

7,926
12
71
112

votes

1 answer

Using suffix array algorithm for Burrows Wheeler transform

I've sucessfully implemented a BWT stage (using regular string sorting) for a compression testbed I'm writing. I can apply the BWT and then inverse BWT transform and the output matches the input. Now I wanted to speed up creation of the BW index…

c++ algorithm suffix-array burrows-wheeler-transform

asked Jan 06 '16 at 18:10

Bim

2 3

…

9 10 Next

Questions tagged [suffix-array]

Formal definitions

Implementation

The suffix-array tag

The `suffix-array` tag