In-Place Bijective Burrows-Wheeler Transforms

One of the most well-known variants of the Burrows-Wheeler transform (BWT) [Burrows and Wheeler, 1994] is the bijective BWT (BBWT) [Gil and Scott, arXiv 2012], which applies the extended BWT (EBWT) [Mantaci et al., TCS 2007] to the multiset of Lyndon factors of a given text. Since the EBWT is invertible, the BBWT is a bijective transform in the sense that the inverse image of the EBWT restores this multiset of Lyndon factors such that the original text can be obtained by sorting these factors in non-increasing order. In this paper, we present algorithms constructing or inverting the BBWT in-place using quadratic time. We also present conversions from the BBWT to the BWT, or vice versa, either (a) in-place using quadratic time, or (b) in the run-length compressed setting using O(n lg r / lg lg r) time with O(r lg n) bits of space, where r is the sum of character runs in the BWT and the BBWT.


Introduction
The Burrows-Wheeler transform (BWT) [6] is one of the most favored options both for (a) compressing and (b) indexing data sets. On the one hand, compression programs like bzip2 apply the BWT to achieve high compression rates. For that, they leverage the effect that the BWT built on repetitive data tends to have long character runs, which can be compressed by run-length compression, i.e., representing a substring of ℓ a's by the tuple (a, ℓ). On the other hand, self-indexing data structures like the FM-index [11] enhance the BWT to a full-text self-index. A combined approach of both compression and indexing is the run-length compressed FM-index [21], representing a BWT with r_BWT character runs, i.e., maximal repetitions of a character, run-length compressed in O(r_BWT lg n) bits. This representation can be computed directly in run-length compressed space thanks to Policriti and Prezza [30]. The BWT and its run-length compressed representation have been intensively studied during the past decades (e.g., [12,1,14] and the references therein). Contrary to that, a variant, called the bijective BWT (BBWT) [16], is far from being well-studied despite its mathematically appealing characteristics. As a matter of fact, we are only aware of one index data structure based on the BBWT [3] and of two non-trivial construction algorithms [5,2] of the (uncompressed) BBWT, both with the need of additional data structures.
In this article, we shed more light on the connection between the BWT and the BBWT by quadratic time in-place conversion algorithms in Sect. 5 constructing the BWT from the BBWT, or vice versa. We can also perform these conversions in the run-length compressed setting in O(n lg r / lg lg r) time with space linear in the number of character runs (cf. Sect. 4 and Thm. 3), where r is the sum of character runs in the BWT and the BBWT.

Related Work
Given a text T of length n, the BWT of T is the string obtained by assigning BWT[i] to the character preceding the i-th lexicographically smallest suffix of T (or the last character of T if this suffix is the text itself). By this definition, we can construct the BWT with any suffix array [22] construction algorithm. However, storing the suffix array inherently needs n lg n bits of space. Crochemore et al. [9] tackled this space problem with an in-place algorithm constructing the BWT in O(n^2) time online on the reversed text by simulating queries on a dynamic wavelet tree [17] that would be built on the (growing) BWT. They also gave an algorithm for restoring the text in-place in O(n^{2+ε}) time.
In the run-length compressed setting, Policriti and Prezza [30] can compute the run-length compressed BWT having r_BWT character runs in O(n lg r_BWT) time while using O(r_BWT lg n) bits of space. They additionally presented an adaptation of the wavelet tree on run-length compressed texts, yielding a representation using O(r_BWT lg n) bits of space with O(lg r_BWT) query and update time. Finally, practical improvements of the run-length compressed BWT construction were considered by Ohno et al. [29].
The BBWT is the string obtained by assigning BBWT[i] to the last character of the i-th smallest string in the list of all conjugates of the factors of the Lyndon factorization sorted with respect to the ≺_ω order [23, Def. 4]. Bannai et al. [2] recently revealed a connection between the bijective BWT and suffix sorting by presenting an O(n) time BBWT construction algorithm based on SAIS [28]. With dynamic data structures like a dynamic wavelet tree [27], Bonomo et al. [5] could devise an algorithm computing the BBWT in O(n lg n / lg lg n) time. With nearly the same techniques, Mantaci et al. [24] presented an algorithm computing the BWT (and simultaneously the suffix array if needed) from the Lyndon factorization. All these construction algorithms, however, need data structures taking O(n lg n) bits of space. Nevertheless, the latter two (i.e., [5] and [24]) can work in-place by simulating the LF mapping (cf. Sects. 3.4 and 3.5), which we focus on in Sect. 5.1.
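The definition of the BWT given above can be checked on a small input. The following Python sketch sorts all suffixes naively, so it uses far more space than any of the algorithms discussed here and is meant for illustration only. The function name bwt_by_suffix_sorting is our own, and we assume that the input ends with a delimiter $ that is smaller than every other character.

```python
def bwt_by_suffix_sorting(text: str) -> str:
    """Return the BWT of `text` by its definition: BWT[i] is the character
    preceding the i-th lexicographically smallest suffix, or the last
    character of the text if that suffix is the text itself."""
    n = len(text)
    # Naive suffix array: sort the starting positions by their suffixes.
    suffix_array = sorted(range(n), key=lambda i: text[i:])
    # For i == 0, text[i - 1] is text[-1], i.e., the last character.
    return "".join(text[i - 1] for i in suffix_array)
```

Calling bwt_by_suffix_sorting("bacabbabb$") processes the running example used later in the paper with the delimiter appended.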

Preliminaries
Our computational model is the word RAM model with word size Ω(lg n). Accessing a word costs O(1) time. An algorithm is called in-place if it uses, besides a rewriteable input, only O(lg n) bits of working space. We write [b(I)..e(I)] = I for an interval I of natural numbers.

Strings
Let Σ denote an integer alphabet of size σ with σ = n^{O(1)}. We call an element T ∈ Σ* a string. When T is represented by the concatenation of X, Y, Z ∈ Σ*, i.e., T = XYZ, then X, Y and Z are called a prefix, substring and suffix of T, respectively; the prefix X, substring Y, and suffix Z are called proper if they differ from T. For two integers i and j, let T[i..j] denote the substring of T that begins at position i and ends at position j in T. If i > j, then T[i..j] is the empty string. In particular, the suffix starting at position j of T is called the j-th suffix of T, and denoted with T[j..]. An occurrence of a substring S in T is treated as a sub-interval of [1..|T|] such that S = T[b(S)..e(S)]. The longest common prefix (LCP) of two strings S and T is the longest string that is a prefix of both S and T.

Orders on Strings.
We denote the lexicographic order with ≺_lex. Given two strings S and T, S ≺_lex T if S is a proper prefix of T or there exists an integer ℓ with 1 ≤ ℓ ≤ min(|S|, |T|) such that S[1..ℓ−1] = T[1..ℓ−1] and S[ℓ] < T[ℓ]. Next we define the ≺_ω order of strings, which is based on the lexicographic order of infinite strings: We write S ≺_ω T if the infinite concatenation S^ω := SSS··· is lexicographically smaller than T^ω := TTT···.

Rank and Select Queries.
Given a string T ∈ Σ*, a character c ∈ Σ, and an integer j, the rank query T.rank_c(j) counts the occurrences of c in T[1..j], and the select query T.select_c(j) gives the position of the j-th c in T. We stipulate that rank_c(0) = select_c(0) = 0. A wavelet tree is a data structure supporting rank and select queries.
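To make the two queries concrete, here is a minimal Python sketch that answers them by scanning the string. It takes O(n) time per query and is not a wavelet tree; the function names rank and select are our own, and positions are 1-based as in the text.

```python
def rank(text: str, c: str, j: int) -> int:
    """Number of occurrences of c in text[1..j] (1-based, inclusive)."""
    return text[:j].count(c)


def select(text: str, c: str, j: int) -> int:
    """1-based position of the j-th occurrence of c in text.
    By convention, select(text, c, 0) is 0."""
    if j == 0:
        return 0
    seen = 0
    for pos, ch in enumerate(text, start=1):
        if ch == c:
            seen += 1
            if seen == j:
                return pos
    raise ValueError("text contains fewer than j occurrences of c")
```

The two queries are inverse to each other in the sense that rank(text, c, select(text, c, j)) returns j whenever the j-th occurrence of c exists.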

Lyndon Words
Given a string T = T[1..n], its i-th conjugate conj_i(T) is defined as T[i+1..n]T[1..i] for an integer i ∈ [0..n−1]. We say that T and all of its conjugates belong to the conjugate class conj(T) := {conj_0(T), ..., conj_{n−1}(T)}. If a conjugate class contains exactly one conjugate that is lexicographically smaller than all other conjugates, then this conjugate is called a Lyndon word [20].

Figure 1 All three BWT variants studied in this paper applied on our running example T = bacabbabb. Left: BBWT built on the last characters of the conjugates of all Lyndon words sorted in the ≺_ω order. Middle and Right: BWT• and BWT built on the lexicographically sorted conjugates of T and of T$, respectively. To ease understanding, each character is marked with its position in T in subscript. Reading these positions in F of BBWT and in F of BWT gives a circular suffix array (there are multiple possibilities with T_3 = T_4 = abb) and the suffix array (the position of $ is uniquely defined as |T$| = 10).

Proof. The algorithm of Duval uses three variables i, j, and k (cf. Algo. 1 in the appendix) pointing to text positions. k is the ending position of the previously computed Lyndon factor (or zero at the beginning). On each step, j ∈ [k+2..n] is incremented by one, while i is either incremented by one or reset to k + 1, as long as the scanned substring starting at position k + 1 is either a Lyndon factor or a repetition of Lyndon factors, each of length j − i. In total, we visit at most 2n characters by incrementing the text positions i, j, and k.
For what follows, we fix a string T[1..n] over an alphabet Σ with size σ. We use the string T := bacabbabb as our running example. Its Lyndon factors are T_1 = b, T_2 = ac, T_3 = abb, and T_4 = abb.
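The Lyndon factors of the running example can be reproduced with the following Python sketch of Duval's algorithm. It is the common textbook formulation with 0-based indices, so its variables i, j, and k do not correspond one-to-one to the variables of the proof above, and the function name lyndon_factorization is our own.

```python
def lyndon_factorization(s: str) -> list[str]:
    """Factorize s into lexicographically non-increasing Lyndon words
    using Duval's algorithm (linear time, constant extra words apart
    from the output list)."""
    n = len(s)
    factors = []
    i = 0  # start of the part of s that is not yet factorized
    while i < n:
        j, k = i + 1, i
        # Extend the scan while s[i..j] stays a (repeated) Lyndon prefix.
        while j < n and s[k] <= s[j]:
            if s[k] < s[j]:
                k = i      # s[i..j] is a single Lyndon word again
            else:
                k += 1     # s[i..j] continues a repetition of period j - k
            j += 1
        # Emit all full repetitions of the detected Lyndon word.
        period = j - k
        while i <= k:
            factors.append(s[i:i + period])
            i += period
    return factors
```

For example, lyndon_factorization("bacabbabb") yields the four factors b, ac, abb, and abb listed above.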

Burrows-Wheeler Transforms
We denote the bijective BWT of T by BBWT, where BBWT[i] is the last character of the i-th string in the list storing the conjugates of all Lyndon factors T_1, ..., T_t of T sorted with respect to the ≺_ω order. A property of BBWT used in this paper as a starting point for an inversion algorithm is the following:

Lemma 2 ([5, Lemma 15]). BBWT[1] = T [n].
Proof. There is no conjugate of a Lyndon factor that is smaller than the smallest Lyndon factor T_t. Therefore, T_t is the smallest string among all conjugates of all Lyndon factors. Hence, BBWT[1] is the last character of T_t, which is T[n].

The BWT of T, called in the following BWT, is the BBWT of $T for a delimiter $ ∈ Σ smaller than all other characters in T (cf. [15, Lemma 12] since $T is a Lyndon word). Originally, the BWT is defined by reading the last characters of all cyclic rotations of T (without $) sorted lexicographically [6]. Here, we call the resulting string BWT•. BWT• is equivalent to BWT if T contains the aforementioned unique delimiter $. We further write BWT_P (and analogously BBWT_P or BWT•_P) to denote the BWT of P for a string P. Since BWT (and analogously BBWT or BWT•) is a permutation of T, it is natural to identify each entry of BWT with a text position: By construction, BWT[i] = T[j], where T[j+1..] is the i-th lexicographically smallest suffix, i.e., SA[i] = j + 1, where SA is the suffix array of T. A similar relation is given between BBWT and the circular suffix array [19,2], which is uniquely defined up to positions of equal Lyndon factors. Figure 1 gives an example for all three variants. In what follows, we review means to simulate a linear traversal of the text in forward or backward manner by BWT, and then translate this result to BBWT.
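The definition of BBWT and Lemma 2 can be checked with a direct, deliberately naive Python sketch: it factorizes the text, collects all conjugates of all Lyndon factors, and sorts them in the ≺_ω order. The ≺_ω comparison is simulated by comparing prefixes of length |S| + |T| of the infinite powers, which suffices to distinguish two different infinite powers. All function names are our own, and nothing here is in-place.

```python
from functools import cmp_to_key


def lyndon_factorization(s: str) -> list[str]:
    """Duval's algorithm (textbook formulation, 0-based)."""
    n, i, factors = len(s), 0, []
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def omega_compare(s: str, t: str) -> int:
    """Compare s^omega with t^omega via their prefixes of length |s| + |t|."""
    length = len(s) + len(t)
    s_power = (s * (length // len(s) + 1))[:length]
    t_power = (t * (length // len(t) + 1))[:length]
    return (s_power > t_power) - (s_power < t_power)


def bbwt_by_definition(text: str) -> str:
    """Last characters of all conjugates of all Lyndon factors of text,
    sorted with respect to the omega order."""
    conjugates = [
        f[i:] + f[:i]
        for f in lyndon_factorization(text)
        for i in range(len(f))
    ]
    conjugates.sort(key=cmp_to_key(omega_compare))
    return "".join(c[-1] for c in conjugates)
```

Since $T is a Lyndon word, calling bbwt_by_definition on $T sorts the conjugates of a single factor, which illustrates the statement that the BWT of T is the BBWT of $T.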

Backward and Forward Steps
Having the location of T[i] in BWT, we can compute T[i+1] (i.e., T[1] for i = n) and T[i−1] (i.e., T[n] for i = 1) by rank and select queries. To move from T[i] to T[i+1], which we call a forward step, we can use the FL mapping:

FL[i] := BWT.select_{F[i]}(F.rank_{F[i]}(i)), (1)

where F[i] is the i-th lexicographically smallest character in BWT. To move from T[i] to T[i−1], we can use the backward step of the FM-index [11], which is also called LF mapping, and is defined as follows:

LF[i] := F.select_{BWT[i]}(BWT.rank_{BWT[i]}(i)) = C[BWT[i]] + BWT.rank_{BWT[i]}(i), (2)

where C[c] is the number of occurrences of those characters in BWT that are smaller than c (for each character c ∈ [1..σ]). We observe from the second equation of (2) that there is no need for F when having C. This is important, as we can compute C[c] in O(n) time only having BWT available. Hence, we can compute LF[i] in O(n) time in-place. However, the same trick does not work with FL. If we allow more space, it is still advantageous to favor storing C instead of F if σ = o(n) because storing F and C in their plain forms takes n lg σ bits and σ lg n bits, respectively. We can also compute FL[i] without F by endowing C with a predecessor data structure (which we do in Sect. 4.3).
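The backward and forward steps can be sketched in Python as follows. The function lf recomputes C[c] by one scan over BWT and therefore mirrors the O(n) time evaluation without F described above. The function fl, in contrast, materializes F by sorting, which is a simplification on our side and not in-place. All names are our own and positions are 1-based.

```python
def lf(bwt: str, i: int) -> int:
    """Backward step: LF[i] = C[BWT[i]] + BWT.rank_{BWT[i]}(i)."""
    c = bwt[i - 1]
    smaller = sum(1 for x in bwt if x < c)  # C[c], recomputed by scanning
    return smaller + bwt[:i].count(c)


def fl(bwt: str, i: int) -> int:
    """Forward step: FL[i] = BWT.select_{F[i]}(F.rank_{F[i]}(i))."""
    f = sorted(bwt)          # F, stored explicitly for simplicity
    c = f[i - 1]
    k = f[:i].count(c)       # F.rank_c(i)
    seen = 0
    for pos, ch in enumerate(bwt, start=1):
        if ch == c:
            seen += 1
            if seen == k:
                return pos   # BWT.select_c(k)
    raise ValueError("inconsistent BWT")


def invert_with_backward_steps(bwt: str) -> str:
    """Recover T from the BWT of T$ by n backward steps starting at the
    first entry, which stores T[n] because the smallest suffix is $."""
    i, out = 1, []
    for _ in range(len(bwt) - 1):
        out.append(bwt[i - 1])
        i = lf(bwt, i)
    return "".join(reversed(out))
```

Applied to the BWT of the running example with delimiter, invert_with_backward_steps reads T from its last character to its first, and fl undoes every step taken by lf.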
Finally, we also need LF and FL on BBWT for our conversion algorithms.We can define LF and FL similarly for BBWT with the following peculiarity:

Steps in the Bijective BWT
The major difference to the BWT is that the LF mapping of the BBWT can contain multiple cycles, meaning that LF (or FL) recursively applied to a BBWT position would result in a circular search (more precisely, the search stays within the same Lyndon factor). This is because BBWT is the extended BWT [23, Thm. 20 and Remark 12] applied to the multiset of Lyndon factors {T_1, ..., T_t}. This fact was exploited for circular pattern matching [19], but is not of interest here. What matters here is that a single backward step can move from the beginning of a Lyndon factor T_x to its end (represented by T_x[1] in BBWT). We use this property with Lemma 2 in the following sections to read the Lyndon factors from T individually in the order T_t, ..., T_1.

Run-Length Compressed Conversions
We now consider BWT and BBWT represented as run-length compressed strings taking O(r_BWT lg n) and O(r_BBWT lg n) bits of space, where r_BWT and r_BBWT are the number of character runs in BWT and BBWT, respectively. For r := max(r_BWT, r_BBWT), the goal of this section is the following:

Theorem 3. We can convert RLBBWT to RLBWT in O(n lg r / lg lg r) time using O(r lg n) bits as working space, or vice versa.

To prove this theorem, we need a data structure that works in the run-length compressed space while supporting rank and select queries as well as updates more efficiently than the O(n) time in-place approach described in Sects. 3.4 and 3.5:

Run-length Compressed Wavelet Trees
Given a run-length compressed string S of uncompressed length n with r character runs, there is an O(r lg n) bits representation of S that supports access, rank, select, insertions, and deletions in O(lg r) time [30, Lemma 1]. It consists of (1) a dynamic wavelet tree maintaining the starting characters of each character run and (2) a dynamic Fenwick tree maintaining the lengths of the runs. It can be accelerated to O(lg r / lg lg r) time by using the following representations:
1. The dynamic wavelet tree of Navarro and Nekrich [27] on a text of length r uses O(r lg r) bits, and supports both updates and queries in O(lg r / lg lg r) time.
2. The dynamic Fenwick tree of Bille et al. [4, Thm. 2] on r (lg n)-bit numbers uses O(r lg n) bits, and supports both updates and queries in constant time if updates are restricted to be in-/decremental.

The obtained time complexity of this data structure directly improves the construction of RLBWT:

Corollary 4 ([30, Thm. 2]). We can construct the RLBWT in O(r_BWT lg n) bits of space online on the reversed text in O(n lg r_BWT / lg lg r_BWT) time.

In the run-length compressed wavelet tree representation, RLBWT and RLBBWT support an update operation and a backward step in O(lg r / lg lg r) time with r := max(r_BWT, r_BBWT). This helps us to devise the following two conversions:
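As a much simplified illustration of working on the runs instead of the characters, the following Python sketch stores a string as a list of (character, run length) pairs and answers access and rank by scanning the runs. It is static and takes O(r) time per query, so it is neither the dynamic wavelet tree nor the Fenwick tree referred to above; all function names are our own.

```python
from itertools import groupby


def run_length_encode(text: str) -> list[tuple[str, int]]:
    """Represent each maximal character run as a pair (character, length)."""
    return [(c, len(list(group))) for c, group in groupby(text)]


def rl_access(runs: list[tuple[str, int]], j: int) -> str:
    """Character at 1-based position j of the encoded string."""
    for c, length in runs:
        if j <= length:
            return c
        j -= length
    raise IndexError("position exceeds the length of the string")


def rl_rank(runs: list[tuple[str, int]], c: str, j: int) -> int:
    """Occurrences of c among the first j characters of the encoded string."""
    count, remaining = 0, j
    for ch, length in runs:
        if remaining <= 0:
            break
        taken = min(length, remaining)  # part of this run inside the prefix
        if ch == c:
            count += taken
        remaining -= taken
    return count
```

The number of pairs returned by run_length_encode is the number r of character runs that the space bounds of this section refer to.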

From RLBBWT to RLBWT
We aim for directly outputting the characters of T in reversed order since we can then use the algorithm of Cor. 4 building RLBWT online on the reversed text. We start with the first entry of BBWT (corresponding to the last Lyndon factor T_t, i.e., storing T[n] according to Lemma 2) and do a backward step until we come back to this first entry (i.e., we have visited all characters of T_t). During that search, we copy the read characters to RLBWT and mark in an array R of length r_BBWT at entry i how often we visited the i-th character run of RLBBWT. Finally, we remove the read cycle of RLBBWT by decreasing the run lengths of RLBBWT by the numbers stored in R. By doing so, we remove the last Lyndon factor T_t from RLBBWT and consequently know that the currently first entry of BBWT must correspond to T_{t−1}. This means that we can apply the algorithm recursively on the remaining RLBBWT to extract and delete the Lyndon factors in reversed order while building RLBWT in the meantime. By removing T_t, BBWT is still a valid BBWT since BBWT becomes the BBWT of T' := T_1 ··· T_{t−1} whose Lyndon factors are the same as of T (but without T_t). Note that it is also possible to build RLBWT in forward order, i.e., computing RLBWT_{T_1···T_x} for increasing x by applying the algorithm of Mantaci et al. [24, Fig. 1] while omitting the suffix array construction.
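The idea of reading the Lyndon factors in the order T_t, ..., T_1 by following cycles of backward steps can be sketched on an uncompressed BBWT as follows. Instead of deleting a read cycle and decreasing run lengths via the array R, this Python sketch marks visited entries in a Boolean array and tabulates LF explicitly, so it needs O(n) words of extra space; this is an assumption made for brevity, and the function name invert_bbwt is our own.

```python
def invert_bbwt(bbwt: str) -> str:
    """Restore the text from its BBWT by traversing the cycles of LF."""
    n = len(bbwt)
    # Tabulate LF (0-based): LF[i] = C[c] + (rank of c in bbwt[0..i]) - 1.
    first_row = {}
    total = 0
    for c in sorted(set(bbwt)):
        first_row[c] = total          # C[c]
        total += bbwt.count(c)
    seen_so_far = {}
    lf = [0] * n
    for i, c in enumerate(bbwt):
        lf[i] = first_row[c] + seen_so_far.get(c, 0)
        seen_so_far[c] = seen_so_far.get(c, 0) + 1

    visited = [False] * n
    factors_from_last = []
    for start in range(n):
        if visited[start]:
            continue
        # The smallest unvisited entry belongs to the last factor that has
        # not been extracted yet; backward steps read it right to left.
        reversed_factor = []
        i = start
        while not visited[i]:
            visited[i] = True
            reversed_factor.append(bbwt[i])
            i = lf[i]
        factors_from_last.append("".join(reversed(reversed_factor)))
    # Factors were extracted as T_t, ..., T_1; reverse to obtain T.
    return "".join(reversed(factors_from_last))
```

For the running example, the cycles are read in the order abb, abb, ac, b, and concatenating them in reversed order of extraction gives back T.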

From RLBWT to RLBBWT
To build BBWT, we need to be aware of the Lyndon factors of T, which we compute with Lemma 1 by simulating a forward scan on T with FL on BWT. To this end, we store the entries of the C array in a fusion tree [13] using O(σ lg n) bits and supporting predecessor search in O(lg σ / lg lg σ) = O(lg r / lg lg r) time. This time complexity also covers a forward step in RLBWT by simulating F with the fusion tree on C. Hence, this fusion tree allows us to apply Lemma 1 computing the Lyndon factorization of T with a multiplicative O(lg r / lg lg r) time penalty since this algorithm only needs to perform forward traversals. The starting point of such a traversal is the position i with BWT[i] = $ because FL[i] returns the first character of T. Whenever we detect a Lyndon factor T_x (starting with x = 1), we copy this factor to our dynamic RLBBWT. For that, we always maintain the first and the last position of T_x in memory. Having the last position of T_x, we perform backward steps on RLBWT until returning to the first position of T_x to read the characters of T_x in reversed order. Then we continue with the algorithm of Lemma 1 at the position after T_x (for recursing on T_{x+1}). Inserting a Lyndon factor into RLBBWT works exactly as sketched by Bonomo et al. [5, Thm. 17] or in Algo. 2 in the appendix (we review this algorithm in detail in Sect. 5.1).

In the in-place setting, the BWT inversion algorithm of Crochemore et al. [9, Fig. 3] is used in Sect. 5.3 to invert BBWT, which allows us to also convert BBWT to BWT with the BWT construction algorithm of the same paper [9, Fig. 2]. Finally, we show a conversion from BWT to BBWT in Sect. 5.4. An overview is given in Table 1.

Constructing BWT • and BBWT
We can compute BWT• and BBWT from T with the algorithm of Bonomo et al. [5] computing the extended BWT [23]. The extended BWT is the BWT defined on a set of primitive strings.
As stated in Sect. 3.5, the extended BWT coincides with BBWT if this set of primitive strings is the set of Lyndon factors of T [5, Thm. 14]. We briefly describe the algorithm of Bonomo et al. [5] for computing the BBWT (cf. Fig. 2 and Algo. 2 in the appendix). If T is a Lyndon word, then BWT• and BBWT coincide [15, Lemma 12]. That is because sorting the suffixes of T is equivalent to sorting the conjugates of T (if T is a Lyndon word, then its Lyndon factorization consists only of T itself).
It is easy to generalize this to work for a general string T. First, if T is primitive, then we compute its so-called Lyndon conjugate, i.e., a conjugate of T that is a Lyndon word.

Proof. We use Lemma 1 to detect the last Lyndon factor T_t of the Lyndon factorization T_1 ··· T_t of T with O(lg n) bits of working space. According to Lemma 5, T_t T_1 is a Lyndon word since T_t ≺_lex T_1, and so is T_t T_1 ··· T_{t−1} a Lyndon word by a recursive argument. Hence, we have found T's Lyndon conjugate.

Let conj_j(T) be the Lyndon conjugate of T for j ∈ [0..n−1]. Since BWT• is identical to BBWT_{conj_j(T)}, we are done by running the algorithm of Bonomo et al. [5] on conj_j(T). Finally, if T is not primitive, then there is a primitive string P such that T = P^k for an integer k ≥ 2. We can compute BWT•_P with the above considerations. For obtaining BWT•, according to [25, Prop. 2], we only need to turn each character in BWT•_P into a character run of length k, i.e., if BWT•_P = c_1 ··· c_m, then BWT• = c_1^k ··· c_m^k (cf. [15, Thm. 13]). Checking whether T is primitive can be done in O(n^2) time by checking for each pair of positions their longest common prefix. We summarized these steps in the pseudo code of Algo. 4 in the appendix.
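The reduction from a non-primitive string T = P^k to its primitive root P can be checked with a small Python sketch. It computes BWT• naively by sorting all rotations, and it finds the primitive root by searching T inside TT, which needs O(n) extra space and thus differs from the in-place O(n^2) time check described above. The function names are our own, and the input is assumed to be non-empty.

```python
def bwt_of_rotations(text: str) -> str:
    """BWT of the sorted cyclic rotations of text (no delimiter needed)."""
    n = len(text)
    rotations = sorted(text[i:] + text[:i] for i in range(n))
    return "".join(rotation[-1] for rotation in rotations)


def primitive_root(text: str) -> tuple[str, int]:
    """Return (P, k) with text == P * k and P primitive."""
    n = len(text)
    # The first non-trivial occurrence of text in text + text starts at
    # the length of the primitive root (at n if text itself is primitive).
    period = (text + text).find(text, 1)
    return text[:period], n // period


def bwt_of_rotations_via_root(text: str) -> str:
    """Compute the same transform from the primitive root P by turning
    each character of the transform of P into a run of length k."""
    root, k = primitive_root(text)
    return "".join(c * k for c in bwt_of_rotations(root))
```

For T = ababab, the primitive root is ab with k = 3, and expanding each character of the transform of ab into a run of length 3 gives the same string as sorting the six rotations of T directly.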

Inverting BWT •
To invert BWT•, we use the techniques of Crochemore et al. [9, Fig. 3] inverting BWT in-place in O(n^2) time. An invariant is that the BWT entry, whose FL mapping corresponds to the next character to output, is marked with a unique delimiter $. Given that BWT[i] = $, the algorithm outputs the character found by a forward step from i, removes the old $, replaces the output character by $, and recurses until $ is the last character remaining in BWT. By doing so, it restores the text in text order.

To adapt this algorithm for inverting BWT•, we additionally need a pointer p storing the first symbol of the text (since there is no unique delimiter such as $ in general). Given that p points to BWT•[i], we set i ← FL[i] and subsequently output BWT•[i]. From now on, the algorithm works exactly as [9, Fig. 3] if we set BWT•[i] ← $ after outputting BWT•[i] (cf. Algo. 5 in the appendix). More involved is inverting BBWT or converting BBWT to BWT, which we tackle next.

Inverting BBWT
Similarly to Sect. 4.2, we read the Lyndon factors from BBWT in the order T_t, ..., T_1, and move each read Lyndon factor directly to a text buffer such that while reading the last Lyndon factor T_x for an integer x ∈ [1..t], BBWT stores BBWT_{T_1···T_x} and the text buffer stores T_{x+1} ··· T_t. This allows us to recurse by always reading the last Lyndon factor T_x stored in BBWT_{T_1···T_x}.
Here, we want to apply the inversion algorithm for BWT• described in Sect. 5.3. For adapting this algorithm to work with BBWT, it suffices to insert $ at BBWT[2] (cf. Fig. 3). By doing so, we add $ to the cycle of the currently last Lyndon factor T_x stored in BBWT, i.e., we enlarge the Lyndon factor T_x to $T_x. That is because (a) BBWT[1] = T_x[|T_x|] (i.e., a forward step on the last character of T_x gives $) and (b) FL[2] gives the position in BBWT corresponding to T_x[1]. Moreover, inserting $ makes BBWT the BBWT of T' := T_1 ··· T_{x−1} $T_x, where $T_x is the last Lyndon factor of T'. We now use this property to perform the inversion steps of Crochemore et al. [9, Fig. 3] on BBWT. By doing so, we can remove the entry of BBWT corresponding to conj_j(T_x) for increasing j ∈ [0..|T_x| − 1] and prepend the extracted characters to the text buffer storing T_{x+1} ··· T_t within our working space while keeping BBWT a valid BBWT.
Instead of inverting BBWT, we can convert BBWT to BWT in-place by running the in-place BWT construction algorithm of Crochemore et al. [9, Fig. 2] on the text buffer after the extraction of each Lyndon factor. Unfortunately, this does not work character-wise, but needs a Lyndon factor to be fully extracted before inserting its characters into BWT. Interestingly, for the other direction (from BWT to BBWT), we can propose a different kind of conversion that works directly on BWT without decoding it.

From BWT to BBWT on the Fly
Like in Sect. 4.3, we process the Lyndon factors of T individually to compute BBWT by scanning BWT in text order to simulate Lemma 1. Suppose that we have built BWT on T$ = T_1 ··· T_t $ with $ being the (t+1)-th Lyndon factor of T$ and suppose that we have detected the first Lyndon factor T_1. Let f denote the last character of T_1. Further let i_f and i_$ be the positions of the last character of T_1 and the last character of T$ in BWT, respectively, i.e., BWT[i_f] = f and BWT[i_$] = $. Figure 4 gives an overview of the introduced setting.
Our aim is to change BWT such that a forward or backward step within the characters belonging to T_1 always results in a cycle. Informally, we want to cut T_1 out of BWT, which additionally allows us to recursively continue with the FL mapping to find the end of the next Lyndon factor T_2. For that, we exchange BWT[i_$] with BWT[i_f] (cf. Fig. 5). Then the character T[e(T_1) + 1] (i.e., the first character of T_2) becomes the next character of $ in terms of the forward step (BWT[FL[i_f]] = T[b(T_2)]), while a backward step on the first character of T_1 yields T_1's last character (LF returns i_$, but now BWT[i_$] = f). This is sufficient as long as BWT[i] ≠ f for every i strictly between i_f and i_$. Otherwise, it can happen that we change the mapping from the i-th f of F to the i-th f of BWT (or vice versa) unintentionally. In such a case, we swap some entries in BWT within the f interval of F. In detail, we conduct the exchange (BWT[i_$] with BWT[i_f]), but continue with swapping BWT[p] with the entries below it (cf. Fig. 5). This may not be sufficient if the characters we swap are identical (cf. Fig. 6). In such a case, we recurse on the T_1[|T_1| − 1] interval of F, see also Algo. 7 in the appendix.
Instead of checking whether we have created a cycle after each swap, we want to compute the exact number of swaps needed for this task. For that, we note that exchanging BWT[i_$] with BWT[i_f] decrements the values of BWT.rank_f(j) for every j ∈ [i_f..i_$] by one. In particular, BWT.select_f changes for those f's in BWT that are between i_f and i_$. Hence, the number of swaps m is the number of positions k strictly between i_f and i_$ with BWT[k] = f.

Correctness. To see why the swaps restore the LF mapping for T_1 and the remaining part of the text T_2 ··· T_t, we examine those substrings of T that we might no longer find with the LF mapping after exchanging BWT[i_$] with BWT[i_f]. However, for all i > p + m, FL[i] did not change. Hence, we only have to focus on the range in which the swaps are performed.

Open Problems
Our algorithm of Sect. 5.3 converts BBWT to BWT, Lyndon factor by Lyndon factor. It would be interesting to find another conversion that works character-wise. Here, our inversion algorithm extracts a Lyndon factor in text order from BBWT, while the used BWT construction algorithm parses the text in reverse text order. Crochemore et al. [9, Sect. 4] proposed a space and time trade-off algorithm based on their in-place techniques computing or inverting BWT. We are positive that it should be possible to adapt their techniques for computing or inverting BBWT or BWT• with a trade-off parameter.
From the combinatorial perspective, we question whether the number of distinct Lyndon factors of T is bounded by the runs in BBWT. If we can affirm this question, it would be possible to adapt the BBWT based index data structure [3] for RLBBWT using O(r_BBWT lg n) bits of space because this solution needs a bit vector with rank and select support marking the positions in BBWT corresponding to the distinct Lyndon factors. If this number is at most the number of runs r_BBWT, then we can store this bit vector entropy-compressed in O(r lg n) bits when r_BBWT = o(n) since nH_0(r) = n lg(n/(n − r)) + r lg((n − r)/r) ≤ n lg r ⇔ r lg((n − r)/r) ≤ n lg(r(n − r)/n) for r = r_BBWT.
Speaking of RLBBWT, we wonder whether we can construct RLBBWT online in run-length compressed space similar to Cor. 4. With the run-length compressed wavelet tree, the algorithm of Bonomo et al. [5, Thm. 17] reads each Lyndon factor of the text individually.

To compute an entry of F in O(n) time, we can use the selection algorithm of Chan et al. [7] using BWT and O(lg n) bits as working space (the algorithm restores BWT after execution). In summary, we can compute both FL[i] and LF[i] in-place in O(n) time. The algorithm of Crochemore et al. [9, Thm. 2] inverting BWT in-place in O(n^{2+ε}) time uses the result of Munro and Raman [26] computing F[i] in O(n^{1+ε}) time for a constant ε > 0 in the comparison model. As noted by Chan et al. [7, Sect. 1], the time bound for the inversion can be improved to O(n^2) time in the RAM model under the assumption that BWT is rewritable.

Instead, we follow the analysis of the so-called rewindings [3, Sect. 3]: Remembering that we store the last character of all conjugates of all Lyndon factors in BBWT, we observe that the entries in BBWT representing the Lyndon factors (i.e., the last characters of the Lyndon factors) are in sorted order (starting with T_t[|T_t|] and ending with T_1[|T_1|]). That is because the lexicographic order and the ≺_ω order are the same for Lyndon words [5, Thm. 8]. Applying the backward step at such an entry results in a rewinding, i.e., we can move from the beginning of a Lyndon factor T_x (represented by T_x[|T_x|] in BBWT) to the end of T_x (represented by T_x[1] in BBWT) with one backward step.

Figure 2 Computing BBWT from our running example T = bacabbabb in four steps (visualized by four columns separated by three arrows), cf. Sect. 5.1. In each column, the characters from the top to the solid horizontal line form the currently built BBWT. The characters below that up to the dashed horizontal line are under consideration of being merged into BBWT. This dashed line is always before the beginning of the next yet unread Lyndon factor. First column: We have already computed the BBWT of T_1 T_2 = bac, which is cba. In the following we want to add the next Lyndon factor T_3 = abb to it. For that, we prepend its last character to the currently constructed BBWT. Second column: We move the last character above the dashed line to the position LF[p] + 1 with p = 1, and update p ← LF[p] + 1. We recurse in the third column, and have produced the BBWT of T_1 T_2 T_3 = bacabb in the fourth column.

Figure 3 Inverting BBWT of our running example T = bacabbabb (cf. Sect. 5.3). First Column: We prepend the $ delimiter to the last Lyndon factor T_t by inserting $ at BBWT[2]. A forward step symbolized by the dashed arrow leads us from $ to the first character of T_t. Second Column: We output BBWT[6] = T_t[1] = T[7], remove $ and update BBWT[6] ← $. The output is appended to the string shown below the dashed horizontal line. We continue with a forward step to access BBWT[4] = T_t[2] = T[8], and recurse in the third column. Fourth Column: Since a forward step returns $, we know that we have successfully extracted T_t = abb.

Figure 4 Setting of Sect. 5.4 with focus on forming a cycle for a Lyndon factor ending with f in BWT. Left: We exchange BWT[i_f] with BWT[i_$] with the aim to form a cycle. Right: To obtain this cycle we additionally need to swap BWT[p] with the elements of the dashed rectangle corresponding to the interval I having the same height as the dotted rectangle covering BWT[i_f + 1..i_$ − 1].

Figure 5 Computing BBWT from BWT (cf. Sect. 5.4) of our running example T = bacabbabb$. In the left column, we find the first Lyndon factor T_1 = b of T by forward steps with FL. Since |T_1| = 1, p = i_$. We obtain the middle column by exchanging BWT[4] with BWT[7] = $. Since there are two b's between b at BWT[4] and $ in the left column, we need to swap BWT[p] with the two elements below it in the middle column. This gives a cycle in the right column. We can recurse since the FL mapping of $ now yields the second character of T.
The swaps are performed within the range I starting with p + 1 and covering all positions i with LF[i] ∈ [i_f..i_$] and F[i] = f since I covers all entries whose mapping has changed. However, if BWT[p..] starts with a character run of T[e(T_1) − 1] (or of T[b(T_1)] if |T_1| = 1), swapping the identical characters does not change BWT, and therefore has no effect on LF. Instead, we search for the end of this run within I to swap the first entry i below this run with the first entry of this run, and recurse on swapping entry i with the entries below it.

Figure 6 Special case for computing BBWT from BWT (cf. Sect. 5.4) with the different example string T$ := cedabedad$ having T_1 = ced as its first Lyndon factor. Left column: We find the first Lyndon factor T_1 = ced of T by forward steps with FL. Its last character is stored at BWT[2]. By exchanging $ with the last character of T_1 in BWT, we obtain the middle column. Middle column: The LF mapping for the third d in F becomes invalid. However, there is only a character run of T_1[|T_1| − 1] = e in BWT of the T_1[|T_1|] = d interval [7..8] in F starting with p = 7. So we recurse on LF[p] to find characters different from T_1[|T_1| − 2] = c to swap in the respective T_1[|T_1| − 1] = e interval [9..10]. Right Column: We have created a cycle with the characters of the first Lyndon factor. A forward step at $ gives the first character of the next Lyndon factor.
Run with the run-length compressed wavelet tree, the algorithm of Bonomo et al. [5, Thm. 17] works in O(n lg r_BBWT / lg lg r_BBWT) time with max_{x∈[1..t]} |T_x| + O(r_BBWT lg n) bits of space by reading each Lyndon factor of the text individually.

Figure 7 Constructing BBWT of T$ = bacabbabb$. The Lyndon factorization of T$ is visualized by the vertical bars. We take all conjugates of each Lyndon factor into a list, and sort this list with respect to the ≺_ω order. The first characters and the last characters in this list give F and BBWT, respectively.

Figure 8 Constructing BWT of $T = $bacabbabb. Since $T is a Lyndon word, BWT_{$T} = BBWT_{$T}. To construct BWT, we follow Fig. 7: The Lyndon factorization of $T consists only of $T itself. Consequently, we take all conjugates of $T (left) and sort them (right). Thanks to the $, it does not matter whether we sort by lexicographic order or ≺_ω order [5, Lemma 7]. The first characters and the last characters in this sorted list give F and BWT, respectively.
Equivalently, a string T is said to be a Lyndon word if and only if T ≺_lex S for every proper suffix S of T [10, Prop. 1.2]. The Lyndon factorization [8] of T ∈ Σ+ is the factorization of T into a sequence of lexicographically non-increasing Lyndon words T_1 ··· T_t, where (a) each T_x ∈ Σ+ is a Lyndon word for x ∈ [1..t], and (b) T_x ⪰_lex T_{x+1} for each x ∈ [1..t). Each Lyndon word T_x is called a Lyndon factor.

Table 1 Overview of in-place conversions in focus of Sect. 5 working in quadratic time.

We finally present our in-place conversions that work in quadratic time by computing LF or FL in O(n) time having only stored either BWT, BBWT, or BWT•. We note that the constructions from the text also work in the comparison model, while inverting a transform or converting two different transforms have a multiplicative O(n^ε) time penalty as the fastest option to access F in the comparison model uses O(n^{1+ε}) time for a constant ε > 0 [26]. We start with the construction and inversion of BWT• (Sects. 5.1 and 5.2), where we show (a) that we can construct BWT• from the text in the same manner as Bonomo et al. [5] construct BBWT, and (b) that the latter construction works also in-place. Next, we show in Sect. 5.3 how to invert BBWT with the BWT inversion algorithm of Crochemore et al. [9, Fig. 3].
The algorithm of Bonomo et al. [5] works as follows: For each Lyndon factor T_x (starting with x = 1), prepend T_x[|T_x|] to BBWT. To insert the remaining characters of the factor T_x, let p ← 1 be the position of the currently inserted character. Then perform, for each j = |T_x| − 1 down to 1, a backward step p ← LF[p] + 1, and insert T_x[j] at BBWT[p] (cf. Algo. 2 in the appendix). To understand why this computes BBWT, we observe that the last character of the most recently inserted Lyndon factor T_x is always the first character in BBWT_{T_1···T_x} according to Lemma 2. By recursively inserting the preceding character at the place returned by a backward step, we precisely insert this character at the position where we would expect it (another backward step from the same position p would then return the inserted character). Using only n backward steps and n insertions, this algorithm works in-place in O(n^2) time by simulating LF as described in Sect. 3.4. Consequently, we can build BWT• if T is a Lyndon word since in this case BWT• and BBWT coincide [15, Lemma 12].
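The insertion procedure described above can be sketched in Python as follows. The sketch takes the Lyndon factors as an explicit list (computing them is left to Lemma 1), keeps BBWT in a Python list, and evaluates LF by scanning, which mirrors the O(n) time backward step without F. A Python list insertion is of course not the in-place shifting of the paper, and the function names are our own.

```python
def lf(bbwt: list[str], p: int) -> int:
    """Backward step on a 1-based position p, evaluated by scanning:
    LF[p] = C[BBWT[p]] + BBWT.rank_{BBWT[p]}(p)."""
    c = bbwt[p - 1]
    smaller = sum(1 for x in bbwt if x < c)  # C[c]
    return smaller + bbwt[:p].count(c)


def build_bbwt(factors: list[str]) -> str:
    """Insert the Lyndon factors T_1, ..., T_t one by one into BBWT."""
    bbwt: list[str] = []
    for factor in factors:
        # Prepend the last character of the factor; it sits at position 1.
        bbwt.insert(0, factor[-1])
        p = 1
        # Insert the remaining characters from right to left.
        for j in range(len(factor) - 2, -1, -1):
            p = lf(bbwt, p) + 1
            bbwt.insert(p - 1, factor[j])  # new character becomes BBWT[p]
    return "".join(bbwt)
```

With the factors b, ac, abb of the prefix bacabb, the intermediate results follow the columns of Fig. 2: the BBWT cba of bac grows to bcba, bcbba, and finally bcbaba.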