Exploiting ROLLO’s Constant-Time Implementations with a Single-Trace Analysis

ROLLO was a candidate in the second round of the NIST Post-Quantum Cryptography standardization process. In its last update, in April 2020, it comprised a key encapsulation mechanism (ROLLO-I) and a public-key encryption scheme (ROLLO-II). In this paper, we propose an attack to recover the syndrome during the decapsulation process of ROLLO-I. From this syndrome, we explain how to recover the private key. We target two constant-time implementations: the C reference implementation and a C implementation available on GitHub. By measuring the power consumption during the execution of the Gaussian elimination function, we are able to extract each element of the syndrome from a single trace. This attack can also be applied to the decryption process of ROLLO-II.


Introduction
Nowadays, number-theory-based cryptography, such as RSA [17] or ECDSA [11], is efficient but vulnerable to Shor's quantum algorithm [19]. The existence of quantum algorithms pushed the National Institute of Standards and Technology (NIST) to anticipate the time when an efficient quantum computer will be able to execute these algorithms and break commonly used public-key cryptography. In late 2016, NIST started the Post-Quantum Cryptography (PQC) standardization process to select signatures and key encapsulation mechanisms (KEM) or public-key encryption schemes (PKE) resistant to both classical and quantum attacks. Alongside historical schemes such as McEliece [14] or NTRU [10], there are recent proposals based on the rank metric. Error-correcting codes in the rank metric mitigate some drawbacks of the Hamming metric, such as large key sizes. In the second round of this standardization process, there were two proposals in the rank metric, namely ROLLO [1] and RQC [15]. Neither was selected for the third round, due to algebraic attacks [5,6]. Nonetheless, NIST encouraged the community to study rank-metric cryptosystems: "NIST believes rank-based cryptography should continue to be researched" [16].

Rank-metric cryptosystems seem to be a good alternative to cryptosystems in the Hamming metric, but at that point they had not been studied enough with respect to side-channel analysis and embedded implementations. Since public-key cryptosystems are commonly used in embedded systems, it is essential to identify potential leakage in order to improve their resistance against side-channel attacks and ensure their security in practice.

Kocher introduced side-channel attacks in 1996 [12]. An attacker can use information provided by a side channel to extract secret data from a device executing a cryptographic primitive, without having to tamper with the device. The first side-channel attack against a code-based cryptosystem was proposed in 2008 [20] and targeted the McEliece cryptosystem in the Hamming metric. It was followed by numerous others over more than a decade of research, exploiting timing or power consumption. More recently, two papers combined physical attacks with algebraic properties [8,13]. We do not detail these attacks further since they are out of scope.

Related work.
Two recent papers on side-channel attacks against code-based cryptography in the rank metric have been published [4,18]. Both exploit timing leakage from the decoding failure rate of LRPC codes [9]. In this work, we focus on constant-time implementations of schemes using LRPC codes. We target two constant-time implementations of ROLLO, and in particular their Gaussian elimination function. The first one is provided by the authors of ROLLO's proposal to NIST [1]. The second one only provides an implementation of ROLLO-I for 128 bits of security [2].

Our contribution.
To the best of our knowledge, this is the first single-trace attack against different versions of the constant-time Gaussian elimination for error-correcting codes in the rank metric. We show that the power consumption during the decapsulation/decryption process can provide enough information to mount an efficient attack on the ROLLO schemes. Our attack allows us to recover various secret data, such as:
• the private key in both cryptosystems, via the syndrome recovery,
• the shared secret in the ROLLO-I key encapsulation mechanism, or the encrypted message in the ROLLO-II public-key encryption.
Finally, we present two countermeasures that make the implementations resistant to the proposed attack. Gaussian elimination is often used in coding theory to go from a dense parity-check matrix to a parity-check matrix in systematic form. With this work, we want to point out that even recent implementations of this operation could be vulnerable to side-channel attacks. For instance, a constant-time Gaussian elimination is also used in Classic McEliece. Its implementation differs from the one in ROLLO, and we have not yet investigated leakage detection on Classic McEliece. However, if differences in power consumption can be detected for some values, as presented in this paper, our attack could also be applied to this scheme to recover the parity-check matrix.
Organization of the paper.
In Section 2, we recall elementary notions of error-correcting codes in the rank metric, as well as the ROLLO schemes. In Section 3, we detail attacks on both implementations: the reference one, which uses the rbc library, and the proposal on GitHub. We also provide experimental results for ROLLO-I-128. We discuss two different countermeasures in Section 4. Finally, we conclude this paper in Section 5.

Background
ROLLO's submission is based on ideal Low-Rank Parity-Check (LRPC) codes, which were introduced in 2013 [9]. In this section, we first give some details on ideal LRPC codes, then recall the ROLLO proposal to the NIST PQC standardization process.

Rank metric codes
In the following sections, we denote by q a power of a prime number, and let m, n and k be positive integers such that n > k.
We also consider the isomorphism between the vector space $\mathbb{F}_{q^m}^n$ and the extension field $\mathbb{F}_{q^m}[Z]/(P_n)$ given by $(x_0, \ldots, x_{n-1}) \mapsto \sum_{i=0}^{n-1} x_i Z^i$, with $P_n$ an irreducible polynomial of degree $n$ and $(P_n)$ the ideal of $\mathbb{F}_{q^m}[Z]$ generated by $P_n$. Note that the vector space $\mathbb{F}_{q^m}$ is isomorphic to $\mathbb{F}_q[z]/(P_m)$, with $P_m$ an irreducible polynomial of degree $m$ over $\mathbb{F}_q$.
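A linear code $\mathcal{C}$ over $\mathbb{F}_{q^m}$ of length $n$ and dimension $k$ is a subspace of $\mathbb{F}_{q^m}^n$ of dimension $k$. It is denoted by $[n, k]_{q^m}$, and can be represented by a parity-check matrix $H \in \mathbb{F}_{q^m}^{(n-k) \times n}$. Given a basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$, each coordinate $x_i$, for $1 \le i \le n$, of a vector $x \in \mathbb{F}_{q^m}^n$ can be associated with a vector $(x_{i,1}, \ldots, x_{i,m}) \in \mathbb{F}_q^m$. Thus an element $x \in \mathbb{F}_{q^m}^n$ can also be represented by a matrix as follows:

$$M(x) = \begin{pmatrix} x_{1,1} & \cdots & x_{n,1} \\ \vdots & \ddots & \vdots \\ x_{1,m} & \cdots & x_{n,m} \end{pmatrix} \in \mathbb{F}_q^{m \times n}.$$

For an element $x \in \mathbb{F}_{q^m}^n$, the syndrome of $x$ is defined as the vector $s = H x^T$. Considering the rank metric, the distance between two vectors $x$ and $y$ in $\mathbb{F}_{q^m}^n$ is defined by $d_R(x, y) = \operatorname{rank}\big(M(x - y)\big)$. The support of a vector $x = (x_1, \ldots, x_n) \in \mathbb{F}_{q^m}^n$ is defined as the subspace of $\mathbb{F}_{q^m}$ spanned by its coordinates over $\mathbb{F}_q$, namely $\mathrm{Supp}(x) = \langle x_1, \ldots, x_n \rangle_{\mathbb{F}_q}$. Similarly, the support of $(x, y)$ is $\mathrm{Supp}(x, y) = \langle x_1, \ldots, x_n, y_1, \ldots, y_n \rangle_{\mathbb{F}_q}$. Ideal LRPC codes build their structure on ideal codes.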
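As a small illustration of these definitions for q = 2, the sketch below stores each coordinate $x_i \in \mathbb{F}_{2^m}$ as its bit vector $(x_{i,1}, \ldots, x_{i,m})$ in a `uint64_t` (so it assumes m ≤ 64) and computes the rank weight, i.e. $\dim \mathrm{Supp}(x) = \operatorname{rank} M(x)$, and the rank distance $d_R(x, y)$ by incremental Gaussian elimination over $\mathbb{F}_2$. The function names are ours; this is not code from the ROLLO implementations, and it is deliberately not constant-time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Try to add v to an F_2 basis indexed by leading bit (basis[b] has top bit b).
 * Returns 1 if v is independent of the basis, 0 otherwise. Not constant-time. */
static int basis_insert(uint64_t basis[64], uint64_t v) {
    for (int b = 63; b >= 0; b--) {
        if (((v >> b) & 1u) == 0) continue;
        if (basis[b] == 0) {
            basis[b] = v;
            return 1;
        }
        v ^= basis[b];  /* eliminate bit b and keep reducing */
    }
    return 0;  /* v reduced to zero: it lies in the span */
}

/* d_R(x, y) = rank M(x - y); over F_2, x - y is a coordinate-wise XOR. */
size_t rank_distance(const uint64_t *x, const uint64_t *y, size_t n) {
    uint64_t basis[64] = {0};
    size_t rank = 0;
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++)
        rank += (size_t)basis_insert(basis, x[i] ^ y[i]);
    return rank;
}

/* Rank weight of x = dim Supp(x) = d_R(x, 0). */
size_t rank_weight(const uint64_t *x, size_t n) {
    uint64_t basis[64] = {0};
    size_t rank = 0;
    for (size_t i = 0; i < n; i++)
        rank += (size_t)basis_insert(basis, x[i]);
    return rank;
}
```

For instance, x = (1, 2, 3) has Hamming weight 3 but rank weight 2, since 3 = 1 ⊕ 2 lies in the span of the first two coordinates.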
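Given a polynomial $P_n \in \mathbb{F}_q[Z]$ of degree $n$ and a vector $v \in \mathbb{F}_{q^m}^n$, the ideal matrix generated by $v$ is the $n \times n$ matrix defined by

$$\mathcal{IM}(v) = \begin{pmatrix} v \\ Z v \bmod P_n \\ \vdots \\ Z^{n-1} v \bmod P_n \end{pmatrix}.$$

A code $\mathcal{C}$ of length $sn$ and dimension $tn$ over $\mathbb{F}_{q^m}$, generated by vectors $g_{i,j} \in \mathbb{F}_{q^m}^n$, is an $(s, t)$-ideal code if a generator matrix in systematic form is of the form $G = (I_{tn} \mid A)$, where $A$ is a block matrix whose blocks are the ideal matrices $\mathcal{IM}(g_{i,j})$. In [1], the authors restrict the definition of ideal LRPC (Low-Rank Parity-Check) codes to (2, 1)-ideal LRPC codes, which they use for all variants of ROLLO.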
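To make the definition of $\mathcal{IM}(v)$ concrete for q = 2, the sketch below builds the ideal matrix row by row: each row is the previous one multiplied by Z and reduced modulo $P_n$. It assumes that elements of $\mathbb{F}_{2^m}$ are stored as `uint64_t` bit vectors and that $P_n = Z^n + \sum_{k<n} p_k Z^k$ is monic with binary coefficients `p[k]`; the function name `ideal_matrix` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build IM(v), row-major n x n, for P_n = Z^n + sum_{k<n} p[k] Z^k with p[k] in {0,1}.
 * Elements of F_{2^m} are uint64_t bit vectors; v[k] is the coefficient of Z^k. */
void ideal_matrix(const uint64_t *v, const uint8_t *p, size_t n, uint64_t *im) {
    assert(n >= 1);
    for (size_t k = 0; k < n; k++) im[k] = v[k];              /* row 0: v */
    for (size_t i = 1; i < n; i++) {
        const uint64_t *prev = im + (i - 1) * n;
        uint64_t *row = im + i * n;
        uint64_t top = prev[n - 1];                            /* coefficient of Z^(n-1) */
        row[0] = 0;
        for (size_t k = 1; k < n; k++) row[k] = prev[k - 1];   /* multiply by Z */
        for (size_t k = 0; k < n; k++)
            if (p[k]) row[k] ^= top;  /* Z^n = sum p[k] Z^k mod P_n (characteristic 2) */
    }
}
```

With $P_3 = Z^3 + Z + 1$ (p = {1, 1, 0}) and v = (1, 2, 4), the rows are (1, 2, 4), (4, 5, 2) and (2, 6, 5). Branching on `p[k]` is harmless here because $P_n$ is public.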
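Let $F$ be an $\mathbb{F}_q$-subspace of $\mathbb{F}_{q^m}$ such that $\dim(F) = d$. Let $(h_1, h_2)$ be a pair of vectors in $\mathbb{F}_{q^m}^n$ such that $\mathrm{Supp}(h_1, h_2) = F$, and let $P_n \in \mathbb{F}_q[Z]$ be a polynomial of degree $n$. A $[2n, n]_{q^m}$-code $\mathcal{C}$ is an ideal LRPC code if it has a parity-check matrix of the form

$$H = \begin{pmatrix} \mathcal{IM}(h_1)^T & \mathcal{IM}(h_2)^T \end{pmatrix}.$$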

ROLLO
ROLLO is a second-round submission to the post-quantum standardization process launched by NIST in 2016. Since the last update in April 2020, it is composed of two cryptosystems: ROLLO-I, a Key Encapsulation Mechanism (KEM), and ROLLO-II, a Public-Key Encryption scheme (PKE). Both are described in Figure 1. We use the following notations:
• $A \xleftarrow{\$}_k \mathbb{F}_{q^m}$ denotes the operation of selecting randomly $k$ vectors from the vector space $\mathbb{F}_{q^m}$, so that $A \in \mathbb{F}_{q^m}^k$;
• $(u, v) \xleftarrow{\$}_l A$ denotes the operation of selecting randomly $2n$ linear combinations of the elements of $A$, so that $u, v \in \mathbb{F}_{q^m}^n$ and $\mathrm{Supp}(u, v) = A$;
• RSR denotes the Rank Support Recovery algorithm given in the specification of ROLLO [1] to decode LRPC codes.
We unify the parameter tables from ROLLO's specification into Table 1. For all three security levels, q = 2.
In the following, we focus on the vulnerabilities of the implementations of the Gaussian elimination process. The latter is used several times in the ROLLO cryptosystems, namely to compute:
• the support S of the syndrome s;
• the support of the error $(e_1, e_2)$, which lets us recover the shared secret in the case of ROLLO-I, or the message in the case of ROLLO-II;
• the intersections of vector spaces during the decoding of the syndrome (RSR). These intersections determine the support E of the error:

$$E = \bigcap_{i=1}^{d} f_i^{-1} S,$$

with $F = \langle f_1, \ldots, f_d \rangle_{\mathbb{F}_q}$ the support of the private key.
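The intersection step of RSR can be illustrated on a toy field. The sketch below assumes $\mathbb{F}_{2^5}$ defined by the modulus $x^5 + x^2 + 1$ (our choice, not a ROLLO parameter), represents an $\mathbb{F}_2$-subspace by a 32-bit bitmap of its elements, and computes $\bigcap_i f_i^{-1} S$ by enumeration. It only makes sense for tiny m and is not how the ROLLO code computes intersections; all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GF_M 5
#define GF_POLY 0x25u  /* x^5 + x^2 + 1: assumed toy modulus for F_{2^5} */

/* Multiplication in F_{2^5} (shift-and-add, reducing by GF_POLY). */
static uint32_t gf_mul(uint32_t a, uint32_t b) {
    uint32_t r = 0;
    for (int i = 0; i < GF_M; i++) {
        if ((b >> i) & 1u) r ^= a;
        a <<= 1;
        if ((a >> GF_M) & 1u) a ^= GF_POLY;
    }
    return r;
}

/* Inverse computed as a^(2^m - 2); a must be non-zero. */
static uint32_t gf_inv(uint32_t a) {
    uint32_t r = 1;
    assert(a != 0);
    for (int i = 0; i < (1 << GF_M) - 2; i++) r = gf_mul(r, a);
    return r;
}

/* Bitmap of the F_2-span of gens[0..k-1]: bit v is set iff v is in the span. */
uint32_t span(const uint32_t *gens, size_t k) {
    uint32_t set = 1u;  /* the zero vector */
    for (size_t g = 0; g < k; g++) {
        uint32_t add = 0;
        for (uint32_t v = 0; v < (1u << GF_M); v++)
            if ((set >> v) & 1u) add |= 1u << (v ^ gens[g]);
        set |= add;
    }
    return set;
}

/* Product space <e_i f_j>, which the syndrome support S is expected to span. */
uint32_t product_span(const uint32_t *e, size_t r, const uint32_t *f, size_t d) {
    uint32_t gens[64];
    size_t k = 0;
    assert(r * d <= 64);
    for (size_t i = 0; i < r; i++)
        for (size_t j = 0; j < d; j++) gens[k++] = gf_mul(e[i], f[j]);
    return span(gens, k);
}

/* Intersection of f_i^{-1} S for i = 1..d, as in the intersection step of RSR. */
uint32_t rsr_intersection(uint32_t s_set, const uint32_t *f, size_t d) {
    uint32_t e_set = 0xFFFFFFFFu;  /* start from the whole field */
    for (size_t i = 0; i < d; i++) {
        uint32_t finv = gf_inv(f[i]);
        uint32_t scaled = 0;
        for (uint32_t v = 0; v < (1u << GF_M); v++)
            if ((s_set >> v) & 1u) scaled |= 1u << gf_mul(finv, v);  /* f_i^{-1} S */
        e_set &= scaled;
    }
    return e_set;
}
```

Since every $e \in E$ satisfies $f_i e \in S$, the result always contains E; decoding succeeds when it is exactly E.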
Thus, the leakage coming from implementations of Gaussian elimination can allow a side-channel attacker to recover all the secret data. In the next section, we explain the attack on the syndrome. The same analysis can be performed to recover the other data mentioned above.
Side-channel attack on Gaussian elimination in constant-time

In the RSR algorithm [1], we first compute the support of the syndrome.
To that end, Gaussian elimination is applied to the syndrome matrix. We know that the syndrome is first computed as:

$$s = x c \bmod P_n.$$

Therefore, with the knowledge of the syndrome s and the ciphertext c, and assuming c is invertible modulo $P_n$, we can compute x, a part of the private key, as:

$$x = s c^{-1} \bmod P_n.$$

Knowing x leads to a full recovery of the private key. First, we can get the second part of the private key, y, by computing

$$y = x h \bmod P_n,$$

where $h = x^{-1} y \bmod P_n$ is the public key. Then, the support of x and y gives the last part of the private key, $F = \mathrm{Supp}(x, y)$.
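The algebra of this key recovery can be checked on a toy instance. The sketch below assumes $\mathbb{F}_{2^5}$ (modulus $x^5 + x^2 + 1$) and $P_n = Z^3 + Z + 1$ with n = 3; this cubic has no root in $\mathbb{F}_{2^5}$, so it is irreducible there and every non-zero c is invertible. These are illustrative values, not ROLLO parameters. Instead of computing $c^{-1}$ explicitly, it solves the equivalent linear system $x \cdot \mathcal{IM}(c) = s$ over $\mathbb{F}_{2^5}$; the same `poly_mulmod` then gives $y = x h \bmod P_n$. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GF_M 5
#define GF_POLY 0x25u  /* toy modulus x^5 + x^2 + 1 for F_{2^5} (assumption) */
#define N 3            /* toy length, P_n = Z^3 + Z + 1 (assumption) */

static uint32_t gf_mul(uint32_t a, uint32_t b) {
    uint32_t r = 0;
    for (int i = 0; i < GF_M; i++) {
        if ((b >> i) & 1u) r ^= a;
        a <<= 1;
        if ((a >> GF_M) & 1u) a ^= GF_POLY;
    }
    return r;
}

static uint32_t gf_inv(uint32_t a) {  /* a^(2^m - 2), a != 0 */
    uint32_t r = 1;
    assert(a != 0);
    for (int i = 0; i < (1 << GF_M) - 2; i++) r = gf_mul(r, a);
    return r;
}

/* v <- Z * v mod (Z^3 + Z + 1), using Z^3 = Z + 1 in characteristic 2. */
static void mul_by_z(uint32_t v[N]) {
    uint32_t top = v[2];
    v[2] = v[1];
    v[1] = v[0] ^ top;
    v[0] = top;
}

/* out = u * c mod P_n, i.e. the row vector u times IM(c). */
void poly_mulmod(const uint32_t u[N], const uint32_t c[N], uint32_t out[N]) {
    uint32_t row[N];
    for (size_t k = 0; k < N; k++) { row[k] = c[k]; out[k] = 0; }
    for (size_t i = 0; i < N; i++) {
        for (size_t k = 0; k < N; k++) out[k] ^= gf_mul(u[i], row[k]);  /* + u_i Z^i c */
        mul_by_z(row);
    }
}

/* Recover x from s = x c mod P_n by Gauss-Jordan elimination on x . IM(c) = s. */
int recover_x(const uint32_t s[N], const uint32_t c[N], uint32_t x[N]) {
    uint32_t a[N][N + 1], row[N];
    for (size_t k = 0; k < N; k++) row[k] = c[k];
    for (size_t i = 0; i < N; i++) {          /* a[k][i] = coefficient k of Z^i c */
        for (size_t k = 0; k < N; k++) a[k][i] = row[k];
        mul_by_z(row);
    }
    for (size_t k = 0; k < N; k++) a[k][N] = s[k];
    for (size_t col = 0; col < N; col++) {
        size_t piv = col;
        while (piv < N && a[piv][col] == 0) piv++;
        if (piv == N) return -1;              /* c not invertible modulo P_n */
        for (size_t t = 0; t <= N; t++) {     /* swap rows col and piv */
            uint32_t tmp = a[col][t]; a[col][t] = a[piv][t]; a[piv][t] = tmp;
        }
        uint32_t inv = gf_inv(a[col][col]);
        for (size_t t = 0; t <= N; t++) a[col][t] = gf_mul(a[col][t], inv);
        for (size_t r = 0; r < N; r++) {
            uint32_t factor = a[r][col];
            if (r == col || factor == 0) continue;
            for (size_t t = 0; t <= N; t++) a[r][t] ^= gf_mul(factor, a[col][t]);
        }
    }
    for (size_t k = 0; k < N; k++) x[k] = a[k][N];
    return 0;
}
```

In the attack, s would come from the trace and c from the ciphertext; y then follows as `poly_mulmod(x, h, y)` with the public key h.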
A constant-time Gaussian elimination has to process every row for each column of the syndrome matrix. Thus, an attacker could be able to recover all the values of this matrix. In a non-constant-time Gaussian elimination, it is possible to process only the rows below the pivot row, so the values in all rows above the pivot row remain unknown to the attacker. Consequently, the constant-time property actually gives an advantage to a side-channel attacker. Moreover, constant-time execution eases the detection, inside the power trace, of the pattern corresponding to the targeted operation: once the attacker has found the exact location of this pattern, it becomes straightforward to find the locations of all the other iterations.

We analyzed two constant-time implementations of Gaussian elimination and discovered two possible leakages through power consumption. The first implementation was provided as Additional Implementations in April 2020 for the second round of the NIST PQC standardization process, and is available on the ROLLO candidate webpage [1]. We refer to it as the reference implementation. It uses the rbc library [3], which provides different functions to implement schemes using rank metric codes. The second implementation has been published on GitHub [2]. We refer to it as the GitHub implementation.

Notations.
We denote by ⊗ the multiplication between a scalar and a row of a matrix, and by ⊕ the bitwise XOR between two bits or two rows of a matrix. The bitwise AND is represented by ∧ and the bitwise NOT by ¬. The term mask does not refer to Boolean masking, but to a variable that determines which row additions are performed, according to the values of the coefficients of the processed column.

Information leakage of the reference implementation
The reference implementation is based on Algorithm 1, which was first introduced in [7].
Algorithm 1: Gaussian elimination in constant time

The input matrix is composed of n rows and m columns. The algorithm outputs the matrix in systematic form and its rank. The first inner for loop (line 4) fixes the ones on the diagonal (corresponding to the pivots) and the second inner for loop (line 13) removes the other ones in the pivot column. In both inner for loops of Algorithm 1, a mask ∈ $\mathbb{F}_2$ is computed and multiplied with specific rows of the syndrome matrix. However, the multiplication of a 32-bit word $(u_0, \ldots, u_{31})_2$ by zero or by one leaks information in the power traces. This allows us to recover all the mask values computed during the process and, from them, the initial syndrome matrix. Our attack consists in recovering the syndrome matrix

$$S = \begin{pmatrix} s_{0,0} & \cdots & s_{0,m-1} \\ \vdots & \ddots & \vdots \\ s_{n-1,0} & \cdots & s_{n-1,m-1} \end{pmatrix}, \tag{1}$$

where $s_{i,j} \in \mathbb{F}_2$ for $(i, j) \in [\![0, n-1]\!] \times [\![0, m-1]\!]$. We denote by $S_j$ the matrix obtained after the treatment of the j-th column of S, and by $S[k]$ the k-th column of the matrix S. The mask values recovered from the two inner for loops lead to a system of linear equations, which is built in the two steps described below.
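After the first inner for loop in Algorithm 1: we recover the mask values $s_{\text{pivot\_row},j} \oplus s_{i,j}$. If mask = 0, the pivot row is unchanged; otherwise, the i-th row is added to the pivot row. Hence, the first loop reveals the indices of the rows XORed into the pivot row. We define $\boldsymbol{\sigma}_j = (\sigma_{0,j}, \sigma_{1,j}, \ldots, \sigma_{n-1,j})$, where $\sigma_{i,j} \in \{0, 1\}$ is the value of mask for row i, as the vector containing all the mask values recovered during the treatment of the j-th column. We also define the matrix $J_j \in \mathbb{F}_2^{n \times n}$ involved in the computation of the system of linear equations, whose k-th column is $e_k \oplus \sigma_{k,j}\, e_{\text{pivot\_row}}$, with $e_k$ the k-th canonical vector. For instance, consider the pivot row of index 0. After the first inner for loop, the syndrome matrix given in Equation 1 is of the form

$$\begin{pmatrix} s_{0,0} \oplus \bigoplus_{i=1}^{n-1} \sigma_{i,0}\, s_{i,0} & \cdots & s_{0,m-1} \oplus \bigoplus_{i=1}^{n-1} \sigma_{i,0}\, s_{i,m-1} \\ s_{1,0} & \cdots & s_{1,m-1} \\ \vdots & \ddots & \vdots \\ s_{n-1,0} & \cdots & s_{n-1,m-1} \end{pmatrix}.$$

In other words, we can compute it as $J_0 \cdot S$, with

$$J_0 = \begin{pmatrix} 1 & (\sigma_{1,0}, \ldots, \sigma_{n-1,0}) \\ \mathbf{0} & I_{n-1} \end{pmatrix},$$

where $I_{n-1}$ denotes the identity matrix of size n − 1 and $\mathbf{0}$ a column of n − 1 zeros.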
We notice in lines 7-8 of Algorithm 1 that only rows with an index greater than the pivot row index are added to the pivot row. Thus, for the treatment of column j, we set $\sigma_{i,j} = 0$ for $i \le \text{pivot\_row}$.
After the second inner for loop in Algorithm 1: the recovered mask values correspond to the coefficients $s_{i,j}$ of the matrix obtained after the first inner for loop. We denote by $\boldsymbol{\sigma}'_j = (\sigma'_{0,j}, \ldots, \sigma'_{j-1,j}, *, \sigma'_{j+1,j}, \ldots, \sigma'_{n-1,j})$ the vector composed of these mask values. The item * represents the pivot, which is not processed in the second loop; for the attack, * is replaced by one. On the one hand, during the treatment of the j-th column, $\boldsymbol{\sigma}'_j$ completes the system of linear equations. Assuming we want to recover column 0, we use a linear solver on the system

$$J_0 \cdot S[0] = \boldsymbol{\sigma}'_0.$$

On the other hand, the vector $\boldsymbol{\sigma}'_j$ allows us to recover all the operations performed on rows. These operations are taken into account when solving the system of linear equations of the (j + 1)-th column. For this, we define the matrix $J'_j \in \mathbb{F}_2^{n \times n}$, equal to the identity matrix except for its pivot column, which is $\boldsymbol{\sigma}'_j$ (with * = 1). For example, for the treatment of column 1, we consider the matrix

$$J'_0 \cdot J_0.$$

More generally, during the treatment of column j, for j ≥ 1, we consider

$$\prod_{k=j-1,\ldots,0} J'_k \cdot J_k = J'_{j-1} \cdot J_{j-1} \cdots J'_0 \cdot J_0.$$

In case there is no pivot in a column, all the mask values are equal to zero, thus $J_j = J'_j = I_n$. Finally, to recover column j, we solve the system of linear equations

$$J_j \cdot \Big(\prod_{k=j-1,\ldots,0} J'_k \cdot J_k\Big) \cdot S[j] = \boldsymbol{\sigma}'_j.$$
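The two steps above can be checked end to end on a toy matrix. The sketch below is our own C model in the spirit of Algorithm 1, not the rbc code: rows are `uint64_t` bit vectors (bit j of row i is $s_{i,j}$), so it assumes m < n ≤ 64, as in ROLLO-I-128 where the syndrome matrix has more rows than columns; masks multiply whole rows; and the hypothetical structure `leak_t` stores what we assume a trace reveals for each column: the pivot row index, $\boldsymbol{\sigma}_j$ and $\boldsymbol{\sigma}'_j$. Two simplifications differ from the description above: the second loop is not guarded by the pivot, and the * entry is recorded as the actual pivot bit (1 whenever a pivot exists). Because each $J_j$ and $J'_j$ is its own inverse over $\mathbb{F}_2$, solving the linear system reduces to applying the recorded operations in reverse order, which is what `recover_column` does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What the trace is assumed to reveal while column j is processed. */
typedef struct {
    size_t p;         /* pivot row index                                    */
    uint64_t sigma1;  /* bit i: first-loop mask of row i (0 for i <= p)     */
    uint64_t sigma2;  /* bit i: second-loop mask of row i; bit p: '*' entry */
} leak_t;

/* Constant-time elimination modelled on Algorithm 1 (sketch). Returns the rank. */
size_t gauss_ct(uint64_t *s, size_t n, size_t m, leak_t *leak) {
    size_t dim = 0;                        /* rank so far = current pivot row */
    assert(m < n && n <= 64);              /* rank <= m < n keeps p in range */
    for (size_t j = 0; j < m; j++) {
        size_t p = dim;
        leak[j] = (leak_t){p, 0, 0};
        for (size_t i = 0; i < n; i++) {   /* first inner loop: fix the pivot */
            if (i > p) {                   /* branch on public indices only */
                uint64_t mask = ((s[p] ^ s[i]) >> j) & 1u;
                s[p] ^= s[i] * mask;       /* row times 0/1: the leaking step */
                leak[j].sigma1 |= mask << i;
            }
        }
        uint64_t pivot = (s[p] >> j) & 1u;
        leak[j].sigma2 |= pivot << p;
        for (size_t i = 0; i < n; i++) {   /* second inner loop: clear column j */
            if (i != p) {
                uint64_t mask = (s[i] >> j) & 1u;
                s[i] ^= s[p] * mask;
                leak[j].sigma2 |= mask << i;
            }
        }
        dim += (size_t)pivot;
    }
    return dim;
}

static uint64_t parity64(uint64_t x) {
    x ^= x >> 32; x ^= x >> 16; x ^= x >> 8;
    x ^= x >> 4;  x ^= x >> 2;  x ^= x >> 1;
    return x & 1u;
}

/* v <- J v: entry p becomes v_p XOR (XOR of sigma1_i v_i). J is an involution. */
static uint64_t apply_first(const leak_t *l, uint64_t v) {
    return v ^ (parity64(l->sigma1 & v) << l->p);
}

/* v <- J' v: entry i != p becomes v_i XOR sigma2_i v_p. J' is an involution. */
static uint64_t apply_second(const leak_t *l, uint64_t v) {
    uint64_t others = l->sigma2 & ~((uint64_t)1 << l->p);
    return v ^ (others * ((v >> l->p) & 1u));
}

/* Column j of the ORIGINAL matrix (bit i = s_{i,j}) from the recorded masks only. */
uint64_t recover_column(const leak_t *leak, size_t j) {
    uint64_t v = leak[j].sigma2;           /* column j right after the first loop */
    v = apply_first(&leak[j], v);          /* undo J_j */
    for (size_t k = j; k-- > 0;) {         /* undo J'_k, then J_k, for k = j-1..0 */
        v = apply_second(&leak[k], v);
        v = apply_first(&leak[k], v);
    }
    return v;
}

/* Eliminate a copy of orig, then check every column is rebuilt from the leakage. */
int attack_recovers(const uint64_t *orig, size_t n, size_t m) {
    uint64_t work[64];
    leak_t leak[64];
    assert(m < n && n <= 64);
    for (size_t i = 0; i < n; i++) work[i] = orig[i];
    gauss_ct(work, n, m, leak);
    for (size_t j = 0; j < m; j++) {
        uint64_t col = recover_column(leak, j);
        for (size_t i = 0; i < n; i++)
            if (((col >> i) & 1u) != ((orig[i] >> j) & 1u)) return 0;
    }
    return 1;
}
```

With the toy dimensions of Appendix C (n = 7 rows, m = 5 columns), `attack_recovers` confirms that the masks of a single execution determine the whole input matrix.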
In [2], the authors introduced a row reduction in constant time, given in Algorithm 2, which can be seen as a generalization of the one presented in Algorithm 1.
At the end of Algorithm 2, we obtain a matrix in row echelon form. To ensure this, three masks are first computed according to the coefficients and the pivot being processed. Each mask is equal to 1 or 0. The three masks determine the operations on rows (lines 19-20 in Algorithm 2), as presented in Figure 2. We notice that two paths (in bold red) lead to a bitwise XOR on rows. First, when mask1 = mask2 = mask3 = 1, the pivot coefficient is set to one. This happens at most once per iteration of the loop over j. Then, when mask2 = mask3 = 1, independently of mask1, the other ones in the processed column j are removed.

Algorithm 2: Row reduction in constant time
Fig. 2: Operations on matrix rows according to mask values. In red, paths leading to XOR on rows, with $s_p$ the pivot row and $s_i$ the processed row.

In Algorithm 2, we observe two sources of leakage. The first one is the computation of mask1, mask2 and mask3. Algorithmically, these masks are equivalent to secret-dependent branches; however, they are computed beforehand, through a weighted sum in the GitHub implementation. We exploit a leakage in the computation of this weighted sum to recover their values. Listing 1 details the implementation of the mask2 computation in the GitHub version. We notice that if the processed coefficient, defined as "bit" in line 6, is equal to one, all bits of mask2 are set to one; otherwise, all bits are set to zero (lines 7-8). The same kind of operations is observed for mask1 and mask3. However, the power consumption differs depending on whether all the bits are set to 1 or to 0. We deduce that it is possible to recover the mask values.
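Why an all-ones mask is distinguishable from an all-zeros one can be illustrated with the Hamming-weight leakage model, a common (assumed) first-order model in which the power drawn when a register is written grows with the number of its bits set to one. The sketch below expands a coefficient bit into a 32-bit mask, in the way described for mask2 (all bits set to the value of "bit"), and returns its Hamming weight. It is a model, not the GitHub code of Listing 1, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Expand a secret bit into a 32-bit all-zeros or all-ones mask (branch-free). */
static uint32_t expand_bit(uint32_t bit) {
    assert(bit <= 1);
    return 0u - bit;  /* 0 -> 0x00000000, 1 -> 0xFFFFFFFF */
}

/* Hamming weight of a 32-bit word (clears the lowest set bit at each step). */
static unsigned hamming_weight(uint32_t w) {
    unsigned hw = 0;
    while (w) {
        w &= w - 1;
        hw++;
    }
    return hw;
}

/* Modelled leakage of one mask computation: HW of the expanded mask. */
unsigned mask_leakage(uint32_t bit) {
    return hamming_weight(expand_bit(bit));
}
```

Under this model, the two possible values of each mask are as far apart as a 32-bit value can be; on a real device, the observed difference also depends on the surrounding instructions, as the Cortex-M3/Cortex-M4 comparison below shows.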
The second source of leakage comes from the bitwise AND and XOR applied to the rows of the syndrome matrix. Indeed, in lines 19-20 of Algorithm 2, the rows are XORed with either a zero or a non-zero row, depending on the mask values. We have not exploited this second source of leakage because it provides the same information as the mask recovery. However, it remains a good point of interest for side-channel attacks.
• If the vector $\boldsymbol{\sigma}_{\text{mask1},j}$ contains both zeros and ones, the position of the last one is the index of the row added to the pivot row in column j.
We determine the system of linear equations as previously, with two matrices, $J_j$ and $J'_j$, depending respectively on mask1 and on mask2 ∧ mask3. The vector $\boldsymbol{\sigma}_{\text{mask2},j}$ depends on the coefficients processed in column j. Therefore, $\boldsymbol{\sigma}_{\text{mask2},0}$ directly gives us the first column, as there is no pre-processing on rows. After the first iteration, we have to take into account the XORs performed on the rows of the matrix during the treatment of column j − 1.
For example, after the treatment of column 0, the positions of the executed XORs are given by the matrix $J'_0 \cdot J_0$. Thus, for column 1, we use a linear solver on the system

$$J'_0 \cdot J_0 \cdot S[1] = \boldsymbol{\sigma}_{\text{mask2},1}.$$

More generally, to recover column j ≥ 1, we have to solve the system of linear equations

$$\Big(\prod_{k=j-1,\ldots,0} J'_k \cdot J_k\Big) \cdot S[j] = \boldsymbol{\sigma}_{\text{mask2},j}.$$

Experimental results of our power consumption analysis
In this section, we demonstrate the practicability of the attack on an ARM SecurCore SC300 32-bit processor (equivalent to a Cortex-M3). We implemented ROLLO-I-128 in C: the first implementation corresponds to the reference one and the second to the GitHub version [2]. ROLLO-I-128 traces are captured with a LeCroy SDA 725Zi-A oscilloscope with a bandwidth of 2.5 GHz. We put a trigger right before the execution of the Gaussian elimination. The measurements for the reference implementation are given in Figure 3. The power trace of the first inner for loop (line 4 of Algorithm 1) is given in Figure 3a, and the power trace of the second inner for loop (line 13 of Algorithm 1) in Figure 3b. We can observe the difference in power consumption when 32-bit words are multiplied by one or by zero.
Fig. 4: Measurement for the GitHub implementation: trace of the treatment of one column in ROLLO-I-128

Experiments with a Cortex-M4 and comparison
In this section, we show that the attack is also applicable to an ARM Cortex-M4. For the experiments, we used the ROLLO-I-128 implementation provided in the mupq GitHub repository, available at https://github.com/mupq/mupq/tree/Round2/cryptokem, on an STM32F4 ChipWhisperer target. The traces are captured with an RTO2000 oscilloscope with a bandwidth of 3 GHz. We put a trigger right before the execution of the Gaussian elimination.
Figure 5 provides measurements obtained with a Cortex-M4. As in Figure 3 for the Cortex-M3, the traces are annotated with colored rectangles: green for a mask at 0 and red for a mask at 1. We notice that the difference between a mask at 0 and a mask at 1 is more pronounced in Figure 3a than in Figure 5a. In the latter, the difference in power consumption between both masks is smaller, and distinguishing them requires looking carefully at the end of the pattern. In Figure 3b and Figure 5b, the patterns for a mask at 0 and a mask at 1 are similar. However, the power decrease in the pattern of a mask at 0 is more accentuated in Figure 5b.

Countermeasures
In this section, we propose two solutions to protect future implementations against our attack. It is important to emphasize that the implementations with the countermeasures remain constant-time.
First countermeasure for the reference implementation.
The first countermeasure consists in reducing the difference between the multiplication of a word by zero and by one. For this, we mask the processed coefficients. In the first inner for loop, we split the pivot row into two parts. Thus, for each iteration, we compute:

The same operations are performed in the second inner for loop, replacing the pivot row by the processed row $s_i$. With this countermeasure, whether the mask is zero or one, we always perform the same operations, namely two bitwise ANDs involving a non-zero word and a zero word. Thus, we are not able to distinguish different patterns when mask equals 0 or 1. We applied the same setup as in Section 3.3 to illustrate this in Figure 6.

The second countermeasure is based on shuffling. The rows of each column are processed in a random order, obtained with an algorithm that generates a random permutation of a finite set, such as the Fisher-Yates method (given in Appendix B). The choice of algorithm is left to the developer, provided it is implemented correctly.
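As an illustration of the principle of the first countermeasure (the same operations whatever the mask, with one AND involving an all-zero word and one involving a non-zero word), the sketch below updates the real row with `src & m1` and a scratch row with `src & m0`, where `m0 = ~m1`. This is our own construction, not necessarily the authors' exact formula; the name `balanced_row_xor` and the `dummy` buffer are hypothetical, and its effectiveness has to be checked on traces, as done in Figure 6.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Balanced row update (sketch): dst ^= mask * src, while always computing one AND
 * with an all-ones word and one AND with an all-zeros word, whatever the mask. */
void balanced_row_xor(uint32_t *dst, uint32_t *dummy, const uint32_t *src,
                      size_t words, uint32_t mask) {
    assert(mask <= 1);
    uint32_t m1 = 0u - mask;  /* all ones iff mask == 1 */
    uint32_t m0 = ~m1;        /* all ones iff mask == 0 */
    for (size_t w = 0; w < words; w++) {
        dst[w] ^= src[w] & m1;    /* real update */
        dummy[w] ^= src[w] & m0;  /* complementary update into a scratch row */
    }
}
```

With an optimizing compiler, it is worth checking in the generated assembly that the dummy update is kept and that no branch on `mask` is introduced.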
Given a list of n elements to shuffle, the Fisher-Yates method uses a random function that generates a number j such that 1 ≤ j ≤ n − k, where k is the number of elements already processed. The elements are processed in decreasing order: in the first iteration, the last element of the list is swapped with the j-th element, and so on until the first element is reached.
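A minimal C sketch of the Fisher-Yates (Durstenfeld) shuffle described above follows. The random source is passed as a callback returning a uniform integer in [0, bound). For illustration only, we include a xorshift32-based generator, which is NOT suitable for a real countermeasure (it is predictable and the reduction `% bound` is slightly biased); a real implementation should rely on the platform's cryptographic random number generator with rejection sampling. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Caller-supplied generator: uniform integer in [0, bound). */
typedef uint32_t (*rand_below_fn)(uint32_t bound, void *ctx);

/* Shuffle idx[0..n-1]: with i = n - k unprocessed elements, swap the last one
 * (position i-1) with a uniformly chosen position among the i unprocessed ones. */
void fisher_yates(uint32_t *idx, uint32_t n, rand_below_fn rand_below, void *ctx) {
    for (uint32_t i = n; i > 1; i--) {
        uint32_t j = rand_below(i, ctx);  /* 0 <= j <= i - 1 */
        assert(j < i);
        uint32_t tmp = idx[i - 1];
        idx[i - 1] = idx[j];
        idx[j] = tmp;
    }
}

/* Demo-only generator: xorshift32 reduced modulo bound. NOT cryptographically secure. */
uint32_t demo_rand_below(uint32_t bound, void *ctx) {
    uint32_t *state = ctx;
    *state ^= *state << 13;
    *state ^= *state >> 17;
    *state ^= *state << 5;
    return *state % bound;
}

/* Returns 1 if idx[0..n-1] is a permutation of 0..n-1 (n <= 256). */
int is_permutation(const uint32_t *idx, uint32_t n) {
    unsigned char seen[256] = {0};
    assert(n <= 256);
    for (uint32_t i = 0; i < n; i++) {
        if (idx[i] >= n || seen[idx[i]]) return 0;
        seen[idx[i]] = 1;
    }
    return 1;
}
```

In the countermeasure, `idx` would hold the row indexes of the processed column, shuffled before the inner loops.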
For the reference implementation, a list containing the coefficient indexes is shuffled before the two inner for loops. Then, at each iteration, the pivot row is chosen randomly and the shuffled list gives the processing order of the other coefficients in the column. This countermeasure is presented in Appendix A (Algorithm 3). Since the indexes are shuffled before the two inner for loops, there is no correlation between the masks of the first for loop (line 4 of Algorithm 1) and those of the second for loop (line 13 of Algorithm 1).
For the GitHub implementation, a similar countermeasure is applied, presented in Appendix A (Algorithm 4). In this case, for each column (in the main for loop), the pivot row is chosen randomly and the indexes are shuffled using the Fisher-Yates method.
With the randomization countermeasure, an attacker can still distinguish patterns related to the mask values for both implementations, but cannot determine the order of the elements. Moreover, a brute-force attack is not feasible. Indeed, an adversary has n! possibilities for each column, which implies a total of $(n!)^m$ possibilities to recover the whole syndrome matrix. For instance, with the ROLLO-I-128 parameters, the complexity is approximately $2^{27731}$. Thus, only the number of zeros and ones in the matrix is revealed. Table 2 gives the performance impact of our countermeasures on the SC300 processor. This impact depends on the board and on the random number generator used. We counted the cycles by using
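The exponent above can be checked directly. Taking n = 83 and m = 67 for ROLLO-I-128 (our reading of the April 2020 parameters; Table 1 is authoritative), the count of orderings satisfies:

$$\log_2\big((n!)^m\big) = m \sum_{k=2}^{n} \log_2 k, \qquad \log_2(83!) \approx 413.9, \qquad 67 \times 413.9 \approx 27731.$$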

Conclusion
In this paper, we show that the constant-time implementation of Gaussian elimination provided in [1] is vulnerable to power consumption attacks. We exploit the weakness introduced by the variable mask, which was meant to prevent timing attacks. This information leakage allows us to mount the first power consumption attack on the latest implementation version provided by the authors of ROLLO. We can also apply our side-channel attack to another implementation of ROLLO-I-128 [2]. These attacks can lead to a full key recovery using a single trace.
To secure the implementations, we propose two different countermeasures. The first one can be applied to [1] and hides the values of mask. The second countermeasure can be applied to both implementations: the idea is to process the rows of each column of the matrix in a random order. This added randomness makes our attack no longer exploitable in practice. Our work is based on traces obtained from Cortex-M3 and Cortex-M4 microcontrollers. The constant-time Gaussian elimination function is part of the rbc library, which is also used in the implementation of the RQC scheme. Even though the constant-time Gaussian elimination is not used in the RQC implementation, the entire library should be analyzed to find possible leakage. In particular, we want to analyze the Karatsuba function, used both in the ROLLO implementation and in the polynomial multiplication for computations over ideal codes in RQC. Another perspective is to analyze the various implementations of Gaussian elimination in the third-round candidates of the NIST PQC standardization process.
Appendix B: Fisher-Yates Algorithm

Algorithm: Fisher-Yates shuffle (each step exchanges $L_i$ and $L_j$)

Appendix C: Toy example for the attack on the reference implementation

Let us take a small example, with q = 2, m = 5 and n = 7, to illustrate the information leakage that we found.

(a) Full trace and a zoom of the first inner loop; (b) full trace and a zoom of the second inner loop

Fig. 5: Measurement for the processing, on the Cortex-M4, of one column in the Gaussian elimination from the reference implementation

Fig. 6: First for loop trace of Gaussian elimination with masking countermeasure

Table 1: ROLLO's parameters for each security level. The name of each variant gives the targeted classical security level, e.g. ROLLO-I-128 targets a classical 128-bit security level. The parameters d and r correspond respectively to the rank of the private key and the rank of the errors. The parameters n and m are the degrees of $P_n$ and $P_m$, respectively.

IAR Embedded Workbench IDE for ARM compiler C/C++ with high-speed optimization level. This tool is available at https://www.iar.com/knowledge/learn/debugging/how-to-measure-execution-time-with-cyclecounter/.

Table 2: Impact factor of Gaussian elimination with and without countermeasures for the ARM SecurCore SC300 processor