Diffusive Dynamics of Bacterial Proteome as a Proxy of Cell Death

Temperature variations have a big impact on bacterial metabolism and death, yet an exhaustive molecular picture of these processes is still missing. For instance, whether thermal death is determined by the deterioration of the whole or a specific part of the proteome is hotly debated. Here, by monitoring the proteome dynamics of E. coli, we clearly show that only a minor fraction of the proteome unfolds at the cell death. First, we prove that the dynamical state of the E. coli proteome is an excellent proxy for temperature-dependent bacterial metabolism and death. The proteome diffusive dynamics peaks at about the bacterial optimal growth temperature, then a dramatic dynamical slowdown is observed that starts just below the cell’s death temperature. Next, we show that this slowdown is caused by the unfolding of just a small fraction of proteins that establish an entangling interprotein network, dominated by hydrophobic interactions, across the cytoplasm. Finally, the deduced progress of the proteome unfolding and its diffusive dynamics are both key to correctly reproduce the E. coli growth rate.

Model. The fully reduced experimental scattering function, S exp (q, E, T ), can be obtained by the convolution of the theoretical scattering function S th , describing the interaction between the neutrons and the bacteria, and the instrumental resolution, R(q, E), which is determined by a vanadium sample, a completely elastic and incoherent scatterer (1): where e − E 2k B T is the detailed balance factor.
Since proteins represent ≈ 55% of the bacterial dry weight and have a higher percentage of hydrogen content (50%) than any other type of macromolecules (30%) (2), the major contribution to S th is originating from self-diffusive dynamics of an average protein and of the bulk water present in the sample (in this case D 2 O): S th (q, E.T ) = S AP (q, E, T ) + ϕ · S D2O (q, E, T ), (2) where S AP (q, E, T ) and S D2O (q, E, T ) are, respectively, the scattering functions of the average protein and the D 2 O, and ϕ is a scalar factor that weights the contribution of the solvent. ϕ can be estimated by the product of the D 2 O amount in the sample, which constitutes about 80% of the sample mass, and the percentage of bulk water molecules in E. coli, which is almost 60%.
Fitting procedure. To mitigate the problem of overfitting due to the complexity of the system and the consequent high number of unknown parameters necessary to describe the resulting QENS signal, we tried to reduce, as much as possible, the number of free parameters that can vary during the fit. To this end, we used the measurements of the PBS-D 2 O buffer to fix the q-dependence of the solvent's intensity S D2O in eq. (2). Moreover, we employed a three step procedure for the analysis, where we performed at each step a fit of the data and we used the resulting information to improve the model and to reduce the number of free parameters. An example of the results from this procedure is shown in Fig. S1A.
Energy resolution The instrumental resolution function R(q, E) takes several parameters into account and its expression strongly depends on the set-up of the instrument. In particular, on the IN16b spectrometer the resolution function is well described by a sum weighted by the scalar parameter F (q) of a Gaussian and a Lorentzian function (3): (3) where |F (q)| ≤ 1. B(q) reflects a background which can depend on q. Figure S1B shows the fit of the measured Vanadium data with the eq. (3) and the measured energy resolution of the instrument, ∆E FWHM , corresponding to the Full Width-Half Maximum (FWHM) of eq. (3). The resulting parameters averaged over q are reported in table S1. Solvent contribution. To simplify the fit of the E. coli data, we first analyzed the QENS data of the PBS-D 2 O buffer at three temperatures: 276 K, 300 K, and 324 K to obtain an estimation of the q-dependence of the solvent intensity. The main contribution comes from D 2 O bulk water whose scattering function can be well represented as follows (4): where β D 2 O is the amplitude of the signal, and L γ D2O is a Lorentzian function that describes the diffusive translational motions of the D 2 O molecules. An example of the resulting parameters is reported in Fig. S2. The D 2 O cross section comprises 79% coherent and 21% incoherent scattering and, applying Vineyard's convolution approximation, we have S D2O (q, E = 0) ≈ S (coh) D2O is the coherent static structure factor (5). Hence, S D2O (q, E) has as main contribution the coherent scattering of D 2 O (6). Note that the experimental intensity of S D2O (q, E) (Fig. S2) reproduces qualitatively well the shape of S (coh) D2O (q) measured by Bosio et al. (7).
Average protein contribution. Protein dynamics at all the temperatures has been described in terms of two distinct diffusive processes arising from the dynamics of the average protein in the E. coli cytoplasm (See Fig. S1A), whose narrow and broad Half Width-Half Maximum (HWHM) of the signal γ G and γ L correspond, respectively, to the long and the fast characteristic timescales of the probed global (G) and local (L) motions. The dynamic scattering function of the proteins is modeled here by the following expression (2,8): where I(q, T ) is the intensity of the signal, which is related to small vibrations within the proteins, meanwhile L(E; γ G (q, T )) and L(E; γ L (q, T )) are two Lorentzian functions accounting for the diffusive contributions due to global motions (roto-translations of the entire proteins) and local dynamics (internal motions of sub-parts of the proteins, e.g. conformational changes) mentioned above. Finally, A 0 (q, T ), which multiplies the delta function δ(E), represents the Elastic Incoherent Structure Factor (EISF) containing information on the geometry of the internal motions that are confined within the space-time window of the spectrometer (1). Data analysis. As described in the previous paragraphs, we employed a three step procedure for the fit of the data.
In the 1st step, we performed a simple fit that treated the measured spectra independently, apart from the weighting factor of the solvent contribution ϕ, which was shared among the spectra since it should depend only on the amount of bulk D 2 O in the samples. On the contrary, all the remaining parameters could vary both with q and T . This allowed us to study the trends of the Lorentzian widths γ i (q, T ) as function of q 2 , and we verified that, both for the global and the local motions, they are well described by a jump-diffusion model as already observed in previous in vivo QENS experiments on bacteria (2, 9, 10) (see Fig. S3 and S4): Within this model, proposed by Singwi and Sjölander (11), the diffusion is assumed to occur via infinitely small, elementary jumps characterized by a negligible jump time and the residence time τ i , i.e. the time a proton spends in a given position. This is consistent with results of previous in vivo QENS experiments on bacteria (2,9,10). In the 2 nd step, we performed a simultaneous fit of the spectra measured at different q-values assuming a priori the jump-diffusion model for both the global and the local dynamics, i.e. constraining the q-dependence of γ G and γ L by eq. (6). The resulting values of the diffusion coefficients and the residence times for the global and the local dynamics are reported in Fig. S5.
Around the cell-death temperature (T CD ≈ 323.15K), one sees an important slowdown of the entire protein described by D G . It is possible to model the behavior by the function, which links the diffusion slowdown to the gelation of the system induced by protein unfolding (8,12): with: where a QENS u is a smeared step function describing the transition between the initial liquid state and the final gel-like state, that can be considered as a dynamical estimation of the apparent fraction of unfolded proteins in the E. coli cytoplasm T 0 corresponds to the temperature of the transition between the two states.
In the 3rd and last step, we constrained the temperature dependence of D G by eq. (7), and we performed a simultaneous fit of all the measured spectra. The resulting parameters, which are constant in T and q, are reported in table S2. Note: Fig. 1D of the main text shows the D G (T ) obtained with eq. (7) using the shared parameters reported in table S2.
An example of the resulting parameters for the EISF, A 0 (q), at five selected temperatures is shown in Fig. S6 (left). The EISF contains information on the geometry of localized motions within the protein. In particular, we found that the EISF can be well described by the model proposed by Grimaldo et al. in a previous study on γ-globulin (13): where: p L is the fraction of H-atoms appearing fixed on the accessible time scale of the instrument. For the remaining H-atoms, two types of motions were taken into consideration: confined diffusion in an impermeable spherical volume of radius r L , described by the amplitude A sph , and random jump diffusion between three equidistant sites on a circle of radius a M , arising from methyl-groups, and described by A 3JD , a M being fixed to 1.715Å, which is the average distance of H-atoms in methyl groups. The relative contribution of these two diffusive processes is measured by s L , which, more specifically, describes the fraction of Hatoms that, among those atoms that are not appearing fixed (i.e. 1 − p L ), are undergoing the spherically confined diffusion. j 0 and j 1 are Bessel functions of the first kind. The dashed lines in Fig. S6 (left) are the best fits of the EISF with (9). The parameters are reported in S6 (right). Comparing the fraction of H-atoms that are fixed with those that diffuse in a sphere or undergo a 3-jump diffusion (see Fig. S7), we find that the percentage of H-atoms that are diffusing in a spherical volume is constant in temperature. Therefore, the changes in the fraction of fixed atoms discussed in the first paragraph of the results section are due to the variation of the fraction of H-atoms undergoing the 3-jump diffusion, only.

Model preparation for simulations
Protein composition. To represent the protein composition of the E. coli cytoplasm, we built our simulation system on the basis of a previous computational model (14), which was derived from the results of a proteomics study (15) of E. coli grown under minimal media conditions. The computational model reported in (14) contained 45 different protein species as well as 5 types of RNA and RNA-protein complexes. For the purposes of the present study and in order to permit back-mapping of smaller sub-volumes into an all-atomistic resolution, we reduced the model in the following aspects. First of all, we considered a smaller cytoplasmic volume than (14), namely a 400Å cubic box. Second, we focused exclusively on proteins since they are known to form the most abundant macromolecular type in the E. coli cytoplasm and since the second most abundant macromolecular type-the RNA-is mainly concentrated in large ribosomal particles, which we did not include in our model. Third, we only considered protein species with molecular weights below 150 kDa to avoid large protein oligomers that would not fit easily in back-mapped sub-volumes. We assumed the same target macromolecular concentration as (14), that is, 275 g/L. Analogously to (14), the number of copies of each individual protein species was based on the relative abundances reported in (15). From the list of proteins simulated by (14), we omitted those counting less than 0.5 copies per our simulation box. In addition, we did not include the GltD, Hns, Pnp, and Bcp proteins, each of which would only exist in one or two copies in our simulation box, owing to the absence of reliable structures for homology modeling. On the other hand, for computational investigations of protein unfolding that are not reported in this work, we included 10 small monomeric barrels of superoxide dismutase 1 (SOD1) (16), raising the total macromolecular concentration to 279 g/L. Overall, the system comprised 197 proteins of 35 species (see Table S3).
Obtaining protein structures. Where available, we used an E. coli PDB structure to prepare an atomistic model of each protein species. Unless we found a more recent and higher-quality structure in the PDB database (17), we selected the same PDB structure as was used in the previous work (14). In three cases, we made use of a homology model stored in the SWISS-MODEL repository (18) as a starting point of the model preparation. Where needed, we subsequently added missing residues and corrected non-proline cis peptide bonds using the MODELLER software, version 9.23 (19). The quality of the structure was checked with the MolProbity server (20). When present in the PDB structure, metal ions coordinated to the protein were included in the model; however, larger ligand molecules were omitted. With the exception of TufA and GpmA, the oligomerization state of the protein was taken to be the same as in (14). While TufA was modeled as a dimer by (14), the prediction of the PDBe PISA server (21) yielded an ambiguous result, and the protein was described as a monomer in a previous experimental work (22). Therefore, we modeled TufA in a monomeric state. Similarly, while the GpmA protein was simulated in a dimeric form by (14), the PDBe PISA prediction placed it in a grey zone, and according to the EcoCyc database (23), the ATP-free version of the protein was monomeric. As a result, we considered GpmA in a monomer state.
The atomistic protein structures were hydrogenated by the GROMACS pdb2gmx tool (24), placed in rectangular boxes with a minimum distance of 1.5 nm between the protein and the boundary of the box, and solvated in a 150 mM KCl solution, including several additional ions to neutralize the net charge of the system, if needed. Each system was then subjected to a short energy minimization and equilibration protocol, performed in the GROMACS 2018.7 software (24). First, an energy minimization was performed to bring the maximum force below 1000 kJ mol −1 nm −1 , while harmonic restraints were applied to all heavy atoms of the protein (with a force constant k BB = 1000 kJ mol −1 nm −2 assigned to the backbone atoms and a force constant k SC = 500 kJ mol −1 nm −2 applied to the side-chain atoms). This energy minimization was followed by a six-step relaxation protocol with gradually decreasing harmonic position restraints, where the first two short simulations were performed in the N V T ensemble and were followed by four N P T trajectories simulated at a pressure of 1.01 bar (see Table S5 for more details). The minimization and equilibration protocol was performed separately for the Amber a99SBdisp (25) and CHARMM36m (26) force fields. The same simulation parameters were used as for the respective a99SB-disp and CHARMM36m systems described in the subsection All-atom simulations, except for the temperature coupling, which was ensured by the Berendsen thermostat (27) with a time constant of 1 ps. The equilibrated protein structures obtained with the a99SB-disp force field were converted to the OPEP description by the OPEP File Generator (http://opep.galaxy.ibpc.fr).
Initial positions and orientations of the proteins in the simulation box were generated by the Packmol software, version 18.169 (28). To make the spatial distributions of different protein species more uniform, each protein entity was treated separately in the Packmol input.

LBMD simulation
We performed a coarse-grained simulation of the large system using the lattice Boltzmann molecular dynamics (LBMD) approach (29,30), implemented in-house in the MUPHY software (31). Our LBMD scheme, which has been successfully applied to a number of biological systems (32)(33)(34)(35), combines a coarse-grained protein model with a lattice-based description of hydrodynamic interactions (30). The protein model was based on the OPEP v.4 force field (36). Since the goal of this coarse-grained simulation was to sample different intermolecular arrangements rather than simulate protein conformational changes, the conformations of the proteins were restrained by an elastic network (distance cutoff 6Å, force constant 5 kcal mol −1Å −2 ). The primary focus on intermolecular interactions also allowed us to simplify the description of the protein backbone, which was exclusively represented by Cα beads, the sizes of which were increased by 50 % relative to the standard OPEP v.4 model. Finally, the OPEP sidechain-sidechain non-bonded interactions were rescaled by a factor of 0.857 to mitigate excessive aggregation (37).
The simulation was performed in an N V T ensemble at T = 300 K, using a time step of 10 fs for bonded interactions and a time step of 20 fs for non-bonded and hydrodynamic interactions. A trajectory length of 4.3 µs was reached. Hydrodynamic interactions were described using the lattice Boltzmann (LB) technique (38), employing the BGK (Bhatnagar-Gross-Krook) collisional operator (39), with a lattice grid spacing of 4Å. The LB kinematic viscosity ν 0 was set to reproduce bulk water behaviour at ambient conditions. The coupling coefficient γ was determined for each protein species as a function of its molecular weight (see Figure S14). The dependence of γ on molecular weight was derived by performing separate simulations of six protein species (namely CspC, SOD1, Adk, TufA, MetE, and GapA) of different molecular weights in dilute conditions and by adjusting the γ of each species to match the diffusion coefficient of each isolated species with a prediction obtained with the HYDROPRO software (40).

Sub-volume selection and back-mapping
To explore the diffusion of proteins with the all-atom resolution, allowing the description of conformational flexibility and unfolding, we selected five sub-volumes sampled in the LBMD trajectory and converted them into the all-atomistic resolution.
The selection of the sub-volumes proceeded in the following way. A cube with an edge length of 170Å was placed in randomly chosen trajectory frames over the first 3 µs of the LBMD trajectory, with random positions and orientations (60,000 trials). In the dense environment of the cytoplasmic system, it was practically impossible to find sub-volume positions and orientations so as to avoid cutting at least a few proteins by the sub-volume boundary. For each placement, proteins that were fully contained in the sub-volume were counted as well as those that were cut by the sub-volume boundary. Subsequently, we isolated 120 sub-volumes that minimized the ratio of the total mass of proteins cut by the sub-volume boundary versus the total mass of proteins entirely placed inside the sub-volume. If we had only kept those proteins that were not cut by the sub-volume boundary, we would have systematically underestimated the protein concentration in the sub-volumes. Therefore, for each of the 120 sub-volumes, the proteins that were integrally inside the sub-volume were complemented by a subset of proteins that were cut by the boundary. This subset was constructed in the following way. First of all, we only retained proteins the periodic images of which did not overlap with any protein that was integrally placed inside the sub-volume. An overlap was defined as a close contact between two protein beads, with a distance less than 3Å. Next, we checked for overlaps among the proteins that we retained in the previous step, and we removed, one by one, proteins that exhibited the largest number of overlaps with the remaining members of the subset.
In this way, we ended up with a subset of proteins that had no overlap with any other protein in the sub-volume.
Out of the 120 sub-volumes constructed with the procedure described above, we selected a representative set of five sub-volumes that 1) approximated the distribution of local protein concentrations inside the large volume and that 2) maximized the diversity of protein species represented in the set while approximating the number distribution of protein species in the large volume. The distribution of local protein concentrations (see Figure 2B) was obtained by randomly placing a 170Å cube over the trajectory (2000 trials) and by evaluating the protein concentration inside the cube for each placement.
The protein concentration and species composition of each of the five cubic subvolumes, sampling the structure and local concentration of the crowded solution, is indicated in Table S4.
Each sub-box, with a size of 170Å and containing between 11 and 20 proteins, was subsequently converted into the all-atom resolution using the following back-mapping protocol. Atomistic structures prepared as described in the paragraph Obtaining protein structures were overlapped with the geometries from LBMD by aligning the Cα atoms of the atomistic structures with the corresponding Cα beads. We checked the back-mapped configuration for the presence of entangled aromatic side chains and corrected this artifact if needed. The boxes were subsequently hydrated, and a 150 mM concentration of K + and Cl − ions was added, including extra ions neutralizing the net charge of the proteins. The systems were then energy-minimized and equilibrated using a protocol described below.

All-atom simulations
Force field and simulation parameters. All-atom molecular dynamics (MD) simulations were performed using the GROMACS 2019.4 software (24). We employed two distinct sets of force field parameters to describe the proteins: Amber a99SB-disp (25) and CHARMM36m (26), which belong to the most recent generation of protein force fields and which were optimized to correctly capture the properties of unfolded ensembles. The a99SB-disp protein force field was coupled with the a99SB-disp water model, S8 while CHARMM36m used the TIP3P water model (41). In both cases, K + and Cl − ions were described with the default parameters for the respective force field. A 1.2 nm cutoff was applied to short-range non-bonded interactions. In addition, van der Waals forces were smoothly switched to zero between 1.0 and 1.2 nm in the CHARMM36m simulations. Long-range electrostatic interactions were evaluated using the particle mesh Ewald method (42). Newton's equations of motion were propagated using the leap-frog algorithm (43). The LINCS algorithm (44) was used to constrain the lengths of all protein bonds involving hydrogen, and the SETTLE algorithm (45) was employed to keep water molecules rigid.
The temperature of the system was maintained by the velocity rescaling thermostat with a stochastic term (46), which was coupled separately to the proteins and to the rest of the system with a time constant of 0.1 ps. The pressure was kept at 1.01 bar by the Parrinello-Rahman barostat (47) in all the production runs and by the Berendsen barostat (27) in all the N P T equilibration steps, with the time constant equaling 2 ps for both barostats.
Sub-volume equilibration. The procedure that we employed to equilibrate the atomistic sub-volumes at different temperatures and both in folded or unfolded states is detailed in Figure S15). First, the configurations back-mapped from LBMD underwent a short initial relaxation ( Figure S15B). Subsequently, a heating protocol ( Figure S15C) was used to equilibrate folded sub-volumes at increased temperatures, ranging between 310 and 380 K. In parallel with this, an unfolding protocol ( Figure S15D) and a subsequent cooling protocol ( Figure S15E) were employed to produce unfolded sub-volumes and equilibrate them at different temperatures in the range between 300 and 380 K. To efficiently heat up or cool down the system in the heating and cooling protocols, a heating sequence was used (48), enabling us to quickly reach the desired temperature while allowing the system to equilibrate. To follow the unfolding of the proteins during the unfolding protocol, we measured the root-mean-square deviation (RMSD) of each protein's atoms, its radius of gyration (R g ), as well as the amount of secondary structure present in the system, as determined by the DSSP software (49, 50) ( Figure S10).
Preparation of partially unfolded sub-volumes. The partially unfolded sub-volumes were derived from the same initial configuration as the 288 g/L sub-volume (Table S4). The proteins selected for unfolding are listed in Table S6. The unfolded fraction r u was quantified by computing the fraction of hydrogen atoms belonging to the unfolded protein structures. The sub-volume preparation followed a protocol equivalent to that employed to obtain fully unfolded sub-volumes ( Figure S15). In addition, the positions of the atoms in proteins deemed to remain folded were kept frozen during the unfolding protocol (Figure S15D) as well as during the three pre-cooling stages of the cooling protocol ( Figure  S15E). Moreover, before the subsequent 100 ns N P T equilibration at 360 K, the frozen folded structures were gradually relaxed using the same six-step N V T equilibration as in S9 the initial relaxation protocol ( Figure S15B); however, in contrast to the initial relaxation, the decreasing harmonic position restraints acted on the atoms of the folded structures only.
Production simulations. The production runs following the equilibration procedure were performed at eight different temperatures (300, 310, 320, 330, 340, 350, 360, and 380 K) both for folded and unfolded sub-volumes with a time step of 2 fs and each reaching a trajectory length of 102.4 ns. The translational velocity of the center of mass of the whole system was removed every 100 steps. For the 288 g/L sub-volume simulated using the CHARMM36m force field, the production trajectories performed at T = 330 K were extended to 1 µs for further analysis (see Figure S12).

Calculation of D t and D r
Obtaining average D t per sub-volume. To evaluate the average protein translational diffusion coefficients D t for a given sub-volume, folding state, and temperature, the production trajectory was divided into 20 ns blocks. In each block, we first calculated the mean squared displacement (MSD) of the center of mass of each protein molecule. We then computed the average of these MSD curves weighted by the numbers of atoms of the proteins (see Figure S16). This average MSD curve was subsequently fitted with a straight line in the 0.3-5 ns regime, and D t was extracted from the slope of the fit. Subsequently, we determined the average of D t over the 20 ns blocks as well as the corresponding standard error of the mean. For the given sub-volume, folding state, and temperature, the resulting D t was corrected for the effects of periodic boundary conditions (PBC) (see the Subsection Correcting D t and D r for PBC effects). The estimated error of the PBC correction was then added to the standard error of the mean determined from the block averaging to express the uncertainty of D t . To obtain a final value of D t characterizing the entire cytoplasm, we calculated an average of the PBC-corrected D t 's over all the sub-volumes, with the individual D t 's being weighted by the numbers of protein hydrogen atoms in the sub-volumes. An analogous weighted average of the error bars was performed to estimate the uncertainty of the final value of D t .
Calculating D t and D r for the evaluation of D G . To compute the apparent diffusion coefficient D G , a quantity directly comparable with the experimental observable (see the Subsection Evaluation of D G ), it was necessary to calculate D t and the rotational diffusion coefficient D r separately for each individual protein chain. The reason for considering individual chains rather than entire protein molecules lies in the fact that upon unfolding, the different subunits of a protein may dissociate, and as a consequence, the evaluation of D r for the entire molecule would no longer be meaningful.
For each protein chain, D r was computed by using the following approach (51,52), approximating the chain rotation as being isotropic. One thousand randomly distributed unit vectors were centered and rotated along with the protein chain. This was done-for each frame-by fitting the geometry of the protein chain with a reference structure to which the fixed orientations of the unit vectors had been set. An autocorrelation function of the unit vectors' directions was then calculated by employing the gmx rotacf tool (24) and using the second-order Legendre polynomial for evaluating the autocorrelation. The resulting autocorrelation curve was fitted with an exponential function in the 0.3-5 ns regime to obtain the decay time τ which was subsequently used for calculating D r as D r = 1/(6τ ).
To achieve the best possible convergence, the single-chain D t and D r were calculated from the entire trajectory without cutting it into multiple blocks.

Viscosity calculations
We estimated the low-frequency, low-shear viscosity η of the 288 g/L sub-volume (see Table S4) at three different temperatures (300 K, 320 K, and 380 K) for both the foldedand the unfolded systems. We also considered the intermediate unfolded systems with 25 %, 50 %, and 75 % of unfolded proteins (Table S6) at T = 300 K. All the calculations were repeated for both sets of force field parameters employed in this study.
To calculate η, we followed a similar approach as was used in (53). For each system and temperature, we extracted 30 snapshots (50 for the unfolded CHARMM36m system at 300 K) in 1 ns intervals from the final part of the respective production run. Each of those snapshots served as a starting point of a 1 ns N P T equilibration followed by a 10-30 ns N V T simulation (10 ns for the folded systems, 20 ns for the rest of the a99SB-disp systems, and 30 ns for the remaining CHARMM36m systems). The N V T simulation was used to sample the elements of the pressure tensor, which was saved in 10 fs intervals. The viscosity was calculated from an integral of the autocorrelation functions of the pressure tensor fluctuations. Namely, for each off-diagonal element of the pressure tensor (P ij = P xy , P xz , and P yz ) and for each of the three combinations P ij = (P xx − P yy )/2, (P xx −P zz )/2, and (P yy −P zz )/2 of its diagonal elements, we computed the running integral where V represents the volume of the simulation box. For each ij, the mean of η ij (t) over all the N V T simulations was fitted with an analytical function corresponding to a bi-exponential decay of the autocorrelation function: where A, B, α, τ 1 , and τ 2 were parameters of the fit. The cutoff time t cut for the fitting was determined as the time where the standard error of the mean of η ij (t) for the given ij exceeded 20 % of the mean. Examples of η ij (t) and the fits can be found in Figure   S11 S20. The η ij was then calculated as the limit of η ij (t) at infinity. Finally, the value of η was obtained as the mean of the six η ij values, and the uncertainty of η was determined as the standard error of the mean of the six η ij values.
To obtain a dilute reference, we also calculated the viscosity of an aqueous solution of 150 mM KCl considering the two respective water models-the a99SB-disp water model and the TIP3P model-and the same three temperatures as above. In each case, we performed a 100 ns N V T simulation of a 5.1 nm cubic box after a short energy minimization and a 1 ns N P T equilibration. We split the simulation into 10 ns blocks and computed the autocorrelation functions ⟨P ij (0)P ij (t)⟩ separately for each block. Next, making use of the isotropy of the KCl solution, we averaged the autocorrelation functions over ij for each block and calculated the running integral η(t) from the average. Subsequently, the average of η(t) from all the blocks was fitted-using a 10 ps cutoff-with the same analytical function as shown in eq. (12). Analogously to the case of the crowded sub-volumes, η was computed as the limit of the analytical function at infinity.
To estimate the correction to the diffusion coefficients due to periodic boundary conditions, we interpolated the viscosity to intermediate temperatures and extrapolated it to different protein concentrations by using analytical expressions derived in (54). First, we interpolated the temperature dependence of the solvent viscosity η 0 (T ) using the following expression (54): The curves resulting from these interpolations are shown in Figure S17, and the corresponding values of the parameters B w , D w , and ∆E w are listed in Table S7. As a next step, we interpolated the viscosities of the crowded system to different temperatures using the relationship (54): In this expression, c denotes the protein concentration while B, D, and ∆E are free parameters. The parameter α is a function of the protein molecular weight M p , the molecular weight of water M w , and its density ρ w : We set M p to 54.66 kDa, i.e., to the average molecular weight of the proteins in our model. The parameter β was defined as: where we set ν to a value that was determined previously for BSA, ν = 1.417 x 10 −3 m 3 /kg (54) The parameters B, D, and ∆E obtained for each force field and for both folded and unfolded systems are listed in Table S8, and the resulting η(T, c) functions for c = 288 g/L are displayed in Figure S18A-B. As illustrated in Figure S18C, the analytical function η(T, c) defined by eq. (14) allowed us to extrapolate the dependence of viscosity on concentration for a given T .
To estimate the uncertainty of the interpolated/extrapolated viscosity values, we repeated the fitting of eq. (14) with random combinations of viscosities each sampled within the confidence interval at the given temperature.

Correcting D t and D r for PBC effects
To correct the translational diffusion coefficients for PBC effects, we added the following term (55) to D t computed from simulations:

Evaluation of D G
In order to directly compare simulations with experiments, we estimated each protein's apparent diffusion coefficient D G , which takes into account not only the translational motions, but also the rotations of the entire protein. To this end, we used the following relation that links D G to D t and D r (4,57): where B n (q) is: and j n is the spherical Bessel function of order n and ρ H is the radial distribution function (RDF) of the protein hydrogen atoms.
After the calculation of D t and D r for each protein (as described in the previous section), we fixed N = 550 (4) and we solved numerically eq. (19) for three values of q: 0.2Å −1 , 1Å −1 , and 2Å −1 , verifying that, in the q-range explored by the experiment (0.2Å −1 ≤ q ≤ 2Å −1 ), the value of D G is constant. Indeed, if N and q are big enough, it is possible to show that the results do not depend on those parameters (4).
In this way, we obtained the D G of each protein in the folded and the unfolded configuration at all the temperatures. Then, aiming to compare these data with the QENS results, for each system we calculated the average value of D G by weighting each protein's contribution with the number of its hydrogen atoms. Figure S22 shows the results for the five selected sub-volumes extracted from the larger cubic box used for the LBMD simulations.
To obtain the value of D G at intermediate folding states of the cytoplasm, we used the following relation: For further details on the validity of this approach, see the following subsection "Connection between the apparent and the real unfolded fractions". The resulting values of D G , for CHARMM36m, are reported in Figure S23.
Finally, in order to calculate the most representative value of D G for the average protein in the E. coli cytoplasm, we considered the probability distribution p(c) for the selection of a sub-volume of (17 nm) 3 and concentration c from the larger system with a volume of (40 nm) 3 used in the LBMD simulation as representative of the bacterial cytoplasm; see Figure S24A. The weight of each sub-volume ( Figure S24B) was calculated as follows: The results, obtained with both force fields, are shown in Figure 4B.

Connection between the apparent and the real unfolded fractions
As we detailed in the subsection "Neutron scattering data analysis", we found that D G measured by QENS experiments can be described as: where a QENS u (T ) is a smeared step function parameterized by the temperature of transition T 0 and the width of the transition ∆T : The function a QENS u (T ) weights the transition between two different dynamical regimes characterized by the two extremal diffusion coefficients D 1 (T ) and D 2 (T ), and since D 2 (T ) < D 1 (T ), a QENS u (T ) is a measure of the diffusion slowdown. At low temperatures S14 (T ≪ T 0 ), the parameter a QENS u tends to zero; therefore, D G (T ) ≈ D 1 (T ) = T · a 1 + b 1 . At high temperatures (T ≫ T 0 ), the smeared step function is close to one, and D G (T ) ≈ D 2 (T ) = T · a 2 + b 2 . However, the observed dynamical transitions is not reversible-after the transition, D G (T ) remains equal to D 2 (T ), even at low temperatures. The same dynamical behavior was previously found for proteins in solutions and, due to the similarity of the systems, this suggests that the underlying process is the same. In particular, in the first state (state 1), the E. coli cytoplasm is a liquid where the majority of proteins are folded. After the transition (state 2), the cytoplasm is in a gel state, and the proteins are mainly unfolded. For simplicity, we will assume that in states 1 and 2, all the proteins are fully folded and fully unfolded, respectively. Therefore, the diffusion coefficient D 1 will describe the global diffusion of the folded average protein in the liquid state, termed D (f ) , whereas D 2 will be related to the global diffusion of the fully unfolded average protein in the gel state, termed D (u) . Thus, for D G (T ) we have: In this picture, a QENS u (T ) can be interpreted not only as a measure of the diffusion slowdown during the unfolding, but it also represents the apparent fraction of unfolded proteins that weights D (f ) and D (u) and that is necessary to reproduce the values of D G for systems in a partially unfolded state. In particular, our simulations suggested that the relation between the apparent fraction of unfolded proteins determined dynamically, a MD u , and the real fraction of unfolded proteins, r MD u , can be described by a power law: Therefore, to estimate the real unfolded fraction as a function of temperature starting from the dynamical QENS results, we can use the inverse relation: In conclusion, comparing QENS experiments and MD simulation, first we verified that the simulations can reproduce the linearity of D (f ) (T ) = D 1 (T ) = T · a 1 + b 1 , and D (u) (T ) = D 2 (T ) = T · a 2 + b 2 ( Figure 2D and S11A). Then, since protein unfolding is computationally demanding for all-atom MD simulations, instead of calculating extremely long trajectories to allow all proteins to unfold, we forced the unfolding (as described in Subsection "Preparation of partially unfolded sub-volumes"), and we performed a series of relatively short simulations (∼100 ns), sufficiently long to determine the diffusion coefficient in the timescale explored by the QENS experiments, but brief enough to ensure that the folding state of the proteins was not affected. This allowed us to verify that if the folding state remains constant, the diffusion coefficient of the average protein scales linearly with the temperature (Figure 3D and Figure S11 panel C), and this relation also holds for systems with different concentrations ( Figure S22).
Comparing the diffusion coefficients calculated for a system simulated in different folding states ( Figure 3D and S11C), we found a relationship between the amount of unfolded proteins and the consequent diffusion slowdown-eq. (26). Finally, with the inverse relation (eq. (27)) and the QENS measurement of the slowdown, we were able to estimate the real fraction of unfolded proteins at different temperatures in the E. coli cytoplasm through the study of its dynamical state (see Figure 4A).

Supplementary Results
Origin of the nonlinearity of the transfer function The observation that the translation diffusion coefficient D t cannot be expressed simply as where r u is the unfolded fraction and D are the translational diffusion coefficients corresponding to a fully folded-and a fully unfolded system, respectively, can be rationalized in the following way. If we assume the validity of the Stokes-Einstein relation, the diffusion coefficients D t , forming two limiting cases, can be written as D and D Here, R (f ) and R (u) are protein hydrodynamic radii in the folded and unfolded states, respectively, and η (f ) and η (u) denote the viscosities of the fully folded and fully unfolded volumes. However, in intermediate boxes with an unfolded fraction r u , η(r u ) is neither identical to η (f ) nor to η (u) . Consequently, for the diffusion of folded/unfolded proteins in the intermediate boxes, we can expect and D (u) where the viscosity is a function of r u . Therefore, rather than assuming eq. (28), one should take into account the effect of the intermediate viscosity: If we consider a linear dependence of η on r u , with values of η (f ) and η (u) estimated using simulations, we obtain a similar dependence of a u on r u as the one based on the actual diffusion coefficients determined from simulations (see Figure S25).         Table S3. Figure S9: Proportion of initial protein neighbors of an average protein that remain in contact with the protein in the course of the LBMD trajectory. Two proteins were considered in contact if the shortest distance between their beads was below 7.34Å.           Table S6). Figure S20: The η ij (t) running integrals (eq. (11)) obtained for the 288 g/L CHARMM36m sub-volume at T = 300 K. The black curve shows the average of η ij (t) over all the N V T simulations, the gray stripe represents the standard error of the mean, and the blue curve is a fit using the analytical function (eq. (12)). Figure S21: Ratio of the translational and apparent diffusion coefficient for the average protein chain at different temperatures in folded and unfolded sub-volumes with varying protein concentrations (force field: CHARMM36m).     Table S1: Average values of the parameters resulting from the fit of the Vanadium data with eq. (3).
Step    Table S9: Comparison of the estimations obtained with different approaches for the fraction of unfolded proteins r u (T ) in the E. coli cytoplasm at increasing temperatures, and the temperature T 50% needed to unfold 50% of the proteins. Models r u (T CD ) r u (T CD + 3K) r u (T CD + 5K)