# 2 Experiment analysis

## 2.1 Introduction

The previous chapter (see Chapter 1) detailed the steps necessary to extract data from a set of microfluidic images through image analysis techniques and fluorescence microscopy. Each step was instrumental in creating a dataset that was easy to explore and query. With the help of computational biology, systems biology, and data analysis techniques, we processed these files to investigate the role of filamentation in cell survival.

Both computational biology and systems biology contributed ideas and concepts to this analysis. Computational biology has its roots in the origins of computer science and the work of the British mathematician and logician Alan Turing (widely regarded as the father of computing).^{21} Over time, systems biology emerged as a field that synergistically combines models and experimental data to understand biological processes.^{22} This has driven the creation of models that are generally phenomenological but that sometimes lead to new ideas about the process under study, ideas that would otherwise be unthinkable without the computer's power.

Here, we divide the experimental analysis into two main parts: 1) at the cell level, with measurements at specific points in time, and 2) at the population level, with time series. The first level allowed us to identify the individual contribution of each variable under study to cell survival. The second level allowed us to understand how the population behaves over time when exposed to a harmful agent (in this case, beta-lactam antibiotics). Together, these two views of the same phenomenon allowed us to extract the main ideas for postulating a mathematical model that seeks to show how filamentation is a factor in cell survival in stressful environments (see Chapter 3).

## 2.2 General preprocessing of data

The raw data processing consisted mainly of creating two levels of observation for cells of both the chromosomal strain and the multicopy-plasmid strain. The first level is at single-cell granularity, that is, properties measured at specific time points. The second level follows the cells over time, thus capturing properties at the population level. We did this because it would allow us to understand which factors affect filamentation and why.

We normalized the DsRed and GFP fluorescence values in both experiments to the values observed before antibiotic exposure. This gave us a common baseline for comparing expression between cells. For DsRed, which serves as a proxy for the antibiotic concentration in the environment, we also applied a logarithmic transformation to reveal subtle changes in fluorescence intensity that would allow us to detect cell death.
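A minimal sketch of this preprocessing step in Python with NumPy. It assumes that the baseline is the mean of all pre-exposure observations pooled over the population (the text does not say whether the baseline was computed per cell or per population), that exposure starts at minute 60 as reported for Figure 2.17, and that a small constant `eps` is added before taking the logarithm. The function names are hypothetical.

```python
import numpy as np

def normalize_to_pre_exposure(values, times, exposure_start):
    """Scale fluorescence values by their mean before antibiotic exposure.

    values, times: 1-D arrays with one entry per (cell, frame) observation.
    Assumption: the baseline is pooled over all pre-exposure observations.
    """
    values = np.asarray(values, dtype=float)
    times = np.asarray(times, dtype=float)
    baseline = values[times < exposure_start].mean()
    return values / baseline

def log_transform(normalized, eps=1e-6):
    """Log-transform normalized DsRed; eps (an assumed constant) avoids log(0)."""
    return np.log(np.asarray(normalized, dtype=float) + eps)

# Toy example: two frames before exposure (t < 60 min) and two after.
times = np.array([0, 10, 60, 70])
dsred = np.array([2.0, 2.0, 4.0, 8.0])
norm = normalize_to_pre_exposure(dsred, times, exposure_start=60)
log_norm = log_transform(norm)
print(norm)  # [1. 1. 2. 4.]
```

On the log scale, each doubling of DsRed appears as the same additive step (about log 2), which is one way such a transform makes relative changes easier to see.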

Ultimately, we classified cells into four fundamental groups based on whether each cell filamented and whether it survived (see Figure 2.1). We define a *filamented cell* as a cell whose length exceeds the mean length observed before antibiotics were introduced into the system by more than two standard deviations. Although there are multiple ways to define death from single-cell observations,^{23} we considered a cell *dead or missing* when we stopped having information about it, either because its fluorescence in the red channel rose above a given threshold (resulting from increased cell membrane permeability and the entry of fluorescent dye into the cell) or because it left the field of observation. Therefore, a *surviving cell* is a cell that is observed both before and after exposure to the antibiotic and does not exceed the DsRed threshold.
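The classification rule above can be sketched as follows. This is a hedged illustration, not the authors' code: the per-cell inputs, the `observed_after_exposure` flag, the use of the sample standard deviation (`ddof=1`), and the label strings are all assumptions. The DsRed threshold value is not given in the text, so it is passed in as a parameter.

```python
import numpy as np

def filamentation_threshold(pre_exposure_lengths):
    """Mean plus two standard deviations of lengths measured before antibiotics."""
    lengths = np.asarray(pre_exposure_lengths, dtype=float)
    return lengths.mean() + 2 * lengths.std(ddof=1)

def classify_cell(lengths, dsred, observed_after_exposure, length_thr, dsred_thr):
    """Assign one tracked cell to one of the four classes of Figure 2.1.

    lengths, dsred: per-frame measurements of the cell.
    observed_after_exposure: False if the track ended (e.g., the cell left the
        field of view) before the antibiotic exposure was over.
    """
    filamented = max(lengths) > length_thr
    # Dead or missing: red fluorescence crossed the threshold or the track was lost.
    survived = observed_after_exposure and max(dsred) <= dsred_thr
    return ("filamented" if filamented else "non-filamented",
            "survived" if survived else "dead/missing")

thr = filamentation_threshold([2.0, 3.0, 4.0])  # mean 3, sample SD 1 -> 5.0
print(classify_cell([3.0, 6.0], [0.1, 0.2], True, thr, dsred_thr=1.0))
# ('filamented', 'survived')
```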

## 2.3 Results

### 2.3.1 Cell length and the amount of GFP are crucial in determining cell survival

We evaluated the DsRed, GFP, and length values for each cell at different time points: initial, filamentation, and end. This preprocessing allowed us to observe and quantify each cell at critical times in the experiment and eliminate noise or signals outside the scope of this investigation.

We define the *initial time* as the first time we observed the cell in the experiment. The *filamentation time* is the time at which a cell first reaches the filamentation threshold (see Figure 2.4). We define the *end time* as the time of the last observation of the cell. For surviving cells, we capped the end time at one frame (10 min) after the end of antibiotic exposure so that the observed signal would reflect the final stress responses.
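The three time points can be extracted from a single cell's track as sketched below. The 10-min frame interval comes from the text; the function name, the input layout, and the `exposure_end` value used in the example are hypothetical, since the exposure duration is not stated in this section.

```python
def key_time_points(times, lengths, length_thr, exposure_end, survived, frame=10):
    """Return (initial, filamentation, end) times in minutes for one cell.

    times: sorted observation times; lengths: matching cell lengths.
    filamentation is None if the cell never exceeds length_thr.
    For survivors, the end time is the last observation no later than one
    frame after the end of antibiotic exposure.
    """
    initial = times[0]
    filamentation = next((t for t, l in zip(times, lengths) if l > length_thr), None)
    if survived:
        end = max(t for t in times if t <= exposure_end + frame)
    else:
        end = times[-1]
    return initial, filamentation, end

# Hypothetical cell imaged every 10 min; exposure_end=120 is illustrative only.
times = list(range(0, 210, 10))
lengths = [3.0] * 8 + [6.0] * 13   # crosses a threshold of 5.0 at t = 80
print(key_time_points(times, lengths, 5.0, exposure_end=120, survived=True))
# (0, 80, 130)
```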

When we compared the distributions of DsRed, GFP, and length for both experiments, we observed differences in the role each variable plays in cell survival. Figure 2.2 shows that, as expected and in both experiments, surviving cells had eliminated the antibiotic by the end time. In contrast, dead cells presented higher antibiotic levels (measured by proxy through the mean DsRed intensity of the cell).

On the other hand, the GFP observations in Figure 2.3 revealed two essential points for cell classification: 1) the chromosomal strain did not exhibit noticeable changes in GFP levels, and 2) filamented cells were those that had low fluorescence intensities (low plasmid copy number) at the beginning of the experiment. At the end time, GFP measurements indicated that, among the cells that did not filament, those that survived exhibited reduced GFP expression relative to cells killed by the antibiotic. Meanwhile, filamented cells, whether surviving or dead, showed no difference in GFP at either the beginning or the end of the experiment, suggesting the presence of other determinants of cell survival.

Cell length was one of the survival determinants that GFP expression levels could not explain. In Figure 2.4, we show that the conclusions regarding filamentation held for both the chromosomal and the plasmid strains. At the initial time, filamented cells that survived were shorter than filamented cells that died but longer than non-filamented cells of both classes, while non-filamented cells did not differ from each other. We observed no length differences between cells at filamentation time; thus, survival could depend on other factors, such as growth rate. At the end time, the results were clear: surviving cells were longer than their non-surviving counterparts (*i.e.*, dead filamented and non-filamented cells). However, among filamented cells, survivors generally reached high final lengths, but their distribution did not extend as far as that of their dead counterparts. We could explain this as a length limit beyond which cells cannot grow without dying; nevertheless, we had no information to evaluate this hypothesis.

Having observed the effects of GFP expression levels and length on whether a cell lives or dies, we projected the cells onto the plane and colored them by class (see Figure 2.1) to determine whether these two variables contained enough information to cluster the data correctly. In Figure 2.5, we show the projection of the initial GFP and length values. Although, with some effort, we could place the results of Figures 2.3 and 2.4 in context, the initial values did not appear to determine the classes. Therefore, we explored the differences between final and initial values in Figure 2.6. This new representation of the cells in the plane put the statistical results of Figures 2.3 and 2.4 in context. It also showed that changes in length (*i.e.*, filamentation) and reductions in GFP expression are essential in determining cell survival. However, the cell states do not form completely separate clusters, which means that other variables are affecting cell survival in the experiments.

### 2.3.2 Number of divisions and cell age do not appear to play a clear role in determining cell survival

In Section 2.3.1, we explored how GFP variability and cell length influence cell survival. However, Figures 2.5 and 2.6 suggested that other factors are relevant to the phenomenon under study. As some papers in the literature suggest, two such factors may be cell division and chronological age (*i.e.*, how much time has passed since the last cell division at the time of exposure to a toxic agent).^{24} Therefore, we chose to examine these two metrics at a purely qualitative level, *i.e.*, without including, *e.g.*, measurements of membrane or cell cycle properties.^{25}

Although we expected to see a small contribution from either the number of divisions or cell age, Figures 2.7 and 2.8 showed no clear effect of these variables on cell survival. Although the patterns we did observe could have an explanation or biological significance, we decided not to treat them as relevant for characterizing our cells, since the signal was not clear. However, from this analysis we derived a slightly simpler variable that indicates whether a cell underwent a division event. This variable gives a more general picture of the contribution of division to cell survival (see Figure 2.14).
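For illustration, the division-related quantities discussed in this section (number of divisions, chronological age at exposure, and the simpler binary division variable) could be derived from a cell's division times as in the sketch below. The input format, counting divisions over the whole track, and the fallback to the first observation time when no division precedes exposure are assumptions; in that fallback case the age is only a lower bound, because the true birth time is unknown.

```python
def division_features(division_times, exposure_start, first_seen):
    """Summarize a tracked cell's division history.

    division_times: times (min) at which the cell divided during tracking.
    Returns the number of divisions, the binary 'divided' variable, and the
    chronological age at exposure (minutes since the last division before
    exposure_start, or since first_seen if no earlier division was observed).
    """
    earlier = [t for t in division_times if t <= exposure_start]
    last_event = max(earlier) if earlier else first_seen
    return {
        "n_divisions": len(division_times),
        "divided": len(division_times) > 0,
        "age_at_exposure": exposure_start - last_event,
    }

print(division_features([20, 45, 90], exposure_start=60, first_seen=0))
# {'n_divisions': 3, 'divided': True, 'age_at_exposure': 15}
```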

### 2.3.3 Time to reach filamentation matters in determining cell survival

In Figures 2.2, 2.3, and 2.4, we showed that, at the time of filamentation, DsRed and GFP levels did not appear to differ between cell classes. Therefore, we hypothesized that the time a cell takes to activate the anti-stress response that causes filamentation could determine its survival. This hypothesis was also guided by previous reports showing that gene expression levels can induce filamentation with tight temporal coordination [x].

Although we did not measure the antibiotic concentration that triggers filamentation per se, we quantified its effect indirectly through the time it took a cell to reach the length at which it is considered filamented. Furthermore, to ensure that the observed effect was a product of the experiment, we kept only cells that filamented after antibiotic exposure began.
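A minimal sketch of this proxy, assuming per-cell length tracks sampled at the frame times and exposure onset at minute 60 (as reported for Figure 2.17); the function name is hypothetical. Cells already above the threshold before exposure return `None`, matching the decision to keep only cells that filamented after exposure began.

```python
def time_to_filamentation(times, lengths, length_thr, exposure_start):
    """Minutes from exposure onset until a cell first exceeds length_thr.

    Returns None for cells that never filament and for cells that were
    already filamentous before exposure began; both are excluded from the
    filamentation-time analysis.
    """
    for t, length in zip(times, lengths):
        if length > length_thr:
            return t - exposure_start if t >= exposure_start else None
    return None

times = list(range(0, 200, 10))
grows_late = [3.0] * 9 + [6.0] * 11    # first exceeds 5.0 at t = 90
print(time_to_filamentation(times, grows_late, 5.0, exposure_start=60))  # 30
```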

Figure 2.9 shows that filamentation times are more narrowly distributed for chromosomal cells than for plasmid-bearing cells. We hypothesize that this difference could come from the heterogeneity in plasmid copy number in the population. Interestingly, we also observed that, in both experiments, cells that survived had longer filamentation times than cells that died. These differences in response times suggest the following: 1) if a cell grows too fast, it will reach a limit and start to accumulate antibiotics constantly, and 2) if a cell grows too fast, the cost of maintaining a large length during prolonged exposure is likely to become counterproductive.

In Figure 2.10, we projected the results of Figure 2.9 onto a space similar to the one described in Figure 2.5. We separated our data into cells that survived and cells that did not, and colored them by the time it took them to reach the filamented state. Adding this temporal component to the initial length and GFP variables indeed separated surviving cells from dead cells to a greater degree. However, this may still not be enough; many other variables likely play a crucial role in understanding the ecology of stress and in determining which cells survive.

### 2.3.4 Increasing the complexity of the system and analyzing it in an unsupervised way allows a correct classification of cell states

In the experiments, we observed the importance of filamentation and GFP variability for cell survival. We also realized that other variables must be affecting the final results: filamentation and GFP variability alone did not fully recapitulate the behavior of the data. That is, the target variables did not capture the heterogeneity of the system.

The inability to reproduce the cell classification led us to question two things: 1) whether our classification was wrong from the start and 2) whether we lacked enough variables to capture the phenomenon under study. We decided to address these questions with unsupervised learning because it allows us to project our data without *a priori* knowledge [x].

We opted for dimensionality reduction techniques, in which each variable or feature corresponds to one dimension. The rationale for dimensionality reduction is that, when there are many dimensions, analyzing each one individually is not feasible. Furthermore, dimensionality reduction helps counteract several problems: it reduces model complexity, reduces the possibility of overfitting, reduces redundancy among correlated variables, and, of course, allows us to visualize the data in a two- or three-dimensional space [x]. Improved visualization and identification of essential variables were the main reasons we used this technique to guide and complement our research.

#### 2.3.4.1 Principal Component Analysis (PCA) emphasizes the importance of cell length and its GFP in cell survival

The first dimensionality reduction technique we used was Principal Component Analysis (PCA).^{26} Scientists mainly use PCA to create predictive models or for Exploratory Data Analysis (EDA); here, we used it only for EDA.

Figures 2.11 and 2.12 show the projection onto the first two principal components (PCs) for the chromosomal and plasmid strains, respectively. In Figure 2.11, the manually annotated classes separate, with surviving cells apart from non-surviving cells. In Figure 2.12, the class separation was rougher, but it still separated the surviving filamented cells from the dead ones.

Figures 2.13 and 2.14 show the total contribution of each variable to each PC for the chromosomal and plasmid strains, respectively. These contributions confirm that filamentation plays a crucial role in determining cell survival. For example, on PC2, the end DsRed variable pulled the points toward the positive side, while the end and start length variables pulled them toward the opposite side. This supports the idea that filamentation moves cells away from high DsRed levels.
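As a hedged sketch of how such projections and variable contributions could be computed, the code below performs PCA through a singular value decomposition in NumPy. Standardizing each variable to unit variance is an assumption (the text does not say whether the variables were scaled), and the contribution is computed as the squared loading times 100, one common definition under which the contributions to each PC sum to 100%. PC signs are arbitrary, so the "positive" and "opposite" sides can flip between implementations.

```python
import numpy as np

def pca_with_contributions(X, n_components=2):
    """PCA of a cells x variables matrix via SVD of the standardized data.

    Returns:
      scores: projection of each cell onto the first n_components PCs.
      contrib: percent contribution of each variable to each PC
               (squared loadings x 100; each column sums to 100).
      explained: fraction of total variance captured by each returned PC.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    loadings = vt[:n_components].T          # variables x components
    scores = Z @ loadings                   # cells x components
    contrib = 100 * loadings**2
    explained = (s**2 / np.sum(s**2))[:n_components]
    return scores, contrib, explained

# Toy data: two perfectly correlated variables plus a weakly correlated third.
X = np.array([[1.0, 2.0, 0.5],
              [2.0, 4.0, 0.1],
              [3.0, 6.0, 0.9],
              [4.0, 8.0, 0.3]])
scores, contrib, explained = pca_with_contributions(X)
```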

#### 2.3.4.2 Uniform Manifold Approximation and Projection (UMAP) correctly represents the local structure of cell states

Relying on a single dimensionality reduction technique was not an option, so we also used UMAP.^{27} We mainly used UMAP for clustering purposes, to see whether the clusters it produced corresponded to the manually annotated classes. UMAP has certain advantages for this purpose; e.g., compared with methods such as t-SNE, it tends to preserve more of the global structure of the data, although distances between clusters should still be interpreted with caution.

In Figures 2.15 and 2.16, we show that, using the same variables as in the PCA section, UMAP clustered the four proposed classes correctly. Interestingly, UMAP formed three general groups in Figure 2.15 and four in Figure 2.16. Overall, UMAP separated the surviving cells from those that did not survive. On investigating why this separation occurred, we found that the large groups merged into one another if we removed the division variable. So, division also has a role in determining survival, but it is not essential, or at least not over-represented in our data.
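The text does not specify how the UMAP clusters were compared with the manual annotations. One simple, hedged option is cluster purity, sketched below in plain Python: each cluster is assigned its most common manual class, and purity is the fraction of cells that match it. The cluster labels are assumed to come from running some clustering algorithm on the UMAP embedding, a step not shown here, and the class label strings are hypothetical. Purity rises trivially as the number of clusters grows, so it is only informative when the number of clusters is small, as here (three or four).

```python
from collections import Counter, defaultdict

def cluster_purity(cluster_ids, manual_labels):
    """Fraction of cells whose cluster's majority manual label equals their own."""
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, manual_labels):
        members[cluster].append(label)
    # For each cluster, count cells that carry its most common manual label.
    matches = sum(Counter(labels).most_common(1)[0][1] for labels in members.values())
    return matches / len(manual_labels)

clusters = [0, 0, 0, 1, 1, 2]
labels = ["fil-surv", "fil-surv", "fil-dead", "nonfil-surv", "nonfil-surv", "nonfil-dead"]
print(cluster_purity(clusters, labels))  # 0.8333333333333334
```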

### 2.3.5 Population dynamics reveal how filamentation contributes to cell survival

From the full tracking dataset, we evaluated how the different cell states behaved over time, for example, how the cells absorbed antibiotics or how they elongated. In contrast to the dataset generated in Section 2.3.1, we did not truncate the results 10 minutes after the end of antibiotic exposure. In this way, we could observe cell behavior both before and after the presence of the toxic agent.

In Figure 2.17, we observed a small fraction of cells in both strains that were already filamentous before exposure to the toxic agent. After the onset of antibiotic exposure at minute 60, the proportion of filamented cells increased. Interestingly, for the chromosomal strain, filamented cells continued to grow after antibiotic exposure. We speculate that this post-antibiotic growth occurs because, once the SOS system that triggers filamentation is activated, the cell continues to elongate until it reaches a limit, regardless of whether the damaging agent is still present.^{28} Moreover, cells started to divide again after some time: the proportion of non-filamented cells increased as filamented cells began to divide. We observed the same effects for the plasmid strain; however, by experimental design, the expected number of filamented cells was much lower.
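A curve of the proportion of filamented cells over time could be computed as sketched below, assuming each observation is a `(time, length)` pair for one cell in one frame and the same length threshold as in Section 2.2. The input layout and function name are hypothetical.

```python
from collections import defaultdict

def filamented_fraction(observations, length_thr):
    """Fraction of cells above the filamentation threshold in each frame.

    observations: iterable of (time, length) pairs, one per cell per frame.
    Returns a dict {time: fraction}, ordered by time.
    """
    totals = defaultdict(int)
    above = defaultdict(int)
    for t, length in observations:
        totals[t] += 1
        above[t] += length > length_thr   # True counts as 1
    return {t: above[t] / totals[t] for t in sorted(totals)}

obs = [(50, 3.0), (50, 6.5), (70, 6.0), (70, 7.2), (70, 4.0), (70, 5.5)]
print(filamented_fraction(obs, length_thr=5.0))  # {50: 0.5, 70: 0.75}
```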

Figure 2.18 shows that, once antibiotic exposure began, cells that died had a much faster increase in DsRed than cells that survived, regardless of whether they filamented. In contrast, surviving cells kept their DsRed levels relatively stable. Turning to the GFP and length variables for a temporal explanation, we noted that, for the chromosomal strain, length was critical for surviving cells: even cells categorized as non-filamented reached the filamentation threshold minutes after antibiotic exposure. However, the distinction between live and dead filamented cells was not as evident as expected. For cells with plasmids, the GFP level of surviving cells was maintained for filamented cells and decreased for non-filamented cells. Filamented cells that died had, on average, a much longer initial length than surviving cells, so we also consider initial length a necessary factor in understanding which variables affect cell survival.
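The comparison of how fast DsRed rises could be summarized per cell by a least-squares slope fitted from exposure onset onward, as in the sketch below. This summary statistic is an assumption for illustration, not the analysis behind Figure 2.18; it uses `numpy.polyfit` with degree 1 and requires at least two post-onset frames. The toy traces are invented.

```python
import numpy as np

def dsred_slope(times, dsred, exposure_start):
    """Least-squares slope of a cell's DsRed signal from exposure onset on."""
    times = np.asarray(times, dtype=float)
    dsred = np.asarray(dsred, dtype=float)
    after = times >= exposure_start
    if after.sum() < 2:
        raise ValueError("need at least two frames after exposure onset")
    slope, _intercept = np.polyfit(times[after], dsred[after], deg=1)
    return slope

t = [40, 50, 60, 70, 80, 90]
dying = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0]       # rises 0.1 units per minute
surviving = [0.0, 0.0, 0.0, 0.1, 0.0, 0.1]   # roughly flat
print(dsred_slope(t, dying, 60), dsred_slope(t, surviving, 60))
```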

### 2.3.6 Heterogeneity in plasmid copy-number allows various forms of survival in addition to filamentation

Given what we have shown so far, we are confident that filamentation has a fundamental role in determining cell survival. However, plasmid-bearing cells have a component of particular interest to us: heterogeneity. Each cell can carry a different plasmid copy number; thus, each could behave differently under stress.^{29} For instance, heterogeneity can produce resistant cells that do not suffer damage, susceptible cells, and cells that form filaments to mitigate environmental stress.

To study the effect of plasmid copy-number variability on the survival probability of the population, we grouped cells by their initial GFP as a proportion of the population maximum. We defined 100% of the population as the total number of cells at the onset of antibiotic exposure. Figure 2.19 shows that the percentage of surviving cells in the group with the highest GFP remained unchanged once antibiotic exposure began, whereas it started to decrease in the remaining groups. However, the decrease was not linear; instead, we observed a bimodal distribution in the reduction of live cells. Cells with near-average GFP levels survived better than cells below or above the average (except for cells very close to the population maximum).
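A minimal sketch of this grouping: cells present at exposure onset are binned by initial GFP as a fraction of the population maximum, and the percentage still alive is tracked per bin. Five equal-width bins, counting 100% separately within each bin, and encoding survivors with an infinite death time are assumptions made for illustration; the function names are hypothetical.

```python
import numpy as np

def gfp_bins(initial_gfp, n_bins=5):
    """Bin cells by initial GFP as a fraction of the population maximum."""
    gfp = np.asarray(initial_gfp, dtype=float)
    frac = gfp / gfp.max()                 # values in (0, 1] for positive GFP
    # Equal-width bins; the cell at the maximum goes into the top bin.
    return np.minimum((frac * n_bins).astype(int), n_bins - 1)

def percent_alive(bins, death_times, t, n_bins=5):
    """Percent of each bin's cells (counted at exposure onset) alive at time t.

    death_times: per-cell time of death or loss; np.inf for survivors.
    Empty bins are reported as nan.
    """
    bins = np.asarray(bins)
    alive = np.asarray(death_times, dtype=float) > t
    out = np.full(n_bins, np.nan)
    for b in range(n_bins):
        in_bin = bins == b
        if in_bin.any():
            out[b] = 100.0 * alive[in_bin].mean()
    return out

bins = gfp_bins([1.0, 3.0, 5.0, 10.0, 9.5])     # -> [0, 1, 2, 4, 4]
deaths = [70.0, np.inf, 90.0, np.inf, np.inf]
print(percent_alive(bins, deaths, t=80.0))
```

Evaluating `percent_alive` at every frame time after exposure onset would give one survival curve per GFP bin.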

Therefore, we observed a bimodal distribution of GFP-dependent cell survival. To show this effect more clearly, in Figure 2.20 we plotted the survival probability for each GFP bin without normalizing by the population maximum. This plot shows that the bimodal survival distribution occurs for cells that did not filament, whereas for filamented cells the survival probability increases gradually with initial GFP (see also Figure 2.3).
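The per-bin survival probability can be sketched as below, computed separately for filamented and non-filamented cells. The same function applies to initial length (as in Figure 2.21) by passing lengths instead of GFP values. The bin edges, boolean inputs, and toy values are assumptions for illustration; values outside `[edges[0], edges[-1])` are ignored.

```python
import numpy as np

def survival_probability(values, survived, filamented, edges):
    """Survival probability per bin of an initial variable, by filament status.

    values: initial GFP (or initial length) per cell.
    edges: increasing bin edges; bin i covers [edges[i], edges[i + 1]).
    Returns {True: probs, False: probs} keyed by filamented status,
    with nan for bins that contain no cells of that status.
    """
    values = np.asarray(values, dtype=float)
    survived = np.asarray(survived, dtype=bool)
    filamented = np.asarray(filamented, dtype=bool)
    idx = np.digitize(values, edges) - 1
    n_bins = len(edges) - 1
    result = {}
    for status in (True, False):
        probs = np.full(n_bins, np.nan)
        for b in range(n_bins):
            in_bin = (idx == b) & (filamented == status)
            if in_bin.any():
                probs[b] = survived[in_bin].mean()
        result[status] = probs
    return result

p = survival_probability(values=[1.0, 1.5, 2.5, 2.7],
                         survived=[True, False, True, True],
                         filamented=[False, False, True, False],
                         edges=[1.0, 2.0, 3.0])
print(p[False], p[True])
```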

Analogously to Figure 2.20, Figure 2.21 shows the survival probability given an initial length. For cells that did not filament, survival was higher if the initial length was below the average. In contrast, for filamented cells, the probability of survival increased with initial length (see also Figure 2.4). Notably, however, this increase had a limit beyond which a greater initial length meant a lower probability of survival (see the red dotted lines in Figure 2.21).

## 2.4 Discussion

Here, we evaluated different variables that could determine cell survival upon exposure to toxic agents by studying two experimental populations of *E. coli*: one strain with a resistance gene on the chromosome and the other with it on multicopy plasmids. We identified two variables predominantly responsible for cell survival: cell length and GFP amount, which are related to the cell's inherent resistance to the toxic agent and to heterogeneity in response times.

On the other hand, following other studies,^{30} we examined cell division activity and cell age in a minimalistic way. While the number of divisions showed a broader and more uniform distribution for surviving cells, the cells that died tended to have fewer divisions. However, cell age at the time of exposure to the toxic agent did not show a clear pattern for cell fate determination. Therefore, it would be interesting for future studies to examine cell age at a higher level of complexity to understand its contribution to cell survival.

Interestingly, when we used temporal measurements of cell length, GFP, DsRed, and whether a cell divided, we could recapitulate most of the cell-state fates (see Sections 2.3.1 and 2.3.4). Thus, increasing the system's complexity led to better clustering of cell states, but it did not reveal how these factors interact biologically to determine cell survival. Therefore, we decided to postulate a mathematical model that helps us understand the critical components of cell survival.