Lecture by Nicholas Noll.
Bacteria sense the state of the outside world and adapt their behavior accordingly. Such adaptive behavior helps the cells save resources, since it avoids expressing metabolic machinery that is currently not needed. It also helps the cell prepare for the future. Cells sense each other (so-called quorum sensing) and can, for example, "prepare for famine" when space gets crowded and food is running out.
Such adaptive behavior of bacteria is usually mediated via specific receptors on the surface of the cell or intra-cellular concentration sensors that couple to the gene expression program. We have previously discussed that diffusion of proteins within bacterial cells equilibrates within seconds. Hence it is plausible to model this type of gene regulation in terms of uniform concentrations inside and outside the cell. Keeping it abstract, we can write down equations for concentrations \(x_i\) as $$ \frac{d x_i}{dt} = f_i(\{x_j(t)\}, \{\theta_j\}) $$ where \(f_i\) is a function that determines the rate of change of \(x_i\) in terms of all concentrations \(\{x_j\}\) and parameters \(\{\theta_j\}\). Note that these equations define a Markovian system, i.e., the rate of change depends only on the status quo and not on the history of the system. Furthermore, these equations are of first order: no second derivatives with respect to time play a role. As such, it is in principle a straightforward matter to integrate them. However, there are two fundamental complications:
- all but the simplest systems depend on a large number of parameters which are often poorly known. Fitting complicated systems to data rapidly becomes ill-defined.
- the volume of a bacterium is small and the number of molecules is often in the tens or hundreds. Stochastic effects not captured by the deterministic equations can be important.
There is no universal solution to the first of these problems. We will focus on very simple systems where this is less of an issue -- more generally, finding a model that captures the essence of a problem with the smallest number of parameters is the most critical step in such an analysis (trying to be realistic and/or comprehensive is often a bad idea in this context).
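To make the deterministic picture concrete, here is a minimal sketch of integrating such a system numerically. The specific one-gene auto-repression model, the parameter values, and the name `f` are illustrative assumptions, not part of the lecture:

```python
from scipy.integrate import solve_ivp

# Illustrative one-variable example (assumed, not from the lecture):
# an auto-repressing gene with dx/dt = alpha/(1 + (x/K)^2) - gamma*x.
alpha, K, gamma = 10.0, 1.0, 1.0

def f(t, x):
    # production is repressed by x itself; degradation is linear in x
    return alpha / (1.0 + (x / K) ** 2) - gamma * x

sol = solve_ivp(f, (0.0, 10.0), y0=[0.0])
print(sol.y[0, -1])  # x settles at the fixed point where production balances degradation
```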
The stochastic aspects of the dynamics can be addressed in a more principled manner. Instead of modeling concentrations, we model distributions of copy numbers \(p(n_i,t)\). The function \(f_i\) has to be decomposed into different parts that increase or decrease \(n_i\), that is, production and degradation terms. As an example, this could look like this: $$ f_i(\{n_j(t)\}, \{\theta_j\}) = \underbrace{\frac{n_k(t)}{1+\theta_l n_l(t)}}_{\mathrm{production}} - \underbrace{\theta_i n_i(t)}_{\mathrm{degradation}} $$ The first term is a production term that depends on the copy numbers of species \(k\) and \(l\). Step by step, it will move \(n_i\) to \(n_i+1\), \(n_i+2\), etc. The second term is a degradation term that takes \(n_i\) to \(n_i-1\), etc. In some scenarios, the individual steps might be bigger and, for example, change copy numbers by two, but the basic idea is the same. After having identified the individual reactions, we can formulate an equation for the probability distribution of copy numbers. For simplicity, we will look at a single species here, drop the indices, and denote the production and degradation rates by \(\alpha\) and \(\gamma\): $$ p(n,t+\Delta t) = (1-\Delta t (\alpha +\gamma n))\,p(n,t) + \alpha \Delta t\, p(n-1,t) + \gamma (n+1) \Delta t\, p(n+1,t) $$ Slight rearrangement and taking the limit \(\Delta t \to 0\) turns this into a differential equation in time, the so-called master equation: $$ \frac{dp(n,t)}{dt} = -(\alpha +\gamma n)\,p(n,t) + \alpha\, p(n-1,t) + \gamma (n+1)\, p(n+1,t) $$
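For this single-species birth-death process, the master equation is just a large linear system of ODEs and can be integrated directly once the state space is truncated at some maximal copy number. A minimal sketch, with the truncation level and rate values chosen as illustrative assumptions:

```python
import numpy as np
from scipy.integrate import solve_ivp

alpha, gamma = 10.0, 1.0    # production and degradation rates (assumed values)
N = 100                     # truncate the state space at copy number N (assumed)

def dpdt(t, p):
    n = np.arange(N + 1)
    dp = -(alpha + gamma * n) * p       # loss out of state n
    dp[1:] += alpha * p[:-1]            # gain into n from n-1 via production
    dp[:-1] += gamma * n[1:] * p[1:]    # gain into n from n+1 via degradation
    return dp

p0 = np.zeros(N + 1)
p0[0] = 1.0                             # start with zero molecules
sol = solve_ivp(dpdt, (0.0, 10.0), p0)
p_final = sol.y[:, -1]
print(p_final @ np.arange(N + 1))       # mean copy number approaches alpha/gamma
```

As long as \(\alpha/\gamma\) is well below the truncation \(N\), essentially no probability leaks out of the truncated space; the stationary distribution of this process is a Poisson distribution with mean \(\alpha/\gamma\).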
Simulating stochastic trajectories
The ODEs for the probability distribution discussed above can be solved in simple low-dimensional cases. But if multiple different molecular species are to be tracked, the space of possibilities explodes and solving for the probability distribution becomes difficult. Instead, it is often easier to simulate a large number of realizations. This can be done using a scheme commonly known as Gillespie's algorithm. The basic idea of this algorithm is that the time intervals between discrete events are distributed exponentially (like radioactive decay) with a rate that is the sum of the rates of all individual reactions. The rationale is as follows: in a small time slice \(\Delta t\), each and every reaction happens with probability \(\alpha_i \Delta t\), where \(\alpha_i\) is the rate of reaction \(i\). The probability that any reaction happens within that interval is therefore \(\Gamma\Delta t = \Delta t\sum_i \alpha_i\), and the time interval before something happens is distributed as \(p(\tau) =\Gamma e^{-\Gamma \tau}\). Once an event happens, it could be any of the many possible reactions, each with a probability proportional to its individual rate. The Gillespie algorithm therefore consists of iterating the following steps (a sketch in code follows the list):
- sample a waiting time \(\tau\) with mean \(1/\Gamma\)
- pick a particular reaction with probability \(\alpha_i/\Gamma\)
- update the \(\{n_i\}\), set \(t\to t+\tau\), and recompute the rates
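A minimal sketch of these steps for the birth-death process above; the rate values and the function name `gillespie` are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, gamma = 10.0, 1.0        # production and degradation rates (assumed values)

def gillespie(n0, t_max):
    """One stochastic trajectory of the birth-death process."""
    t, n = 0.0, n0
    times, copies = [t], [n]
    while t < t_max:
        rates = (alpha, gamma * n)           # rates of the two possible reactions
        Gamma = sum(rates)
        t += rng.exponential(1.0 / Gamma)    # waiting time with mean 1/Gamma
        # pick a reaction with probability proportional to its rate
        if rng.random() < rates[0] / Gamma:
            n += 1                           # production: n -> n+1
        else:
            n -= 1                           # degradation: n -> n-1
        times.append(t)
        copies.append(n)
    return np.array(times), np.array(copies)

times, copies = gillespie(n0=0, t_max=100.0)
print(copies.mean())  # fluctuates around alpha/gamma
```

Note that at \(n=0\) the degradation rate vanishes, so production is chosen with probability one and \(\Gamma\) never becomes zero as long as \(\alpha>0\).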
Lac Operon
The best-studied example of bacterial gene regulation is probably the lac operon, which regulates expression of the enzymes needed to digest lactose. The study of the lac operon traces back to Francois Jacob and Jacques Monod, and the concepts discovered by this line of inquiry are foundational to our understanding of genetic regulation. During WWII, Monod tested the effects of growing E. coli on combinations of sugars. Generically, one finds two phases of growth and metabolism (so-called diauxic growth). For example, when grown on glucose + lactose, glucose is metabolized first; once it is exhausted, lactose is utilized after a lag phase.
There are many excellent descriptions of the lac operon out there.