# Gene regulation

The expression of genes is tightly regulated in space and time. 
 * Transcription factors: proteins that bind to DNA and enhance or suppress expression
 * Sigma factors (transcription initiation)
 * Chromatin state: the genome can be packaged away and silenced
 * DNA and chromatin modifications: DNA methylation, histone marks
 * Protein modifications, dimerization, etc.

The molecular biology of these processes is very complicated, so we will focus here on general aspects of the problem.

## How abundant are transcription factors?

![image.png](attachment:d9b75131-6f3d-4130-8f54-211afc9b9716.png)

Abundance of transcription factors in E. coli. From bionumbers.org with original data from [Li et al., 2014]. The data are shown as a cumulative distribution, that is, the y-axis shows the fraction of transcription factors that have an abundance below the value on the x-axis. Cumulative distributions might be a little unfamiliar to read, but they have many advantages over classical histograms since they don't require a choice of binning.

 * Activators: rare, often under 10 copies
 * Repressors: more common, typically around 100 copies
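
To make the reading of a cumulative distribution concrete, here is a minimal sketch of how such a curve is built from a list of abundances. The copy numbers are hypothetical values chosen for illustration, not the data from [Li et al., 2014], and the function name `ecdf` is our own.

```python
def ecdf(values):
    """Return the sorted values and, for each, the fraction of observations at or below it."""
    xs = sorted(values)
    n = len(xs)
    # The i-th smallest value (0-based) has i + 1 observations at or below it.
    fractions = [(i + 1) / n for i in range(n)]
    return xs, fractions


# Hypothetical copy numbers per cell, for illustration only.
copies = [2, 5, 8, 40, 90, 110, 150, 300, 700, 1200]

xs, fractions = ecdf(copies)
for x, f in zip(xs, fractions):
    print(f"{x:5d} copies: fraction {f:.1f} of TFs at or below this abundance")
```

Every data point contributes one step to the curve, so no bin width has to be chosen. Plotting `fractions` against `xs` on a logarithmic x-axis gives a curve of the same kind as the figure above.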

### Uniqueness of binding site

In a random string of A, C, G, and T of length one million, any specific 10-mer occurs about once on average, since there are 4^10 ≈ 10^6 distinct 10-mers. 
Ten base pairs is also about the length of one turn of the DNA double helix, and a transcription factor can meaningfully interact with about that many base pairs. 
Hence a transcription factor binding site in a bacterial genome of a few million bases is approximately unique. 
This is reflected in the typical architecture where one, two, or sometimes three transcription factors regulate a gene or operon. 

In eukaryotes, with 1000-fold larger genomes, TF binding is rarely unique. Instead, regulation is combinatorial with many layers contributing. 
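
The argument can also be turned around: how long must a motif be before a specific sequence is expected at most once by chance? The sketch below assumes a random genome with equal base frequencies and counts a single strand only. The function name `min_unique_length` is our own.

```python
def min_unique_length(genome_length):
    """Smallest k such that a specific k-mer is expected at most once,
    i.e. genome_length / 4**k <= 1, assuming equal base frequencies."""
    k = 0
    # Integer arithmetic avoids rounding issues that a logarithm could introduce.
    while 4**k < genome_length:
        k += 1
    return k


for name, length in [("random string", 1_000_000),
                     ("bacterium", 5_000_000),
                     ("human", 3_000_000_000)]:
    print(f"{name}: L = {length}, shortest unique motif length k = {min_unique_length(length)}")
# random string: k = 10, bacterium: k = 12, human: k = 16
```

Under these assumptions, a 1000-fold larger genome requires about five additional specified bases (4^5 = 1024) for a site to remain unique, which is more than the roughly ten base pairs a single transcription factor can read.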

In [1]:
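# Expected number of chance occurrences of one specific k-mer in a random
# genome of length L with equal base frequencies: L / 4**k.

# Bacterium (genome of about 5 million bases)
print("\n\nBacterium\n")
L = 5e6
for k in range(2,13):
    print(f"number specific of {k}-mers in a genome of length {L}: {L/4**k:1.2f}")

# Human (genome of about 3 billion bases); even k only, to keep the table short
print("\n\nHuman\n")
L = 3e9
for k in range(2,24,2):
    print(f"number specific of {k}-mers in a genome of length {L}: {L/4**k:1.2f}")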




Bacterium

number specific of 2-mers in a genome of length 5000000.0: 312500.00
number specific of 3-mers in a genome of length 5000000.0: 78125.00
number specific of 4-mers in a genome of length 5000000.0: 19531.25
number specific of 5-mers in a genome of length 5000000.0: 4882.81
number specific of 6-mers in a genome of length 5000000.0: 1220.70
number specific of 7-mers in a genome of length 5000000.0: 305.18
number specific of 8-mers in a genome of length 5000000.0: 76.29
number specific of 9-mers in a genome of length 5000000.0: 19.07
number specific of 10-mers in a genome of length 5000000.0: 4.77
number specific of 11-mers in a genome of length 5000000.0: 1.19
number specific of 12-mers in a genome of length 5000000.0: 0.30


Human

number specific of 2-mers in a genome of length 3000000000.0: 187500000.00
number specific of 4-mers in a genome of length 3000000000.0: 11718750.00
number specific of 6-mers in a genome of length 3000000000.0: 732421.88
number specific of 8-mers i