Designs in DNA

Hi, I am Rich Deem and this video is going
to examine three different examples of designs in DNA.
First, some review about DNA. If you think you know enough about DNA, please skip ahead
to the 5 minute mark. According to Francis Crick, the central dogma or principle of biology
is that DNA in the nucleus is transcribed into RNA, which exits the nucleus and is
translated into protein. In order to understand designs in DNA, we are going to need to know
how DNA is put together. DNA consists of only four bases: adenine (A),
thymine (T), guanine (G), and cytosine (C). Abbreviated A, T, G, C.
Each base is chemically bound to a deoxyribose sugar; the combination is called a nucleoside. When
the sugar is phosphorylated, it is called a nucleotide.
Nucleotides are bound to each other through the phosphate bonds of the sugars: Here we
have adenine, cytosine, guanine, and thymine. As you all know, DNA is a double helix, which
means that there is a second strand bound to the first. So, adenine binds to its complementary
base, thymine. Cytosine binds to guanine. Guanine binds to cytosine. And thymine binds
to adenine. So, on the left strand sugars bond to each other 5′ to 3′ from top to
bottom. On the complementary strand, the sugars are bound 5′ to 3′ in the opposite direction.
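The pairing rules and the opposite direction of the second strand can be sketched in a few lines of Python. This is only an illustration: the function names and the example sequence are made up, and both strands are written 5′ to 3′.

```python
# Watson-Crick pairing for DNA: A pairs with T, and C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(strand: str) -> str:
    """Return the paired base at each position (this reads 3' to 5')."""
    return strand.upper().translate(COMPLEMENT)

def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own 5' to 3' direction."""
    return complement(strand)[::-1]

top = "AACGTC"  # made-up example strand, written 5' to 3'
print(complement(top))          # TTGCAG
print(reverse_complement(top))  # GACGTT
```

Reversing the complement is what the opposite 5′ to 3′ direction amounts to when both strands are written out as text.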
So, the strands are said to be anti-parallel. This shows a stylized DNA double helix showing
the base pair bonding. As you can see, the base pairs are on the inside of the helix,
something scientists did not expect at first. Since the base pairs specified the code, scientists
racing to discover the structure of DNA assumed the base pairs would be on the outside of
any structure. Watson and Crick put together all the pieces of the puzzle to figure out
how DNA was put together. The DNA is housed in the nucleus. The dark
areas are composed of condensed DNA (heterochromatin), which is not used by the cell. The light areas are
the euchromatin, which is the DNA that is actively transcribed by the cell. Every cell
has a full complement of all 3.2 billion DNA base pairs, but only uses a fraction of that
DNA in its day to day function. However, different kinds of cells use different parts of DNA
molecules, so different parts of the DNA make up the heterochromatin vs. euchromatin in
different cell types. If you recall the central dogma of biology,
DNA is transcribed into RNA. The bases adenine, cytosine, and guanine are the same in both
DNA and RNA. However, uracil is substituted for thymine in the RNA. The two bases
are quite similar, other than the methyl group in thymine. And, of course, the bases in DNA are bound to
a deoxyribose sugar, whereas those in RNA are bound to ribose.
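At the level of sequence, the thymine-to-uracil substitution is easy to sketch. The snippet below is a simplification: it takes the coding strand of the DNA (the one whose sequence matches the message) and writes out the corresponding RNA. The function name and the example sequence are made up for illustration.

```python
def transcribe(coding_strand: str) -> str:
    """Return the RNA matching a DNA coding strand (both written 5' to 3')."""
    return coding_strand.upper().replace("T", "U")

print(transcribe("ATGGCCTAA"))  # AUGGCCUAA
```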
The RNA is translated into protein through a complex array of molecular machines. This
is our messenger RNA (or mRNA) molecule. Three nucleotide bases make up what is called a
codon, which specifies one amino acid, which is the building block of proteins. The transfer
RNA or tRNA molecule provides the specificity. On one end is the anti-codon, which is complementary
to the codon, allowing it to bind to the mRNA. The other end of the molecule is bound to
a specific amino acid, in this case methionine. Biochemical machines move along the mRNA to
act like a factory to assemble the final protein product.
This is the genetic code, which determines how the codons are translated into amino acids.
All living forms (with a few exceptions) use the exact same genetic code. There are
several special codons in this list, highlighted in yellow. AUG codes for methionine,
but more importantly always designates the start of a protein sequence. Three codons
do not code for any amino acid at all, but cause the termination of the translation process.
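The start and stop behavior just described can be sketched as a small translation routine. This is a deliberately simplified model, not how the ribosome actually chooses a start site: it assumes translation begins at the first AUG in the message and ends at the first in-frame codon that codes for no amino acid. The table is the standard genetic code in one-letter amino acid symbols, with `*` marking the three terminating codons; the example mRNA is made up.

```python
from itertools import product

# Standard genetic code, with codons ordered U, C, A, G at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first terminator."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated in this model
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

terminators = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
print(terminators)                    # ['UAA', 'UAG', 'UGA']
print(translate("GGAUGGCCAAAUGAUU"))  # MAK
```

In the example, the two leading bases are skipped, AUG gives methionine (M), GCC gives alanine (A), AAA gives lysine (K), and UGA ends the protein.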
These are called stop codons. The genetic code is said to be redundant because multiple
codons can code for one amino acid. The single-celled protozoan Trichomonas vaginalis
is about the size of a blood cell, but has about 60,000 genes in its genome. Human beings
have only ~22,000 genes. How can a much more complicated organism have only about a third as many
genes as a single-celled protozoan? This is a schematic of a typical gene. A gene
consists of exons, which are the coding regions and introns, which control how the gene is
transcribed. On the ends of the gene are untranslated regions, which are transcribed, but not translated.
The gene is transcribed into pre-mRNA. The intronic sequences are spliced out, forming
the final message. The exonic regions of the message are translated into protein. So, why
would God create a system in which there are introns in between the code that is translated,
even though those sequences are eventually removed?
It turns out that this “simple” means of producing proteins does not hold for
all genes. In nearly all genes that contain multiple exons, the pre-mRNA is spliced in
alternative ways. Sometimes, all the exons are included in the final mRNA transcript.
At other times, only a select set of exons are included in the mRNA and the other exons
are spliced out. In this way, multiple proteins can be produced from the same gene.
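A toy model makes the idea concrete. In this sketch the pre-mRNA is a list of labeled pieces, introns are always dropped, and a set of exon numbers says which exons end up in the final message. The sequences, the `splice` function, and the 0-based exon numbering are all made up for illustration; real splicing is carried out by the spliceosome and is far more involved.

```python
# Toy pre-mRNA: exons in upper case, introns in lower case (made-up sequences).
segments = [
    ("exon", "AUGGCC"),
    ("intron", "guaaguag"),
    ("exon", "AAAGGG"),
    ("intron", "guccuag"),
    ("exon", "UUUUAA"),
]

def splice(segments, keep):
    """Join the exons whose 0-based exon number is in `keep`; drop all introns."""
    exons = [seq for kind, seq in segments if kind == "exon"]
    return "".join(seq for i, seq in enumerate(exons) if i in keep)

print(splice(segments, {0, 1, 2}))  # AUGGCCAAAGGGUUUUAA (all exons included)
print(splice(segments, {0, 2}))     # AUGGCCUUUUAA (middle exon skipped)
```

The same pre-mRNA yields two different messages, and therefore two different proteins, depending only on which exons are kept.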
The alternative splicing of RNA can occur in any of a number of ways, producing many
different varieties of proteins. This is a list of the many ways in which multiple exonic
regions can be alternatively spliced. Since there are an average of 5-6 splice variants
per gene, the human genome can produce about 100,000 different proteins. Design or evolution?
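The rough arithmetic behind that estimate can be checked directly. This is a back-of-the-envelope sketch that multiplies the gene count quoted earlier (~22,000) by the average number of splice variants; the function name is made up for illustration.

```python
def estimated_proteins(genes: int, variants_per_gene: int) -> int:
    """Crude estimate: every gene contributes the average number of variants."""
    return genes * variants_per_gene

for variants in (5, 6):
    print(variants, estimated_proteins(22_000, variants))
# 5 110000
# 6 132000
```

Both results are in the neighborhood of the rounded figure of about 100,000.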
Our next example of design in DNA is based upon research that was first published in
December, 2013. It turns out the transcription process is
a little more complicated than I first indicated. Genes consist not only of exons and introns,
but also a promoter region, which controls the expression of the gene. The promoter region,
shown here expanded, usually contains several sequences that are bound by what are called
transcription factors. Transcription factors are proteins that have affinity for
certain specific DNA sequences. When a transcription factor binds to a sequence in the promoter
region of a gene, it turns transcription of that gene either on or off. Scientists
have known about how transcription factors control gene expression for many years, although
they assumed those sites would all be located in non-coding DNA.
Scientists wanted to map all the transcription factor binding sites throughout the entire
human genome. So, they used a kind of trick to find all those sites. There is an enzyme
called DNase I that digests DNA into small fragments. It turns out that if a transcription
factor protein is bound to a section of DNA, DNase I will not digest that DNA. Scientists
examined the DNA from 81 different cell lines for their study, since different cell types
have different genes that are turned on or turned off, which would be bound to transcription
factors. The remaining undigested fragments of DNA were sequenced and mapped. This is
an example from one gene, showing that the DNA was protected from digestion when bound
to various transcription factors. What surprised scientists was that many of
these transcription factor binding sites were not in the promoter regions, but were actually
within the exons themselves. So, these regions of DNA coded for both protein sequence and
transcription factor binding simultaneously. These dual coding sequences have been called
“duons.” It turns out that 86% of all genes contain at least one duon sequence.
Duons comprise 14% of all exonic coding sequence for a total of over 12 million base pairs.
Here is an example of a duon in the gene CELSR2. This is part of the DNA sequence and the corresponding
amino acid sequence in the protein, showing the codons. And here is the transcription
factor binding sequence, which almost exactly matches the DNA sequence. In this sequence,
there are two arginine amino acids, which use completely different codon sequences,
in order to match the needed transcription factor binding sequence. This is a likely
reason why the genetic code is redundant. How do we know these duon sequences are really
functional? This is a graph of the number of duon sequences as a function of location.
If these sequences were functional, we would expect them to be found mostly in the first
exon, which they are. Why weren’t these duon sequences discovered before 2013? It
is because evolution would never predict that dual coding regions should exist.
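The role of redundancy in the arginine example can be illustrated with a toy search. The sketch below lists every DNA sequence that codes for a given peptide and keeps only those that also contain a chosen short sequence. The motif `GGAG` is a made-up stand-in, not a real transcription factor binding sequence, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code (DNA alphabet), with codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid to the list of codons that code for it.
CODONS_FOR = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_FOR.setdefault(aa, []).append("".join(codon))

def encodings(peptide):
    """Yield every DNA sequence that translates to the given peptide."""
    for combo in product(*(CODONS_FOR[aa] for aa in peptide)):
        yield "".join(combo)

def encodings_with_motif(peptide, motif):
    """Keep only the encodings that also contain the motif."""
    return [dna for dna in encodings(peptide) if motif in dna]

print(CODONS_FOR["R"])                     # ['CGT', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG']
print(len(list(encodings("RR"))))          # 36
print(encodings_with_motif("RR", "GGAG"))  # ['CGGAGA', 'CGGAGG', 'AGGAGA', 'AGGAGG']
```

Of the 36 ways to write two arginines in a row, only four contain the motif, and two of those (CGGAGA and CGGAGG) use codons from the two different arginine families, much like the pair of arginines in the example.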
Now, we are going to look at our third example of designs in DNA — dual coding genes.
We know that three DNA bases make up a codon and that codons follow sequentially. However,
in theory DNA has six reading frames — three on each strand. Here is shown one reading
frame and its corresponding amino acid sequence. If we shift the reading frame to the right
by one base, we get a completely different amino acid sequence. Shift it again, and there
is yet another different amino acid sequence. Now, as we know, the opposite strand of DNA
has a complementary sequence, which is shown here. Whereas one strand is read in one direction,
the complementary strand is read in the opposite direction, producing a fourth different amino
acid sequence. And, this reading frame can also be shifted by one to produce a 5th amino
acid sequence. Shift again and we get a 6th. Some of these reading frames lead to interesting
results. The 5th one introduces a methionine start sequence, which could result in producing
an entirely new protein. And the sixth one introduces a stop codon, which would terminate
the protein prematurely. So, when a mutation shifts the reading
frame, it is almost always bad, since it destroys the current protein sequence.
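The six reading frames can be sketched as follows. This is a minimal illustration with a made-up 12-base sequence, not the sequence shown in the video. The frame labels (+1 to +3 for the given strand, -1 to -3 for the complementary strand) are a common convention, and `*` marks a stop codon.

```python
from itertools import product

# Standard genetic code (DNA alphabet), with codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate_frame(dna, offset):
    """Translate every complete codon, starting `offset` bases in."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)

def six_frames(dna):
    """Return the translations of all six reading frames."""
    dna = dna.upper()
    opposite = dna.translate(COMPLEMENT)[::-1]  # other strand, 5' to 3'
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate_frame(dna, offset)
        frames[f"-{offset + 1}"] = translate_frame(opposite, offset)
    return frames

for name, protein in sorted(six_frames("ATGGCCAAGTGA").items()):
    print(name, protein)
# +1 MAK*
# +2 WPS
# +3 GQV
# -1 SLGH
# -2 HLA
# -3 TWP
```

Shifting by a single base changes every amino acid that follows, which is why a frameshift destroys the existing protein sequence.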
This is a quote from a study that examined dual coding genes. “Coding of multiple proteins
by overlapping reading frames is not a feature one would associate with eukaryotic genes.
Indeed, codependency between codons of overlapping protein-coding regions imposes a unique set
of evolutionary constraints, making it a costly arrangement. Yet in cases of tightly co-expressed
interacting proteins, dual coding may be advantageous. Here we show that although dual coding is
nearly impossible by chance, a number of human transcripts contain overlapping coding regions.”
The evolutionary assumption is that true dual coding genes are conserved among related
species, which underestimates the true number of such genes. A study by Sanna et al. found that
9% of human genes and 7% of mouse genes were dual coding. By chance, one would expect only
0.07% of genes to be dual coding. However, fewer than 30% of dual coding genes are shared
between human and mouse. The study also found that 90% of dual coding genes have overlapping
coding sequences on opposite DNA strands. In bacteria, 84% of overlapping genes are
on the same strand. A study from last year analyzed human open reading frame proteins
by mass spectrometry and detected a total of 1259, which is much higher than expected.
Here is an analogy, which consists of three sentences that can be read either forward
or backward. Needless to say, these are not random sentences, but were designed by intelligent
agents. So, the magnitude of the dual coding problem would be the equivalent of writing
a novel that could be read in either the forward or the reverse direction, yielding two different
stories, both of which made sense. Here is an example of a dual coding gene, EIF6.
Here is reading frame 1 showing the exons. And here is reading frame 2. Both proteins
start at the same codon. However, the second sequence is twice as long as the first. The
second reading frame reads through the intron, producing a frameshift in the overlapping
exons, shown here. This is the DNA sequence of the overlapping region. The first reading
frame produces this amino acid sequence and the second, this amino acid sequence. As you
can see, the two sequences are quite different, although both produce working proteins.
Here is another dual coding gene, Ncaph2. This gene produces three transcripts: a
long and a short one, which are alternative transcripts. The third transcript has a different
start codon, which is read using an alternative reading frame. As can be seen, the third transcript
has a completely different amino acid sequence. This is a gel electrophoresis of the three
different protein products. On the left is the protein ladder, which tells us the size
of the proteins. The top band in each lane is the long version. The one right below that
is the intermediate product and the lower one is the short protein. Each lane displays
the amount of protein produced in each organ. So, this data shows that not only are all
three proteins produced by different kinds of cells, but they are produced in different
amounts in different organs. It is a remarkably clever design.
So, we have seen three examples of designs in DNA. Alternative splicing of RNA that produces
multiple proteins from one gene. Duons: overlapping sequences that code for both protein sequence
and transcription factor binding sites simultaneously. And dual coding genes, in which one sequence
is read in multiple frames to produce completely different proteins. And, most importantly,
we must all remember that this design just evolved, right?
More information can be found online at our website. Thanks for watching
and be sure to sign up for updates from our YouTube channel.
