Genetic Code

The genetic code is the set of rules by which information encoded in genetic material is translated into proteins by living cells.

Deoxyribonucleic acid (DNA) contains the genetic instructions used in the development and functioning of all known living organisms. The main role of DNA molecules is the long-term storage of information.

Units of the genetic code are the nucleotides: adenine (A), cytosine (C), guanine (G), and thymine (T). A triplet of these units is known as a codon, which ultimately represents an amino acid (see codon to amino acid mapping). With 4 possible values in each of three positions, there are 4³ = 64 possible codons to represent the 20 amino acids.
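The count of 64 can be checked by simply enumerating every triplet. This is a minimal sketch; the name NUCLEOTIDES and the alphabetical A, C, G, T ordering are assumptions made for illustration.

```python
from itertools import product

# Assumed ordering for this sketch: alphabetical A, C, G, T.
NUCLEOTIDES = "ACGT"

# Every ordered triplet of the four nucleotides is one codon.
codons = ["".join(triplet) for triplet in product(NUCLEOTIDES, repeat=3)]

print(len(codons))  # 64
print(codons[:4])   # ['AAA', 'AAC', 'AAG', 'AAT']
```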

Just as the letters of the alphabet can be combined to form an almost endless variety of words, amino acids can be linked together in varying sequences to form a vast variety of proteins. Every protein is chemically defined by its unique sequence of amino acid residues, which in turn defines the three-dimensional structure of the protein.

MIME Base 64

Base64 is an encoding scheme that encodes binary data by treating it numerically and translating it into a base 64 representation. For instance, take the following encoding of “Man”:

Text:            M  a  n
ASCII:           77  97  110
Bit pattern:     010011  010110  000101  101110
Index:           19  22  5  46
Base64-encoded:  T  W  F  u
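The “Man” example can be reproduced by hand in a few lines. This is a sketch, not a full Base64 implementation: the helper name base64_indices is made up for illustration, and it assumes the input length is a multiple of 3 so that no padding is needed. The result is checked against Python's standard base64 module.

```python
import base64

# The standard Base64 alphabet: index 0 is 'A', index 63 is '/'.
BASE64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def base64_indices(data: bytes) -> list:
    """Split data into 6-bit groups (hypothetical helper; assumes len(data) % 3 == 0)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return [int(bits[i:i + 6], 2) for i in range(0, len(bits), 6)]

indices = base64_indices(b"Man")
encoded = "".join(BASE64_ALPHABET[i] for i in indices)

print(indices)  # [19, 22, 5, 46]
print(encoded)  # TWFu

# Cross-check against the standard library.
assert encoded == base64.b64encode(b"Man").decode("ascii")
```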

Remap

Rather than mapping to a character, I chose to remap the base64 index to a codon and thus a protein, resulting in the following table:

Value  Character  Codon  Amino Acid
0      A          AAA    Lysine (K)
1      B          AAC    Asparagine (N)
2      C          AAG    Lysine (K)
3      D          AAT    Asparagine (N)
4      E          ACA    Threonine (T)
5      F          ACC    Threonine (T)
6      G          ACG    Threonine (T)
7      H          ACT    Threonine (T)
8      I          AGA    Arginine (R)
9      J          AGC    Serine (S)
10     K          AGG    Arginine (R)
11     L          AGT    Serine (S)
12     M          ATA    Isoleucine (I)
13     N          ATC    Isoleucine (I)
14     O          ATG    Methionine (M)
15     P          ATT    Isoleucine (I)
16     Q          CAA    Glutamine (Q)
17     R          CAC    Histidine (H)
18     S          CAG    Glutamine (Q)
19     T          CAT    Histidine (H)
20     U          CCA    Proline (P)
21     V          CCC    Proline (P)
22     W          CCG    Proline (P)
23     X          CCT    Proline (P)
24     Y          CGA    Arginine (R)
25     Z          CGC    Arginine (R)
26     a          CGG    Arginine (R)
27     b          CGT    Arginine (R)
28     c          CTA    Leucine (L)
29     d          CTC    Leucine (L)
30     e          CTG    Leucine (L)
31     f          CTT    Leucine (L)
32     g          GAA    Glutamate (E)
33     h          GAC    Aspartate (D)
34     i          GAG    Glutamate (E)
35     j          GAT    Aspartate (D)
36     k          GCA    Alanine (A)
37     l          GCC    Alanine (A)
38     m          GCG    Alanine (A)
39     n          GCT    Alanine (A)
40     o          GGA    Glycine (G)
41     p          GGC    Glycine (G)
42     q          GGG    Glycine (G)
43     r          GGT    Glycine (G)
44     s          GTA    Valine (V)
45     t          GTC    Valine (V)
46     u          GTG    Valine (V)
47     v          GTT    Valine (V)
48     w          TAA    STOP
49     x          TAC    Tyrosine (Y)
50     y          TAG    STOP
51     z          TAT    Tyrosine (Y)
52     0          TCA    Serine (S)
53     1          TCC    Serine (S)
54     2          TCG    Serine (S)
55     3          TCT    Serine (S)
56     4          TGA    STOP
57     5          TGC    Cysteine (C)
58     6          TGG    Tryptophan (W)
59     7          TGT    Cysteine (C)
60     8          TTA    Leucine (L)
61     9          TTC    Phenylalanine (F)
62     +          TTG    Leucine (L)
63     /          TTT    Phenylalanine (F)
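The codon column of the table follows a regular pattern: reading A, C, G, T as the digits 0, 1, 2, 3, each codon is just the Base64 index written as a three-digit base-4 number. The sketch below relies on that observation; the function names index_to_codon and codon_to_index are made up for illustration and are not taken from the original tool.

```python
# Nucleotides in the order the table uses them as base-4 digits: A=0, C=1, G=2, T=3.
NUCLEOTIDES = "ACGT"

def index_to_codon(index: int) -> str:
    """Write a Base64 index (0..63) as three base-4 digits, i.e. one codon."""
    if not 0 <= index < 64:
        raise ValueError("Base64 index must be in the range 0..63")
    first, rest = divmod(index, 16)
    second, third = divmod(rest, 4)
    return NUCLEOTIDES[first] + NUCLEOTIDES[second] + NUCLEOTIDES[third]

def codon_to_index(codon: str) -> int:
    """Inverse mapping: read a codon as a three-digit base-4 number."""
    if len(codon) != 3:
        raise ValueError("a codon has exactly three nucleotides")
    value = 0
    for nucleotide in codon:
        value = value * 4 + NUCLEOTIDES.index(nucleotide)
    return value

print(index_to_codon(14))     # ATG
print(codon_to_index("TGG"))  # 58
```

Because the mapping is plain positional arithmetic, no lookup table is needed to go between Base64 indices and codons; only the codon to amino acid step needs one.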

This enables you to encode any binary data as DNA, including text (Unicode), bitmaps, audio, video, etc. For example, “Hello, world!”:

Base64:        SGVsbG8sIHdvcmxkIQo=
Base64 index:  18 6 21 44 27 6 60 44 8 7 29 47 28 38 49 36 8 16 40
DNA:           CAGACGCCCGTACGTACGTTAGTAAGAACTCTCGTTCTAGCGTACGCAAGACAAGGA
Protein:       QTPVRTLVRTLVLAYARQG
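The whole pipeline can be sketched in Python as follows. This is an illustrative reconstruction, not the code behind the original tool: the function names are hypothetical, the "=" padding is dropped before mapping to codons (as the index row above suggests), and STOP codons are written as "*" in the protein string. Note that the Base64 string in the example decodes to “Hello, world!” followed by a newline, so the sketch encodes that exact byte sequence.

```python
import base64

BASE64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)
NUCLEOTIDES = "ACGT"  # A=0, C=1, G=2, T=3, as in the remap table

# One-letter amino acid per codon, in table order (index 0 = AAA ... 63 = TTT).
# "*" marks a STOP codon (a notation chosen for this sketch).
AMINO_ACIDS = (
    "KNKNTTTTRSRSIIMI"
    "QHQHPPPPRRRRLLLL"
    "EDEDAAAAGGGGVVVV"
    "*Y*YSSSS*CWCLFLF"
)

def index_to_codon(index: int) -> str:
    first, rest = divmod(index, 16)
    second, third = divmod(rest, 4)
    return NUCLEOTIDES[first] + NUCLEOTIDES[second] + NUCLEOTIDES[third]

def codon_to_index(codon: str) -> int:
    value = 0
    for nucleotide in codon:
        value = value * 4 + NUCLEOTIDES.index(nucleotide)
    return value

def split_codons(dna: str) -> list:
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]

def encode_dna(data: bytes) -> str:
    """Bytes -> Base64 text -> one codon per Base64 character (padding dropped)."""
    text = base64.b64encode(data).decode("ascii").rstrip("=")
    return "".join(index_to_codon(BASE64_ALPHABET.index(ch)) for ch in text)

def translate(dna: str) -> str:
    """DNA -> one-letter amino acid sequence."""
    return "".join(AMINO_ACIDS[codon_to_index(c)] for c in split_codons(dna))

def decode_dna(dna: str) -> bytes:
    """DNA -> Base64 text (padding restored) -> original bytes."""
    text = "".join(BASE64_ALPHABET[codon_to_index(c)] for c in split_codons(dna))
    text += "=" * (-len(text) % 4)
    return base64.b64decode(text)

dna = encode_dna(b"Hello, world!\n")
print(dna)              # CAGACGCCCGTACGTACGTTAGTAAGAACTCTCGTTCTAGCGTACGCAAGACAAGGA
print(translate(dna))   # QTPVRTLVRTLVLAYARQG
print(decode_dna(dna))  # b'Hello, world!\n'
```

The DNA string round-trips back to the original bytes, but the protein string does not: several codons share one amino acid, so the translation step loses information.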


Disk is cheap, who cares?

So why would we want to do this when digital storage is becoming abundant and cheap?

Non-Digital Library

Via recombinant DNA technologies, one could craft a portable, reproducible, time-resistant library of data. Imagine dormant bacteria, serving as an organic time capsule, waiting to grow and multiply the encoded message: Bacillus mobydickus or Spirilla biblicis. Perhaps we could package the Britannica protein into a pill.

Spime

“The key to the Spime is identity. A Spime is, by definition, the protagonist of a documented process. It is an historical entity with an accessible, precise trajectory through space and time.” –Bruce Sterling, Shaping Things

A spime is an instance of an object, uniquely identified by an ID code. Attached to this ID are a history of manufacture and ownership, geographical position and trails, customization details, public discourse, etc. A spime embodies and communicates its entire being to the external world.

We could use encoded DNA to embed our own narratives, family histories and trees, etc. into our own being. Noncoding DNA describes sequences that do not code for proteins. Much of this DNA has no known biological function and is sometimes referred to as “junk DNA”.

More than 98% of the human genome is non-coding. A human spime could recycle this junk DNA by recombining encoded messages into non-coding DNA regions via in-vitro manipulation or gene therapy techniques. Perhaps this DNA could carry a different encoding? Maybe it already does?

Evolving oral tradition into hereditary storytelling.

Below is the presentation given in class:
