Co-Opting the Genetic Code: DNA Encoding Protocol

Genetic Code

The genetic code is the set of rules by which information encoded in genetic material is translated into proteins by living cells.

Deoxyribonucleic acid (DNA) contains the genetic instructions used in the development and functioning of all known living organisms. The main role of DNA molecules is the long-term storage of information.

Units of the genetic code are the nucleotides, identified by their bases: adenine (A), cytosine (C), guanine (G), and thymine (T). A triplet of these units is known as a codon, which ultimately represents an amino acid or a signal to stop translation (see codon to amino acid mapping). With 4 possible values in each of three positions, there are 4³ = 64 possible codons to represent the 20 amino acids.
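The count of 64 can be checked by brute force. The short Python sketch below is my own illustration, not part of the protocol: it lists every three-letter string over A, C, G and T. Because a base may repeat within a codon, these are ordered selections with repetition, which is why the total is 4 × 4 × 4. The base order A, C, G, T and the names BASES and CODONS are assumptions, chosen to match the remap table further down.

```python
from itertools import product

# The four DNA bases, in the order used by the remap table in this article.
BASES = "ACGT"

# Each codon picks one base for each of three positions (repeats allowed),
# so itertools.product yields all 4 ** 3 combinations in lexicographic order.
CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]

print(len(CODONS))   # 64
print(CODONS[:5])    # ['AAA', 'AAC', 'AAG', 'AAT', 'ACA']
```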

Just as the letters of the alphabet can be combined to form an almost endless variety of words, amino acids can be linked together in varying sequences to form a vast variety of proteins. Every protein is chemically defined by its unique sequence of amino acid residues, which in turn determines the three-dimensional structure of the protein.

MIME Base 64

Base64 is an encoding scheme that represents binary data as text: it splits the data into 6-bit groups and writes each group's value (0 to 63) as one character from a 64-character alphabet. For instance, take the following encoding of “Man”:

Text	M	a	n
ASCII	77	97	110
Bit pattern	01001101	01100001	01101110
6-bit groups	010011	010110	000101	101110
Index	19	22	5	46
Base64-encoded	T	W	F	u
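The same regrouping can be done with a few bit shifts. This is a minimal Python sketch; the helper name base64_indices is hypothetical, and it only handles inputs whose length is a multiple of three, so the “=” padding case is left out. The last line compares the result with the standard library's base64.b64encode.

```python
import base64

# Standard Base64 alphabet: index 0 is 'A', index 63 is '/'.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def base64_indices(data: bytes) -> list[int]:
    """Return the 6-bit values (0-63) obtained by regrouping data's bits."""
    if len(data) % 3:
        raise ValueError("sketch only handles lengths divisible by 3 (no '=' padding)")
    indices = []
    for i in range(0, len(data), 3):
        # Pack three 8-bit bytes into one 24-bit integer...
        chunk = (data[i] << 16) | (data[i + 1] << 8) | data[i + 2]
        # ...then read it back as four 6-bit values, most significant first.
        indices.extend((chunk >> shift) & 0x3F for shift in (18, 12, 6, 0))
    return indices


idx = base64_indices(b"Man")
print(idx)                                # [19, 22, 5, 46]
print("".join(ALPHABET[i] for i in idx))  # TWFu
print(base64.b64encode(b"Man").decode())  # TWFu, matching the standard library
```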

Remap

Rather than mapping each index to a character, I chose to remap the Base64 index to a codon, and thus to an amino acid, resulting in the following table:

Index	Character	Codon	Amino Acid
0	A	AAA	Lysine (K)
1	B	AAC	Asparagine (N)
2	C	AAG	Lysine (K)
3	D	AAT	Asparagine (N)
4	E	ACA	Threonine (T)
5	F	ACC	Threonine (T)
6	G	ACG	Threonine (T)
7	H	ACT	Threonine (T)
8	I	AGA	Arginine (R)
9	J	AGC	Serine (S)
10	K	AGG	Arginine (R)
11	L	AGT	Serine (S)
12	M	ATA	Isoleucine (I)
13	N	ATC	Isoleucine (I)
14	O	ATG	Methionine (M)
15	P	ATT	Isoleucine (I)
16	Q	CAA	Glutamine (Q)
17	R	CAC	Histidine (H)
18	S	CAG	Glutamine (Q)
19	T	CAT	Histidine (H)
20	U	CCA	Proline (P)
21	V	CCC	Proline (P)
22	W	CCG	Proline (P)
23	X	CCT	Proline (P)
24	Y	CGA	Arginine (R)
25	Z	CGC	Arginine (R)
26	a	CGG	Arginine (R)
27	b	CGT	Arginine (R)
28	c	CTA	Leucine (L)
29	d	CTC	Leucine (L)
30	e	CTG	Leucine (L)
31	f	CTT	Leucine (L)
32	g	GAA	Glutamate (E)
33	h	GAC	Aspartate (D)
34	i	GAG	Glutamate (E)
35	j	GAT	Aspartate (D)
36	k	GCA	Alanine (A)
37	l	GCC	Alanine (A)
38	m	GCG	Alanine (A)
39	n	GCT	Alanine (A)
40	o	GGA	Glycine (G)
41	p	GGC	Glycine (G)
42	q	GGG	Glycine (G)
43	r	GGT	Glycine (G)
44	s	GTA	Valine (V)
45	t	GTC	Valine (V)
46	u	GTG	Valine (V)
47	v	GTT	Valine (V)
48	w	TAA	STOP
49	x	TAC	Tyrosine (Y)
50	y	TAG	STOP
51	z	TAT	Tyrosine (Y)
52	0	TCA	Serine (S)
53	1	TCC	Serine (S)
54	2	TCG	Serine (S)
55	3	TCT	Serine (S)
56	4	TGA	STOP
57	5	TGC	Cysteine (C)
58	6	TGG	Tryptophan (W)
59	7	TGT	Cysteine (C)
60	8	TTA	Leucine (L)
61	9	TTC	Phenylalanine (F)
62	+	TTG	Leucine (L)
63	/	TTT	Phenylalanine (F)
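The table is just counting in base 4: each 6-bit index is written as three base-4 digits, with A=0, C=1, G=2 and T=3. The Python sketch below (the names are my own, not from the original project) reproduces any row from the index alone. The AMINO string is the standard genetic code written out in the table's AAA to TTT order, with “*” standing in for STOP.

```python
# Base64 alphabet, in index order 0-63 (standard alphabet).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
BASES = "ACGT"  # base-4 digits: A=0, C=1, G=2, T=3

# Standard genetic code, one letter per codon in the same AAA..TTT order;
# '*' marks the three stop codons (TAA, TAG, TGA).
AMINO = "KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSS*CWCLFLF"


def index_to_codon(i: int) -> str:
    """Write a 6-bit index as three base-4 digits, each digit picking a base."""
    if not 0 <= i < 64:
        raise ValueError("index must be in 0..63")
    return BASES[i >> 4] + BASES[(i >> 2) & 3] + BASES[i & 3]


for i in (0, 14, 48, 63):
    print(i, ALPHABET[i], index_to_codon(i), AMINO[i])
# 0 A AAA K
# 14 O ATG M
# 48 w TAA *
# 63 / TTT F
```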

This lets you encode any binary data as DNA, including text (Unicode), bitmaps, audio, video, etc. For example, “Hello, world!” followed by a newline:

Base64	SGVsbG8sIHdvcmxkIQo=
Base64 Index	18 6 21 44 27 6 60 44 8 7 29 47 28 38 49 36 8 16 40
DNA	CAGACGCCCGTACGTACGTTAGTAAGAACTCTCGTTCTAGCGTACGCAAGACAAGGA
Protein	QTPVRTLVRTLVLAYARQG
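Putting the pieces together, here is a minimal end-to-end sketch in Python. The function names are hypothetical, not from the original project, and the Base64 step uses the standard library's base64 module. As in the example, the “=” padding is dropped when encoding, since it carries no data, and restored before decoding. The protein string is for display only: several codons share an amino acid (AAA and AAG are both K), so it cannot be decoded back into the original bytes, and a cell translating this DNA would halt at the first stop codon, which translate() merely marks with “*”.

```python
import base64

# Standard Base64 alphabet (index 0-63), the DNA bases as base-4 digits,
# and the standard genetic code in AAA..TTT order ('*' = stop codon).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
BASES = "ACGT"
AMINO = "KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSS*CWCLFLF"


def codon_index(codon: str) -> int:
    """Read a codon as a three-digit base-4 number (A=0, C=1, G=2, T=3)."""
    a, b, c = (BASES.index(base) for base in codon)
    return (a << 4) | (b << 2) | c


def encode_dna(data: bytes) -> str:
    """Base64-encode the bytes, then emit one codon per Base64 character."""
    text = base64.b64encode(data).decode("ascii").rstrip("=")  # padding holds no data
    return "".join(
        BASES[i >> 4] + BASES[(i >> 2) & 3] + BASES[i & 3]
        for i in (ALPHABET.index(ch) for ch in text)
    )


def decode_dna(dna: str) -> bytes:
    """Invert encode_dna: codons -> Base64 characters -> original bytes."""
    if len(dna) % 3:
        raise ValueError("DNA length must be a multiple of 3")
    text = "".join(ALPHABET[codon_index(dna[k:k + 3])] for k in range(0, len(dna), 3))
    text += "=" * (-len(text) % 4)  # restore the padding b64decode expects
    return base64.b64decode(text)


def translate(dna: str) -> str:
    """One-letter amino acid per codon. Lossy: e.g. AAA and AAG both give K."""
    return "".join(AMINO[codon_index(dna[k:k + 3])] for k in range(0, len(dna), 3))


message = b"Hello, world!\n"  # trailing newline, hence Base64 ending in "IQo="
dna = encode_dna(message)
print(dna)                         # CAGACGCCCGTACGTACGTTAGTAAGAACTCTCGTTCTAGCGTACGCAAGACAAGGA
print(translate(dna))              # QTPVRTLVRTLVLAYARQG
print(decode_dna(dna) == message)  # True
```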


Disk is cheap, who cares?

So why would we want to do this, when digital storage is becoming abundant and cheap?

Non-Digital Library

Via recombinant DNA technology, one could craft a portable, reproducible, time-resistant library of data. Imagine dormant bacteria serving as an organic time capsule, waiting to grow and multiply the encoded message: Bacillus mobydickus, or Spirilla biblicis. Perhaps we could even package the Britannica protein into a pill.

Spime

“The key to the Spime is identity. A Spime is, by definition, the protagonist of a documented process. It is an historical entity with an accessible, precise trajectory through space and time.” –Bruce Sterling, Shaping Things

A spime is an instance of an object, uniquely identified by an ID code. Attached to this ID are a history of manufacture and ownership, geographical position and trails, customization details, public discourse, etc. A spime embodies and communicates its entire being to the external world.

We could use encoded DNA to embed our own narratives, family histories, family trees, and more into our own being. Noncoding DNA consists of sequences that do not encode protein sequences. Much of this DNA has no known biological function and is sometimes referred to as “junk DNA”.

More than 98% of the human genome is non-coding. A human spime could recycle this junk DNA by recombining encoded messages into non-coding regions via in vitro manipulation or gene therapy techniques. Perhaps these regions could carry a different encoding of their own? Maybe they already do?

Evolving oral tradition into hereditary storytelling.

Below is the presentation given in class: