Discovering biology in a digital world (Archives): Part I: Future Shock and Selenocysteine

Future Shock

When I was in high school, we read an intriguing book by Alvin Toffler called "Future Shock."

Now, the book is over 30 years old but some of the predictions Toffler made were uncanny.

One of the ideas Toffler proposed was that people could become overwhelmed and disoriented by the onslaught of new information. My field is a good example. For Geospiza, helping people manage large amounts of new data, while maintaining the old, is our whole raison d’être.

But going back to Toffler, he predicted that the increasing rate of societal change would cause some people to experience symptoms of "Future Shock." One morning you might wake up in a familiar place, but everything would seem a bit different and strange. I'm channeling the ghost of Jim Morrison a bit, but The Doors had the feeling nailed down.

It's never bothered me though, until the other day.

I learned something new that shook one of my core beliefs.

We have a new amino acid in the genetic code.

Sure, go ahead and laugh.

This might seem like an odd thing to be bothered by, but the genetic code was solved in the early 1960s. Some things in life are NOT supposed to change. Yeah, there are some variations in how different species translate DNA, and we expect to learn new things from deciphering the genome, but no one expects changes in something as fundamental as the genetic code.

So, it was a bit jarring to find out that now there are 21 amino acids.

And it was even a little reassuring that no one believed me.

My husband kept insisting that this was a post-translational modification or some strange anomaly from archaebacteria.

Naturally, I was forced to hunt down a bunch of abstracts and read them to everyone (I love PubMed!).

Selenocysteine: our 21st amino acid

It's true. The new amino acid is selenocysteine, and there are even special tRNAs that can add it during translation. The translation machinery recognizes the UGA stop codon, plus special secondary structures in the mRNA, and puts in a selenocysteine instead of stopping.
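Here's a rough way to picture that recoding in code. This is only a toy sketch in Python, not how any real annotation tool works: the function name `translate` and the `has_secis` flag are my own inventions for illustration, and the flag simply stands in for "the mRNA carries the special secondary structure." As a simplification, it treats every in-frame UGA as selenocysteine (one-letter code U) when the flag is set, which ignores the fact that a real message still needs a way to stop.

```python
from itertools import product

# Standard genetic code, codons ordered U, C, A, G at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna, has_secis=False):
    """Translate an mRNA string, reading codons from the first base.

    has_secis is a hypothetical flag standing in for the special mRNA
    secondary structure. When it is True, UGA is read as selenocysteine
    ("U") instead of as a stop codon.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon == "UGA" and has_secis:
            protein.append("U")  # recoded: insert selenocysteine
            continue
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # ordinary stop codon: end translation
            break
        protein.append(amino_acid)
    return "".join(protein)


message = "AUGUGUUGAGGCUAA"  # AUG UGU UGA GGC UAA
print(translate(message))                  # stops at UGA
print(translate(message, has_secis=True))  # reads through UGA
```

Without the flag, the made-up message above stops at UGA; with it, translation reads through the UGA, inserts a selenocysteine, and keeps going until the UAA.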

This amino acid is uncommon, but GenBank has 7904 entries for selenoproteins and 3293 RefSeqs. Many are probably orthologs (the same protein in different organisms) or our favorites, those wonderful "hypothetical proteins," and I think some of the records represent the same sequence, but there's still a fair number to be found (except in Pfam, but more on that in part II).

Selenoproteins are pretty widespread, too. At least 25 selenoproteins are known in humans, and I found papers describing them in mice, fruit flies, fish, bacteria, and protozoans. Most selenoproteins contain only one selenium, and it's positioned at the active site. One selenoprotein contains so many seleniums that this one protein, alone, accounts for half of the selenium in a cell.

I'm not too sure yet about the function of these proteins. Some of the selenoproteins may be important in redox reactions, one might prevent heavy metal toxicity, and there seems to be some link to cancer, too.

And, guess what?

I wasn't the only one who was taken by surprise. It looks like some of our favorite bioinformaticists and genome annotators missed this one, too.

Stay tuned.

In part II, we look at the infinite loop of information updates and an interesting conclusion drawn from erroneous annotations.

Subject: Doing biology with bioinformatics

technorati tags: biology, bioinformatics, blast, genetics, genomics, DNA, RNA, Science Education

2 Comments:

Anonymous said...: > We have a new amino acid in the genetic code.

Not to be anal or pedantic but amino acids are not part of the genetic code. They are part of the protein code. The genetic code is nucleic acids and nominally is composed of 4 characters. But the RNA counterpart to the genetic code contains many nucleic acids -- mostly modifications.; 4:58 AM
Sandra Porter said...: Point taken.

I used the phrase "Genetic code" in a looser sense, encompassing both the code and the info that's encoded.

And certainly, the RNA part of "the code," with all the modified nucleotides, twists and secondary structure, is much more complex than a mere 64 codons.; 9:21 AM