Friday, September 16, 2005

What if Garrison Keillor did bioinformatics?

Okay - I wrote this a few years ago and some of the issues have sorted themselves out, but not so many as one might think.

What is bioinformatics? A biologist's perspective.

Imagine this. You've been sequencing DNA for a few years now, perhaps ESTs, or something else, and storing files on your local network. Your system administrator makes backup files for you and all is well.

One day you learn about interesting results from assembling sequence data and decide to try it yourself.

Watch out! You are about to descend into bioinformatics hell.

Soon you learn that the assembly program has complicated requirements and demands that all files entering the system be given an incomprehensible name to comply with sequencing procedures from the last decade.

You beg someone to do something with the computer and rename your files. Meanwhile, the backup files with the original names, which were referenced in experimental procedures and linked to experimental data, languish on the system, forgotten. A few months later, no one knows why those files are there. Your new files with their new names are backed up. More new files enter the system and quickly acquire two sets of names. More months pass, the server is loaded down with files, and no one knows why.

Your department head, frustrated with the slow network, hires an expert to analyze the system and determine if you need a Linux cluster. Oops, it turns out that many files contain the same information. Naturally, the older files are deleted. Now all information connecting the files to the original experiments is lost.

Your lab director says to quit fooling around and hires someone to move all of your data into a database. But, the next few weeks find you ranting at your computer. Why? You don't know how to use SQL and you have important research to do, dammit! The last thing you want to do is fight with your computer to get it to tell you something you don't already know. And, you start to wonder, what exactly is in those tables? And why tables? And how are you going to get your data back and do something useful with it?

Perhaps, you decide, it's time to hire a programmer.

The first person you interview is very enthusiastic. You ask about programming experience. Apparently, he can program in more languages than a UN interpreter can speak. And he's especially excited about some language called "open source" and some snake language. Confused already, you ask what he's done. It turns out that he's written games and designed something sticky or gooey (you think) and knows lots about cold fusion. You're a little worried about using gooey stuff around your computer and puzzled by the remark about cold fusion (especially since it was a fraud), but you smile and nod, not wanting to betray your ignorance.

Time to switch to your domain.

Do you know anything about biology? you ask. The candidate smiles. Oh yes! He took biology in high school and read "Genome", too!

You hire him, pay him twice the salary of any of the post-docs, and have him start with something simple. You ask him to write a program to translate DNA into open reading frames. You're met with a blank stare. Is there a problem, you ask? What's an open reading frame? is the reply.

To quote Garrison Keillor, "Wouldn't this be a great time for a slice of rhubarb pie?"



Friday, September 09, 2005

Molecular resources for monarch biology

One of the best places to get both sequence information and information on current research is the National Center for Biotechnology Information. The NCBI is part of the National Library of Medicine in the National Institutes of Health. One of the best known aspects of the NCBI is that they house GenBank, a collection of all the DNA and protein sequences that are publicly available. They also have PubMed, a database of scientific literature that's related to medical research.

To get sequence information for Monarch butterflies:

1. Go to the Taxonomy Browser at the NCBI

2. Search with the scientific name for monarchs: Danaus plexippus

This takes you to a page with the heading "Danaus plexippus" and two subspecies. Click the Danaus plexippus link at the top of the list to get the taxonomy record.

There is a handy box in the upper right hand corner of the taxonomy record with useful links.

The links that are shown depend on the types of resources that are present in the NCBI databases. For monarch butterflies, these link to records in the Nucleotide, Protein, PopSet, PubMed Central, and Taxonomy databases. The numbers in the columns reflect the number of records. So, the nucleotide database has 52 sequences from monarch butterflies (as of this morning). The protein database has 72 total, for both of the subspecies.

You can also get the sequences straight from GenBank.
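If you'd rather not click through the web pages, the same record counts can be requested through the NCBI's E-utilities service. Here is a minimal sketch in Python that only builds the ESearch request URLs; it doesn't contact the NCBI. The function name build_count_url is my own invention for illustration, and I'm assuming that searching with the [Organism] field matches what the Taxonomy Browser links show. The numbers you get back today will not match the counts quoted above.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_count_url(organism, db):
    """Return an ESearch URL asking how many records in `db` match an organism.

    build_count_url is a hypothetical helper name, not part of any NCBI library.
    """
    params = {
        "db": db,                         # e.g. "nucleotide", "protein", "popset"
        "term": f"{organism}[Organism]",  # limit the search to the organism field
        "rettype": "count",               # ask only for the number of matches
    }
    # urlencode escapes the space and the square brackets for us.
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Print one request URL per database; paste any of them into a browser.
    for db in ("nucleotide", "protein", "popset"):
        print(build_count_url("Danaus plexippus", db))
```

Pasting one of the printed URLs into a browser returns a small XML document with the count inside. If you plan to send many requests from a script, check the NCBI's usage guidelines first.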

The other very handy links are PopSet and PubMed Central. The PopSet reference for Danaus is linked to a list of 12 different sets of sequences from population and evolutionary studies of butterflies. PopSet is a database with sets of sequences from evolutionary studies. If you look at the papers that are referenced in PopSet, they should include the sequences for the primers that were used in the studies. The PopSet sequences are also great for making phylogenetic trees, but that subject will be a future post.

The PubMed Central link gives a list of several scientific literature citations for monarch butterflies. The cool thing about PubMed Central is that you can actually get the full text and read the entire paper, if you're so inclined. Scientific publishers are still experimenting with free, on-line access, so these may only be available for a limited time.


Butterflies, birds, and worms

One of the most wonderful things about the Internet has been the emergence of research projects that involve the general public. Universities like Cornell, the University of Kansas, and the University of Minnesota, to name a few, have established web sites and on-line databases that encourage both students and amateur biologists to participate in biological field studies. Not only do these projects extend the potential for good science by collecting more data, they give visibility to the research process and allow the public to take ownership and contribute to the store of scientific knowledge.

Monarch Watch
When I was a child, everyone had a butterfly collection and monarchs were everywhere. Now, monarch populations are declining and their habitat is rapidly being lost. If we want monarchs fluttering by in more than our memories, they will need our help.
At Monarch Watch, students can learn about and participate in studies of monarch migration. Resources are also available for setting up waystations and helping in monarch conservation. Some of the research projects that you can be involved in include tagging monarchs, monitoring larvae, and measuring size and mass. Monarch Watch also has a database on tag recovery that you can search to find out how many tags have been recovered, and when and where.

Worm Watch
Worm Watch involves students in collecting, counting, and identifying worms. According to the web site, non-native species of worms, such as earthworms, cause damage to forest ecosystems. Scientists at the University of Minnesota are enlisting the help of classrooms in doing surveys to count the number of earthworms at different locations. This information helps the scientists understand the extent of earthworm spread and determine how badly the ecosystem has been damaged.

eBird
eBird is a joint project of the Cornell Lab of Ornithology at Cornell University and the Audubon Society. Through eBird, anyone can enter and store bird observations, and learn about birds that others have seen. eBird has advice for identifying birds, instructions for observing birds, and maps that show you where birds have been sighted.
