Tuesday, November 08, 2005

Summer courses in digital biology

For the past three years, Dr. Linnea Fletcher and I have been teaching summer courses through the Chautauqua program. These courses are organized on a national basis by the University of Pittsburgh and the Council of Chautauqua Field Centers, and supported by the National Science Foundation Division of Undergraduate Education.

Some of the courses offer the chance to learn about fascinating topics in exotic places, like one course on Galileo that was offered in Italy. But if you can't go to Italy, you might like to come to Texas, where you can learn about digital biology and hear good music in the city limits of Austin. Since our courses are in June and it's too hot to go outside, it's a great time of year to work indoors during the day in an air-conditioned computer lab, and venture out to hear music at night.

It's a bit early, but I'm posting the course descriptions now to make more people aware of the courses and to gather topic requests. We make changes every year to incorporate new resources and tackle cool papers that we've read. So if you have a topic request, or you attended past courses and want to post rave reviews, please add your note in the comments at the end of the post.

Registration information will be available soon at: http://www.chautauqua.pitt.edu

A Hands-On Tour Through the World of Bioinformatics
LINNEA FLETCHER, Austin Community College and SANDRA G. PORTER, Geospiza, Inc.
Date: June 8th-10th in Austin, TX

High-throughput data collection, web-based bioinformatics tools, and molecular databases have changed the nature of biological research. This course places a strong emphasis on hands-on practice with bioinformatics resources to explore current topics in biological research. Activities and topics in this course are updated yearly in order to incorporate new tools, developments, and ideas in the fields of genomics, proteomics, and structural informatics. Example topics are: genotyping, DNA sequence analysis, sequence assembly and alignments, identifying SNPs and other types of sequence variation, designing PCR assays, BLAST, making the most of a database search, molecular modeling tools (Cn3D), genetic databases, OMIM, and interpreting experimental results. Lastly, participants discuss how bioinformatics can be applied in their courses.

For college teachers of: bioscience-based courses such as microbiology, genetics, biology, pharmacology, allied health, biotechnology and molecular biology. Prerequisites: none.

Studying Evolution with Bioinformatics
LINNEA FLETCHER, Austin Community College and SANDRA G. PORTER, Geospiza, Inc.
Date: June 12th-14th in Austin, TX

Students in this course will learn how bioinformatics resources can be applied to the study of evolution on a molecular level. This course includes a significant hands-on component, with new topics introduced every year. Example topics include: genome browsers and tools for comparative genomics, evolution in HIV, evidence for a common ancestor, and looking at genetic codes. Participants learn how to use the UCSC genome browser, prepare a data set, generate multiple sequence alignments, prepare phylogenetic trees, and use free tools for viewing three-dimensional structures from related proteins. Discussion topics include choosing sequences for phylogenetic studies and different methods for creating phylogenetic trees (neighbor joining, parsimony, maximum likelihood). Topics such as orthology, paralogy, homology, homoplasy, and comparative genomics will also be covered, along with case studies where phylogenetic trees have been tested experimentally. Lastly, participants discuss and explore how bioinformatics resources can be used in their courses.

For college teachers of: bioscience-based courses including biology, microbiology, organismal biology, molecular biology, genetics, evolutionary biology, and biotechnology.
Prerequisites: the introductory course, June 8th-10th, is recommended.


Monday, November 07, 2005

Head, Shoulders, Knees, and Toes

Why is an eye, an eye and a nose, a nose? Why do different cells create different kinds of tissues when all the cells in a single organism start out with the same set of instructions (aka DNA)?

Head, Shoulders, Knees, and Toes is a learning activity that helps students discover, for themselves, that certain genes are expressed in some tissues but not in others. My goal here, as part of our NSF-funded project, is to show how students can learn biology by doing science with bioinformatics tools.

If you already know all about ESTs, you might want to jump ahead and read about the activity. If you don't know what ESTs are, you might want to read a bit of background information first.

A bit of background information
So, why is an eye an eye and a nose a nose? The answer comes from the sets of instructions that are read. Imagine an instruction book for building a miniature city. Now we give 30 identical copies of that book to a class of 30 students and we tell the students to randomly flip through the book and start wherever they like. Some students might end up building train tracks, some students, parks; others, a library. This isn't a perfect analogy, but you can see that the structures that get built are determined by the instructions that are used.

Cells develop their own identities through a similar mechanism. Some cells use the instruction kit for becoming a heart, some read the kidney instructions, and so on. Our instruction book, however, is written in a human-readable language. The language read by cells is written in a chemical code, DNA, with four different "letters": A's, G's, T's, and C's. Our cells read the instructions through a process called "gene expression," where the code is copied first into RNA, and sometimes translated into protein. The end result, in either case, is that different kinds of cells are produced as a result of reading different sets of instructions.
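If it helps to see the two steps of gene expression spelled out, here is a minimal Python sketch. The DNA string is made up, the function names are my own, and the codon table is deliberately partial: it holds only the four codons this toy example needs, not the full genetic code.

```python
# Toy illustration of gene expression: DNA -> RNA -> protein.
# The sequence and function names are hypothetical, and CODON_TABLE
# contains only the codons used below, not the complete genetic code.

CODON_TABLE = {
    "AUG": "M",  # methionine (start)
    "GGC": "G",  # glycine
    "UUU": "F",  # phenylalanine
    "UAA": "*",  # stop
}

def transcribe(coding_dna: str) -> str:
    """Copy the coding strand into RNA by swapping T for U."""
    return coding_dna.upper().replace("T", "U")

def translate(rna: str) -> str:
    """Read the RNA three letters at a time until a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

rna = transcribe("ATGGGCTTTTAA")
protein = translate(rna)
print(rna, protein)
```

A cell that never transcribes this stretch of DNA never makes the protein, which is the whole point of the instruction-book analogy.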

Okay, so different kinds of cells develop because of different instructions. What can we do with this information? What can it tell us?

Well, one of the things we do, as molecular biologists, is try to identify the sets of instructions that are used by different kinds of cells at different times during development, or in response to different signals. We begin by purifying RNA from cells. Next we make a DNA copy of that RNA (this is done for technical reasons because DNA is a more stable molecule), then we determine the nucleotide sequence. At the end of this process, we have a set of DNA sequences that correspond to RNA sequences from particular types of tissues or cells. We call these sequences "ESTs," which is short for expressed sequence tags. Remember how one of the first steps in reading the instructions, or expressing a gene, is to make an RNA copy? Well, if we find a piece of RNA, it indicates that a gene was expressed.

Once we have a set of ESTs, we characterize that set to learn more about the cells that supplied that RNA. We can look at the relative abundance of different ESTs to see which instructions are read more often. We can try to identify ESTs by comparing their nucleotide sequences to a database of sequences. We can compare the ESTs produced by different cells to see if some are only found in the heart, or liver, or stomach. And we can look at when ESTs are produced to see if they might play a role in development. Some ESTs are only produced in fetal tissue; some are only produced in adults. This can help us understand how our bodies change during our lifetimes. We talk about the production of these bits of RNA as "tissue-specific expression" and "developmentally specific expression" to indicate that some RNAs are only made in certain tissues and some RNAs are only made at certain times. There are other RNAs that are produced in response to certain signals, like stress or tissue damage, but we're going to pass on that topic for now.
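The bookkeeping behind those comparisons is simple enough to sketch in a few lines of Python. Everything here is invented for illustration: the gene names, the tissues, and the counts are hypothetical, and a real EST collection would be far larger.

```python
from collections import Counter, defaultdict

# Hypothetical EST records: (gene the EST matched, tissue the RNA came from).
est_hits = [
    ("gene_A", "heart"), ("gene_A", "heart"), ("gene_A", "heart"),
    ("gene_B", "heart"), ("gene_B", "liver"), ("gene_B", "stomach"),
    ("gene_C", "liver"), ("gene_C", "liver"),
]

# Relative abundance: how many ESTs matched each gene overall.
abundance = Counter(gene for gene, _ in est_hits)

# Tissue distribution: which tissues each gene's ESTs were found in.
tissues_by_gene = defaultdict(set)
for gene, tissue in est_hits:
    tissues_by_gene[gene].add(tissue)

# In this toy data, a gene looks tissue specific if all of its ESTs
# come from a single tissue.
tissue_specific = {
    gene: next(iter(tissues))
    for gene, tissues in tissues_by_gene.items()
    if len(tissues) == 1
}

print(abundance.most_common())
print(tissue_specific)
```

Here gene_B turns up in three tissues, so it drops out of the tissue-specific list, while gene_A and gene_C each show up in only one.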

Time to describe the experiment
Okay, but what do ESTs have to do with the Head, Shoulders, Knees, and Toes activity?

In this activity, students are assigned one of 30 unknown EST sequences and asked to find out what the sequence codes for and where and when it's expressed, in addition to a few other facts about the sequence. (You can go straight to the data set or download the PDF from Geospiza's teaching materials.)

All the ESTs in this data set correspond to messenger RNAs, so they do all code for proteins. (That sounds obvious but many ESTs in real life are contaminants). They come from many different tissues and from many different creatures including honeybees, pine trees, carrots, humans, lobsters, cats, mice, gerbils, fish, frogs, chickens, dogs, and other living things.

The important feature for this experiment is that all of these mRNAs show tissue-specific expression; that is, they code for proteins that are only made in certain tissues. Further, some of the sequences are also expressed in a developmentally specific manner.

One of the sequences, for example, codes for a protein that's only made in germinating seeds. Another sequence codes for human tyrosinase, which might only be expressed in embryos and adults.

As mentioned above, students are charged with the mission of identifying their sequence and using evidence and statistical measures to support their identification. All the sequences are at the Geospiza Education web site and there is an animated tutorial that shows them how to use blastn, a commonly used program for comparing nucleotide sequences.

Once students have identified their sequence, they need to find out where the mRNA came from and where and when it's expressed. I found, in giving professional development workshops, that many teachers have never heard the phrase "gene expression," even though it's commonly used in biological research. So, doing this activity helps teach the language of biology by having students explore gene expression in a new way.

Students also get to see something about the magnitude of gene expression.

So how does this work? UniGene to the rescue!
If the NCBI web server is used for a BLAST search, the results include links to other NCBI databases if a sequence is referenced in multiple places.

Both of the sequences in these blast results match my query sequence equally well and code for the same thing. One of these is also a reference sequence (the NM prefix on its accession number tells us this, but that's another story).

Both the U and E, in the brightly colored boxes, are linked to databases with expression information. U stands for the UniGene database, a set of EST sequences from different tissues. E probably stands for Expression since the E is linked to the Gene Expression Omnibus, which contains lots of data from microarray experiments. For now, I will stick to UniGene and write about GEO in a later article.

If I click the U, in the blast results, and follow the links, I get the UniGene reference for my sequence. If I scroll down a bit on the UniGene page, I see the heading, "Gene Expression."

Clicking the blue "Expression Profile" link takes me to a nice summary table of EST data that looks something like the dot blots we used to do when I was in graduate school. This table loads slowly, so be patient.

The first column has the tissue type; the second, the number of matching transcripts (RNA molecules) normalized to 1 million; then there are digital ovals that look something like a signal from a radioactive probe; and last, the raw fraction, the number of matching transcripts over the sample size.
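To see how the normalized column relates to the raw fraction, here is a small Python sketch. The tissue names and counts are hypothetical, not the actual UniGene numbers for tyrosinase, and the function name is my own.

```python
def transcripts_per_million(matching_ests: int, pool_size: int) -> float:
    """Scale a raw count to what we'd expect in a pool of 1 million ESTs."""
    return matching_ests * 1_000_000 / pool_size

# Hypothetical pools: tissue -> (matching ESTs, total ESTs sampled).
pools = {
    "eye": (12, 200_000),
    "skin": (3, 150_000),
    "liver": (0, 180_000),
}

for tissue, (matching, total) in pools.items():
    per_million = transcripts_per_million(matching, total)
    print(f"{tissue:6s} {per_million:6.0f} per million   {matching}/{total}")
```

Normalizing this way lets us compare tissues even though each EST pool has a different size.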

This table makes it pretty clear where the tyrosinase gene is expressed.

Below the expression table is another digital dot blot organized instead by developmental stages. It looks like this gene is expressed only in embryos and adults, but if we look carefully at the number of ESTs that were sampled in juveniles, it appears that the lack of expression could be a result of the sample size. The EST pools from embryo tissue and adult tissue are at least ten times larger than the pool of ESTs from juveniles, so it looks like we need a much larger sample size from juvenile tissue to get a conclusive result.
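A quick back-of-the-envelope calculation shows why the pool size matters. This Python sketch uses made-up numbers (a true expression level of 20 transcripts per million, a juvenile pool of 50,000 ESTs, and an adult pool ten times larger) and assumes each EST is an independent random draw, which is a simplification.

```python
def chance_of_zero_hits(rate_per_million: float, pool_size: int) -> float:
    """Probability of finding no matching ESTs in a pool, assuming each
    EST independently matches with probability rate_per_million / 1e6."""
    p = rate_per_million / 1_000_000
    return (1 - p) ** pool_size

# Hypothetical values, not taken from UniGene.
true_rate = 20              # matching transcripts per million
juvenile_pool = 50_000
adult_pool = 500_000        # ten times larger

juvenile_miss = chance_of_zero_hits(true_rate, juvenile_pool)
adult_miss = chance_of_zero_hits(true_rate, adult_pool)

print(f"juvenile pool: {juvenile_miss:.2%} chance of seeing nothing")
print(f"adult pool:    {adult_miss:.4%} chance of seeing nothing")
```

Under these assumptions, the small pool comes up empty about a third of the time even though the gene is expressed, while the larger pool almost never does. That's why an empty row in a small sample is weak evidence.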

How do students learn about biology by doing this activity?
This activity allows students to see for themselves that some genes are expressed in specific tissues and at specific times during development.

Through this activity, students can discover that genes are regulated before they even know how this works. And making discoveries, after all, is the prize of doing science.

