Wednesday, January 04, 2006

It must be true, I read it on the Internet!

Wikipedia vs. Britannica
Nature recently published a special report comparing the accuracy of Wikipedia, the on-line encyclopedia, with that of the Encyclopedia Britannica. Surprisingly, the Nature researchers found that the accuracy of science entries in Wikipedia is close to that of Britannica. To me, the most interesting part of this finding wasn't the result about Wikipedia; it was the result about the Encyclopedia Britannica. According to Nature, the average science entry in Wikipedia contained four inaccuracies, while the average science entry in Britannica contained three. It's been many years since I looked at an encyclopedia, but an average of three errors per article is still higher than I would have predicted.

The telephone game
I think the high error rate in information sources like encyclopedias derives from the same phenomenon we see when children play the telephone game.

For those of you who never played, the telephone game begins with one person whispering information to a second. The second person whispers the information to a third, and so on. The game ends with the last person sharing the (very different) information with the group.

I used to have my students play the telephone game in our Biotechnology and Society class as an experiment to help them understand why newspaper reports on scientific discoveries can be quite a bit different from the original research findings. I would go out into the hallway with a student volunteer and relay a couple of sentences that contained the words "AIDS" and "mosquito."

Usually, I said something like this:
The realization in the 1980's that HIV could be transmitted through blood and cause AIDS caused quite a scare. One of the first questions that researchers had to answer was whether or not HIV was transmitted by mosquitoes.

Then I would return to the classroom and send another student out into the hall to retrieve the information from the first. I didn't do a rigorous study, but in general it only took one or two students before we'd all get reports that students had heard that "mosquitoes cause AIDS."

As the informaticists say, spoken information is "lossy." Every time information travels from one person to another, something gets lost.

This lossiness of information even presents problems in places where it really shouldn't. Genetics Home Reference (GHR), at the National Library of Medicine, is a case in point. GHR provides summaries of information for different genetic diseases found in humans. It's kind of a newer version of the Genes and Disease section at the NCBI. At first I thought this site was really great.

A is for Alanine
Then I found a mistake, quoted below from GHR, in the entry for SOD1.
The most common change, which occurs in 50 percent of Americans with type 1 amyotrophic lateral sclerosis, replaces the amino acid arginine with the amino acid valine at position 4 in the enzyme. (This mutation is written as Arg4Val).

Well, I've been working a bit with SOD1, both because I know someone whose brother died from ALS, and also because I developed a tutorial for researching genetic ailments with ALS as a model system. So, I was pretty certain that this statement was wrong and that the fourth amino acid was not arginine.

To check the statement, I got the reference sequence for the human superoxide dismutase 1 protein from GenBank. You can see it below:
>gi|4507149|ref|NP_000445.1| superoxide dismutase 1, soluble [Homo sapiens]
MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEG
LHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHV
GDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKA
DDLGKGGNEESTKTGNAGSRLACGVIGIAQ
Each letter in this sequence represents a single amino acid. The quote from GHR said that the normal amino acid at position four is arginine (abbreviated R). But the GenBank Reference sequence shows the fourth amino acid to be lysine (abbreviated K).

Well, sometimes the first methionine is processed and cut off of the final protein sequence. If that were true, then the A (alanine) would be the fourth amino acid.
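
The position check above can be repeated in a few lines of Python. This is only a sketch: the sequence is the GenBank record quoted above with the line breaks removed, and the helper name residue_at and its drop_initial_met flag are made up for this illustration.

```python
# Reference sequence as quoted above (GenBank NP_000445.1), line breaks removed.
SOD1 = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEG"
    "LHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHV"
    "GDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKA"
    "DDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)


def residue_at(sequence, position, drop_initial_met=False):
    """Return the one-letter residue at a 1-based position.

    drop_initial_met is a hypothetical flag for this sketch: when True,
    a leading M is removed before counting, as if the first methionine
    had been cut off of the final protein.
    """
    if drop_initial_met and sequence.startswith("M"):
        sequence = sequence[1:]
    return sequence[position - 1]


print(residue_at(SOD1, 4))                         # K (lysine)
print(residue_at(SOD1, 4, drop_initial_met=True))  # A (alanine)
```

Either way you count, position four is never R (arginine).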

Since the GHR reference stated that a change from Arginine (not Alanine) to Valine was the most common mutation in SOD1, I decided to cross-check the information in OMIM, too.

To quote the OMIM record for a mutation at position 4:
Deng et al. (1993) found that the ala4-to-val mutation in exon 1 of the SOD1 gene is the most frequent basis for familial amyotrophic lateral sclerosis (105400)
My guess is that someone on the NLM staff thought (by mistake) that A stands for Arginine instead of Alanine.
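
A mix-up like this is the kind of thing a small script could catch. Here is a rough sketch, assuming the mutation is written in the Arg4Val style and that positions are counted after the first methionine is removed. The function check_mutation and the abbreviated lookup table are hypothetical and cover only the amino acids discussed in this post.

```python
import re

# Reference sequence as quoted above (GenBank NP_000445.1), line breaks removed.
SOD1 = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEG"
    "LHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHV"
    "GDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKA"
    "DDLGKGGNEESTKTGNAGSRLACGVIGIAQ"
)

# Assumption for this sketch: numbering starts after the initial methionine.
SOD1_MATURE = SOD1[1:]

# Partial three-letter to one-letter table (only the residues mentioned here).
THREE_TO_ONE = {"Ala": "A", "Arg": "R", "Lys": "K", "Val": "V"}


def check_mutation(notation, sequence):
    """Return True if the 'normal' residue named in a mutation such as
    'Ala4Val' matches the residue found at that 1-based position."""
    match = re.fullmatch(r"([A-Za-z]{3})(\d+)([A-Za-z]{3})", notation)
    if match is None:
        raise ValueError("expected a form like Ala4Val, got: " + notation)
    normal, position, _changed = match.groups()
    expected = THREE_TO_ONE[normal.capitalize()]
    return sequence[int(position) - 1] == expected


print(check_mutation("Arg4Val", SOD1_MATURE))  # False: position 4 is not R
print(check_mutation("Ala4Val", SOD1_MATURE))  # True: position 4 is A
```

The version from GHR fails the check and the version from OMIM passes it.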

Maybe the National Library of Medicine should start a Wiki
I suppose mixing up Arginine and Alanine is a simple mistake, but it bothers me for two reasons. First, this is a site that claims a high standard of accuracy and has a large group of outside reviewers, so even small nitpicky details like amino acid names should be correct. Second, I tried to help out by using the "customer service" web form to let the Genetics Home Reference people know about the mistake. It's been at least a month and I haven't seen the mistake corrected or even received an automated response saying that they got the information.

Alright, we all make mistakes from time to time (yes, me too!) and Wikipedia isn't perfect. The advantage of Wikipedia and other wiki sites, though, is that they can draw on larger numbers of people to help review and correct misinformation. Rather than ask for detailed reviews from a small number of busy people, who might easily miss this sort of detail, a genetic information wiki could, in theory, benefit a larger number of researchers and students if groups like the NLM would allow them to help out.

If the GHR site were a Wiki or had some wiki or even blog capabilities, I would have been able to post a correction, an automated e-mail could have been sent to someone in charge, the posting could be tracked, and someone might have looked at it and checked my contribution. As it is, the information might never be corrected.

I don't want to sound completely negative, because I do like the GHR site. The information is organized well and new conditions are added to the site on a regular basis. Plus there are links to lots of good sources for additional information. It's just that, now, every time I recommend the site, I have to add a qualifier that students should cross-check the information with OMIM just to be sure it's correct.

Or maybe I'll just advise them to use GHR for the link list and read the review at GeneTests instead.

Articles referenced:
1. Jim Giles. Internet encyclopaedias go head to head. Nature 438, 900-901 (15 December 2005) | doi:10.1038/438900a

1/17/2006 PostScript: I guess it's not just GHR that's a little off sometimes. I just realized that the size of any reference mRNA sequence in GenBank is given in bp (base pairs), even though mRNA is single-stranded. Sigh.

