Interesting research has been published in the online journal PLoS One, describing a problem with contamination in non-human DNA databases. DNA databases are libraries of genetic information about specific species. When a species has its genome sequenced, its genetic data goes into a database so that other research can be conducted based on that known genetic information.
A DNA database becomes contaminated when sequence from some other source gets mixed into the data it stores. In the new PLoS One paper, the researchers (from the University of Connecticut) evaluated human contamination of databases that were supposed to contain other species, like the zebrafish. Here, contamination occurs when human DNA gets incorporated into the database for another species. When researchers go to work with the zebrafish data, for example, they may actually be working with human data without knowing it.
The University of Connecticut researchers looked for human contamination in NCBI genome databases, the University of California Santa Cruz (UCSC) databases, and the Joint Genome Institute databases. They found human DNA where it shouldn't have been in 492 of the 2,749 databases they evaluated, or roughly 18 percent.
This contamination is a serious problem because research based on contaminated information cannot be trusted to be accurate. It can also be very difficult to track down which databases are contaminated unless resources (time, money, etc.) are spent screening them for contamination, as was done in this new research.
Database contamination is a relatively new issue, brought to light by the massive influx of new genetic information made possible by improved genome sequencing technology. A similar issue that has existed for decades is cell line contamination, which occurs when cells grown in culture (alive outside of the body) are contaminated with cells that aren't supposed to be there.
No regulatory body has stepped up and put a stop to cell line contamination in the last thirty years. I just hope that database contamination doesn't follow suit.
To learn more, read the paper about database contamination, or read an article I wrote for BioTechniques about cell line contamination. As taxpayers, we spend a lot of money to fund scientific research, so it is important to know what problems (like contamination) exist in the research community.