Dear Biostar people,
As part of a bioinformatics course, I show some features of the major biological databases (GenBank, UniProt, the PDB...). I also advise my students to be cautious with the data they find in these databases. To illustrate this, I found some quite unusual entries in GenBank:
FEATURES             Location/Qualifiers
     source          1..124
                     /organism="Nicotiana tabacum"
                     /organelle="plastid:chloroplast"
                     /mol_type="genomic DNA"
                     /isolate="Cuban cahibo cigar, gift from President Fidel Castro"
                     /db_xref="taxon:4097"

FEATURES             Location/Qualifiers
     source          1..17084
                     /organism="Didelphis virginiana"
                     /organelle="mitochondrion"
                     /mol_type="genomic DNA"
                     /isolate="fresh road killed individual"
                     /db_xref="taxon:9267"
                     /tissue_type="liver"
                     /dev_stage="adult"
The previous examples are clearly not wrong (the other features seem correct), but they are quite unusual in a scientific context. However, residue 67 of chain D in PDB entry 7GPB is clearly wrong: the tryptophan is completely kinked.
So my question is: do you know of other examples of such unusual or wrong entries in GenBank, UniProt, the PDB...?
Thanks.
Fun question. I'm sure all databases are riddled with questionable entries. My question would be: how can we best identify them using automated methods?
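As a very rough starting point for the automated side, here is a minimal sketch that pulls the `/key="value"` qualifiers out of a GenBank source feature and flags free-text values that look unusually wordy. The function names, the list of "free-text" keys and the word-count threshold are all assumptions of mine for illustration; this is a crude heuristic, not an established curation method, and anything it flags would still need a human to look at it.

```python
import re

# Matches GenBank-style qualifiers such as /isolate="some text"
QUALIFIER_RE = re.compile(r'/(\w+)="([^"]*)"')

# Assumed heuristic: these keys usually hold short identifiers,
# so a long free-text value deserves a second look.
FREE_TEXT_KEYS = {"isolate", "strain", "note"}
MAX_WORDS = 3  # arbitrary threshold, chosen only for this example


def parse_qualifiers(feature_text):
    """Return a dict of qualifier name -> value from a source feature block."""
    return dict(QUALIFIER_RE.findall(feature_text))


def flag_unusual(feature_text, max_words=MAX_WORDS):
    """Return (key, value) pairs whose free-text value has more than max_words words."""
    qualifiers = parse_qualifiers(feature_text)
    return [
        (key, value)
        for key, value in qualifiers.items()
        if key in FREE_TEXT_KEYS and len(value.split()) > max_words
    ]


# The opossum record quoted in the question
opossum = (
    'source 1..17084 /organism="Didelphis virginiana" '
    '/organelle="mitochondrion" /mol_type="genomic DNA" '
    '/isolate="fresh road killed individual" /db_xref="taxon:9267" '
    '/tissue_type="liver" /dev_stage="adult"'
)

print(flag_unusual(opossum))
```

On the opossum record this flags the `isolate` qualifier and nothing else. A real screen would need something smarter than counting words, but it shows the general shape of the problem.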
For structures in the PDB, there is the PDBREPORT database (swift.cmbi.ru.nl/gv/pdbreport/), which does the job but still requires human expertise to decide whether or not a structure is "OK".
Just occurred to me that this question should probably be set to community wiki.
Dear BioStar community,
Thank you all for your interesting answers and comments.
You all officially: Made my day. :D