Question

Where are the mammoth's ORFs?

1

10.6 years ago

cdsouthan ★ 1.9k

Not sure if anyone from the Swedish Museum of Natural History is on this forum, but does anyone know of any plans to process the BAM files from http://www.ebi.ac.uk/ena/data/view/ERP008929 into something we can actually use for looking at protein evolution? Might Ensembl (Emily_Ensembl?) and/or the NCBI eventually pick up the data for their pipelines? This is not the first time journal editors have allowed a new genome paper without the genome in question being in any form usable by biologists.

"Complete Genomes Reveal Signatures of Demographic and Genetic Declines in the Woolly Mammoth"

Ensembl Assembly protein-annotation • 2.8k views

ADD COMMENT • link updated 3.4 years ago by Ram 45k • written 10.6 years ago by cdsouthan ★ 1.9k

0

Well, the FASTQ files are right there, so there's nothing stopping you from doing a quick assembly and ORF prediction. Alternatively, you could convert the BAM files to FASTA (I assume it's the assembly) and then predict the ORFs.
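For anyone wanting to try the "predict the ORFs" step on a FASTA sequence, here is a minimal sketch of a naive six-frame ORF scan in plain Python. The names `find_orfs`, `reverse_complement` and `min_codons` are made up for illustration and are not from any particular tool. It only finds uninterrupted ATG-to-stop stretches, so on eukaryotic genomic DNA with introns it will mostly recover exon fragments, and gaps (N) or damage-induced errors in an ancient DNA consensus will break or fake ORFs. Treat it as a rough first look, not a gene build.

```python
from typing import Iterator, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-cased DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> Iterator[Tuple[str, int, str]]:
    """Yield (strand, frame, orf_dna) for ATG...stop ORFs in all six frames.

    min_codons counts codons from the ATG up to, but not including, the stop.
    The returned DNA includes the stop codon and is given 5'->3' on its strand.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # index of the first in-frame ATG still awaiting a stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        yield strand, frame, s[start:i + 3]
                    start = None


if __name__ == "__main__":
    toy = "CCATGAAATTTGGGTAACC"  # toy sequence, not mammoth data
    for strand, frame, orf in find_orfs(toy, min_codons=3):
        print(strand, frame, orf)
```

A real attempt would still need a proper consensus or assembly first, and a gene predictor that models splicing rather than a plain ORF scan.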

ADD REPLY • link 10.6 years ago by 5heikki 11k

0

Sure, there are many on here who could do this like rolling off a log (but would also get high ORF error rates), but unfortunately I'm not one of them (can you tell if they are decent assemblies?). The point is that substantial scientific value of the whole exercise is lost until it gets a full gene build that Ensembl Compara can crunch.

ADD REPLY • link updated 3.4 years ago by Ram 45k • written 10.6 years ago by cdsouthan ★ 1.9k

0

Well, they probably had their reasons for not submitting the assemblies and predicted proteins as such. Now the assembly is (probably) nicely hidden from the majority of biologists in the BAM file. I haven't checked the paper, but if they so much as mentioned a mammoth protein in it, then their ORF predictions obviously should have been submitted too.

ADD REPLY • link 10.6 years ago by 5heikki 11k

0

That'll be the next paper then. Genome Res?

Spoke too soon - they cranked the 2nd out already

http://biorxiv.org/content/early/2015/04/23/018366.article-info

but still sans-ORFs

ADD REPLY • link updated 3.4 years ago by Ram 45k • written 10.6 years ago by cdsouthan ★ 1.9k

Ram · Answer 1 · 2015-04-28

2

10.6 years ago

Emily 24k

Unfortunately, this genome is not a proper assembly, just a read library that is not suitable for annotation. Even if it were suitable, we still could not annotate it because there is no suitable gene set to annotate with. Being extinct, mammoths have no active transcription, so there is no mammoth cDNA or protein set, and the elephant gene set is already a low-quality gene set that was projected from other species, so it would not be good enough for this analysis.

ADD COMMENT • link 10.6 years ago by Emily 24k

0

Thanks, it's interesting; I had overlooked your point about the eventual lack of transcript support for ancient genomes. Nonetheless, it would be useful to get at least the more solid ORFs into TrEMBL somehow, but I guess this is predicated on a better elephant assembly (hardly sample-limited, one would have thought...).

ADD REPLY • link updated 3.4 years ago by Ram 45k • written 10.6 years ago by cdsouthan ★ 1.9k

Ram · Answer 2 · 2015-04-28

You could tentatively use the elephant GTF annotation file, with the major caveat that genes and gene boundaries may have changed. The mammoth assembly is mapped onto Loxodonta, so it will miss DNA specific to mammoths. If you want to find elephant genes present in mammoth, this is relatively easy. If you want to find mammoth genes missing in elephant, this will be much less tractable.
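As a rough illustration of the "elephant genes present in mammoth" route, the sketch below pulls spliced CDS sequences out of a consensus using GTF coordinates. It assumes you have already built a mammoth consensus that keeps the Loxodonta reference coordinates (no indels shifting positions) and loaded it into a plain dict of sequence name to string. That preparation step is not shown, and the names `spliced_cds` and `genome` are hypothetical. It relies only on the standard 9-column, tab-separated GTF layout with 1-based inclusive coordinates and a `transcript_id` attribute.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def spliced_cds(gtf_lines: Iterable[str], genome: Dict[str, str]) -> Dict[str, str]:
    """Return {transcript_id: spliced CDS DNA} using GTF CDS features.

    genome maps sequence names to sequences in the same coordinate system as
    the GTF (assumption: a reference-coordinate mammoth consensus).
    """
    parts: Dict[str, List[Tuple[str, int, int, str]]] = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        match = _TRANSCRIPT_ID.search(cols[8])
        if match is None or cols[0] not in genome:
            continue
        parts[match.group(1)].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))

    result: Dict[str, str] = {}
    for tid, segs in parts.items():
        segs.sort(key=lambda seg: seg[1])  # order segments by genomic start
        # GTF is 1-based inclusive, Python slices are 0-based half-open.
        dna = "".join(genome[name][start - 1:end] for name, start, end, _ in segs)
        if segs[0][3] == "-":
            dna = dna.translate(_COMPLEMENT)[::-1]  # reverse-complement minus strand
        result[tid] = dna
    return result
```

Given the caveat about changed gene boundaries, any extracted CDS with internal stops, long runs of N, or a length that is not a multiple of three deserves suspicion rather than a place in a protein set.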