Question

Multiple genome alignment for metatranscriptomics data

9.5 years ago

rpjl1230 ▴ 20

Hello!

I am totally new to metatranscriptomics and apologise in advance if this has already been covered. Can anyone recommend packages/software for mapping reads to multiple bacterial genomes simultaneously, suitable for metatranscriptomics analysis? (Note: my average FASTQ size is ~30-35 GB!)

We previously ran our data through Kraken and got an idea of which bacteria are most abundant in the sample, but its resolution is limited to the species level, not individual strains. I then mapped the reads with bwa (default parameters) to the reference strain of each species we are interested in, one genome at a time. We are worried that some of the mapped reads actually originate from other strains or closely related species (cross-mapping), and that aligning to multiple genomes simultaneously might reduce this issue.
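As a minimal sketch of what "aligning to multiple genomes simultaneously" could look like in practice, the genomes can be concatenated into one reference with each contig name prefixed by its genome of origin. The function names, the `|` separator, and the toy sequences below are hypothetical and only illustrate the idea.

```python
def tag_fasta_headers(fasta_text, genome_label):
    """Prefix every FASTA header with a genome label,
    e.g. '>contig1' becomes '>strainA|contig1'."""
    tagged = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tagged.append(">" + genome_label + "|" + line[1:])
        else:
            tagged.append(line)
    return "\n".join(tagged) + "\n"


def build_combined_reference(genomes):
    """Concatenate several genomes (label -> FASTA text) into one reference."""
    return "".join(tag_fasta_headers(text, label) for label, text in genomes.items())


# Hypothetical toy genomes, for illustration only.
genomes = {
    "strainA": ">contig1\nACGTACGT\n",
    "strainB": ">contig1\nACGTTCGT\n",
}
combined = build_combined_reference(genomes)
print(combined)
```

The combined file could then be indexed and used as a single bwa reference. Reads that align equally well to more than one genome should receive a low mapping quality, so filtering on MAPQ is one way to keep only reads that discriminate between the genomes.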

Any suggestion will be greatly appreciated! Many thanks!

RNA-Seq metatranscriptomics • 2.7k views

updated 23 months ago by Ram 44k • written 9.5 years ago by rpjl1230 ▴ 20

You can try GOTTCHA. It works somewhat like Kraken, except that it uses a read mapper for identification rather than the LCA/longest-path approach Kraken uses. (I am on mobile, so I don't have a link right now, sorry.)

9.5 years ago by Phil S. ▴ 700

Ram · Answer 1 · 2015-05-27

I think the best option for you in terms of speed and specificity is BLAST, or more specifically megaBLAST, which is the default search mode of the current BLAST nucleotide tool (blastn).

I have done similar analyses, although I have not focused on identifying strains.

BLAST can take a very long time if your E-value cutoff is too permissive. You mentioned this is transcriptome work: how long are your RNA-seq shotgun reads? If you have 50 bp single-end reads, I doubt that what you want to do is feasible with BLAST.

In my experience, identifying individual strains in a microbiome is difficult at best. Even differentiating between species is often not possible with metagenomic taxonomic classification tools. Furthermore, I would expect within-species variability to be high because of recombination events such as horizontal gene transfer. Based on alignments alone, the most you could say is that a strain exists in your sample whose sequence is MOST SIMILAR to a particular strain in the metagenome database.
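To make the "most similar" point concrete, here is a rough sketch that takes BLAST tabular output and reports, for each read, the top bit score and every subject tied at that score. It assumes the default 12-column tabular format (`-outfmt 6`, with the bit score in the last column); the function name and the toy hits are made up for illustration.

```python
import csv
import io


def best_hits(tabular_text):
    """Return {read_id: (best_bitscore, sorted subject ids tied at that score)}.

    Assumes default BLAST tabular columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, sseqid, bitscore = row[0], row[1], float(row[11])
        score, subjects = best.get(qseqid, (float("-inf"), set()))
        if bitscore > score:
            best[qseqid] = (bitscore, {sseqid})
        elif bitscore == score:
            subjects.add(sseqid)  # tie: this read cannot separate the subjects
    return {q: (s, sorted(subs)) for q, (s, subs) in best.items()}


# Hypothetical toy hits, for illustration only.
rows = [
    ["read1", "strainA", "100.0", "50", "0", "0", "1", "50", "101", "150", "1e-20", "93.5"],
    ["read1", "strainB", "100.0", "50", "0", "0", "1", "50", "201", "250", "1e-20", "93.5"],
    ["read2", "strainA", "98.0", "50", "1", "0", "1", "50", "301", "350", "1e-18", "87.9"],
    ["read2", "strainB", "90.0", "50", "5", "0", "1", "50", "401", "450", "1e-10", "60.2"],
]
hits = best_hits("\n".join("\t".join(r) for r in rows) + "\n")
for read_id, (score, subjects) in hits.items():
    label = "ambiguous" if len(subjects) > 1 else "unique"
    print(read_id, score, subjects, label)
```

Reads like `read1` in this toy input, which score identically against two strains, carry no strain-level information. Only reads with a single best subject support a "most similar to" statement.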

Another suggestion is to search through the raw Kraken output. While Kraken reports a single decision for the taxon it believes is best, it also reports the taxon hit by each k-mer within the query. It is possible the strains are showing up there.
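A small sketch of how the per-k-mer information could be pulled out of a raw Kraken output line follows. It assumes the standard five tab-separated columns (classified flag, read ID, assigned taxid, read length, and a space-separated list of `taxid:count` pairs). The function name and the example line are hypothetical.

```python
from collections import Counter


def kmer_taxon_counts(kraken_line):
    """Parse one raw Kraken output line into (read_id, assigned_taxid, Counter).

    The Counter maps each taxid string in the last column to the number of
    k-mers assigned to it. Tokens without a numeric count are skipped.
    """
    fields = kraken_line.rstrip("\n").split("\t")
    read_id, assigned_taxid = fields[1], fields[2]
    counts = Counter()
    for token in fields[4].split():
        taxid, _, n = token.partition(":")
        if n.isdigit():
            counts[taxid] += int(n)
    return read_id, assigned_taxid, counts


# Hypothetical example line; the taxids and counts are illustrative only.
line = "C\tread1\t562\t50\t562:10 83334:6 0:4\n"
read_id, assigned, counts = kmer_taxon_counts(line)
print(read_id, assigned, dict(counts))
```

Summing these counters over all reads assigned to a species would show whether any strain-level taxids collect k-mers, even when the final call stays at the species level.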

Also, I am curious: are you using a Kraken database built with all the strains you are looking for, or something smaller, like the MiniKraken database? A database containing all those strains would probably require a supercomputer, so I suspect you may not have tried this.

I would really like to know if that GOTTCHA program does what you want; I have not had a chance to use it yet.