6.8 years ago
Kevin.Y
I am trying to align my total RNA-seq data, which contains both bacterial and human reads. I have already aligned to the human genome and am now using the leftover unaligned reads to align to bacterial genomes.
I would like to align to all the genomes in the NCBI RefSeq database, but the genomes there are split into contigs, scaffolds, and complete genome sequences. Has anyone done this kind of alignment, and could you advise on which sequences would be best to align against?
Is the intent to identify what is there? You may want to look at Kraken, in that case.
Yes, the intention is to identify which bacterial species are present.
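If Kraken is used for species-level identification, its report can be filtered down to species rows. Below is a minimal sketch, assuming the six-column tab-separated report layout (percentage, clade reads, direct reads, rank code, taxid, indented name). The function name `species_from_report` and the `min_reads` cutoff are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Tuple


def species_from_report(
    lines: Iterable[str], min_reads: int = 10
) -> List[Tuple[str, int]]:
    """Return (species name, clade read count) pairs, most abundant first.

    Assumes a Kraken-style report: tab-separated columns of percentage,
    clade reads, direct reads, rank code, taxid, and an indented name.
    min_reads is an arbitrary illustrative threshold, not a recommendation.
    """
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        clade_reads = int(fields[1])
        rank_code = fields[3]
        name = fields[5].strip()  # names are indented to show tree depth
        if rank_code == "S" and clade_reads >= min_reads:
            hits.append((name, clade_reads))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

The function takes any iterable of lines, so an open file handle for the report can be passed directly. A read-count cutoff of some kind is worth considering because very low counts are often spurious, but the right value depends on the data.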
If you can get away with using nr or RefSeq bacteria, then you could look into DIAMOND. Aligning against hundreds of bacterial genomes would not be a trivial thing, especially if you plan to use a "regular" NGS aligner. While you did not ask for this, I would like to make you aware of bbsplit.sh, part of the BBMap suite. It is useful when read binning is required and multiple genomes are present.
Sorry, I forgot to mention that I plan on aligning to the bacterial genomes using MagicBlast.
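On the original question of contigs versus scaffolds versus complete genomes, one possible way to narrow the reference set is to filter NCBI's assembly summary table by assembly level before downloading anything. The sketch below assumes a tab-separated `assembly_summary.txt` whose commented header line names an `assembly_level` column. The function name `select_assemblies` is hypothetical, and keeping only "Complete Genome" entries is an illustrative default rather than advice given in this thread.

```python
from typing import Dict, Iterable, List, Sequence


def select_assemblies(
    lines: Iterable[str], levels: Sequence[str] = ("Complete Genome",)
) -> List[Dict[str, str]]:
    """Keep assembly records whose assembly_level is in `levels`.

    Assumes the column names sit on a '#'-prefixed, tab-separated header
    line that includes 'assembly_level'; other '#' lines are ignored.
    """
    header = None
    selected = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            candidate = line.lstrip("# ").split("\t")
            if "assembly_level" in candidate:
                header = candidate  # this comment line holds the column names
            continue
        if header is None or not line:
            continue  # no header seen yet, or an empty line
        record = dict(zip(header, line.split("\t")))
        if record.get("assembly_level") in levels:
            selected.append(record)
    return selected
```

Passing `levels=("Complete Genome", "Chromosome")` would widen the selection. Whether to also include scaffold- or contig-level assemblies depends on how well the organisms of interest are covered by finished genomes.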