5.4 years ago
c.e.chong
Hi all,
I would like to build a database (index) of all complete bacterial and fungal genomes in RefSeq so that I can map all of my metagenome samples against it.
Does anyone know if this is possible? The manual states that you have to input a comma-separated list of FASTA files to build the database. Is there another way to do this when I have so many FASTA files?
Thank you in advance!
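One workaround worth trying is to merge the genomes into a single multi-FASTA file before indexing. Thousands of comma-separated paths can exceed the shell's command-line length limit. If the tool in question is bowtie2-build (an assumption, since the post does not name it), it also accepts one multi-FASTA file. The sketch below streams plain or gzipped FASTA files into one output file without loading them into memory. `open_fasta` and `concatenate_fastas` are hypothetical helper names.

```python
import gzip
import tempfile
from pathlib import Path
from typing import Iterable


def open_fasta(path: Path):
    """Open a FASTA file for binary reading, decompressing '.gz' files on the fly."""
    if path.suffix == ".gz":
        return gzip.open(path, "rb")
    return open(path, "rb")


def concatenate_fastas(paths: Iterable[Path], out_path: Path, chunk_size: int = 1 << 20) -> int:
    """Stream many (optionally gzipped) FASTA files into one plain multi-FASTA.

    Returns the number of input files written.
    """
    count = 0
    with open(out_path, "wb") as out:
        for path in paths:
            last = b"\n"
            with open_fasta(path) as handle:
                # Copy in fixed-size chunks so large genomes never sit fully in memory.
                while True:
                    chunk = handle.read(chunk_size)
                    if not chunk:
                        break
                    out.write(chunk)
                    last = chunk[-1:]
            # Add a newline if the file lacked one, so the next header starts on its own line.
            if last != b"\n":
                out.write(b"\n")
            count += 1
    return count


if __name__ == "__main__":
    # Tiny demo on two toy files in a temporary directory.
    tmp = Path(tempfile.mkdtemp())
    (tmp / "a.fa").write_bytes(b">seqA\nACGT")
    with gzip.open(tmp / "b.fa.gz", "wb") as fh:
        fh.write(b">seqB\nTTTT\n")
    merged = tmp / "all_genomes.fa"
    concatenate_fastas([tmp / "a.fa", tmp / "b.fa.gz"], merged)
    print(merged.read_text())
```

The merged file can then be passed to the index builder as a single input instead of a long comma-separated list.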
It should certainly be possible, provided you have enough compute resources available locally. Instead of using every genome, have you considered getting representatives for broad classes? That should reduce the search space somewhat.
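The sketch below illustrates that suggestion. It assumes the genome list comes from NCBI's RefSeq `assembly_summary` table, which is tab-separated with columns such as `assembly_level`, `species_taxid`, `refseq_category` and `ftp_path`. The code keeps only `Complete Genome` assemblies, picks one per species, and builds the `*_genomic.fna.gz` download URL. Several details are assumptions to check against current NCBI documentation: grouping by species, the category ranking, and the URL pattern. The category labels in particular have changed over time. Grouping at a broader rank would need a taxonomy lookup that is not shown here.

```python
from typing import Dict, Iterable, List

# Assumed preference order among NCBI refseq_category labels; verify the labels
# against your copy of the file, since they have changed over time.
CATEGORY_RANK = {"reference genome": 0, "representative genome": 1}


def parse_assembly_summary(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse an assembly_summary-style table into one dict per assembly.

    Assumes the column names sit on a '#'-prefixed line containing
    'assembly_accession' and that data rows are tab-separated.
    """
    header: List[str] = []
    rows: List[Dict[str, str]] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            if "assembly_accession" in line:
                header = line.lstrip("#").strip().split("\t")
            continue
        if header and line:
            rows.append(dict(zip(header, line.split("\t"))))
    return rows


def pick_representatives(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep complete genomes only, then one assembly per species_taxid."""
    best: Dict[str, Dict[str, str]] = {}
    for row in rows:
        if row.get("assembly_level") != "Complete Genome":
            continue
        species = row.get("species_taxid", "")
        rank = CATEGORY_RANK.get(row.get("refseq_category", ""), 2)
        current = best.get(species)
        # The first assembly seen wins ties; a better-ranked category replaces it.
        if current is None or rank < CATEGORY_RANK.get(current.get("refseq_category", ""), 2):
            best[species] = row
    return list(best.values())


def genomic_fasta_url(row: Dict[str, str]) -> str:
    """Build the *_genomic.fna.gz URL from ftp_path (assumed naming convention)."""
    ftp_path = row["ftp_path"].rstrip("/")
    basename = ftp_path.split("/")[-1]
    return f"{ftp_path}/{basename}_genomic.fna.gz"
```

Usage would be along the lines of `[genomic_fasta_url(r) for r in pick_representatives(parse_assembly_summary(open(path)))]`, where `path` points at a locally downloaded summary file.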
There are also tools such as kraken2 and centrifuge, designed for taxonomic assignment of reads, that may be more appropriate in this case.
Thank you for your reply!
I have considered getting representatives for broad classes. My plan was to use CD-HIT to remove redundancy once I had downloaded all of the complete bacterial and fungal genomes. Do you think this is the best method?
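CD-HIT clusters sequences using an identity threshold, which a short example cannot reproduce. The sketch below is a much simpler stand-in and is not CD-HIT. It hashes each sequence and drops only those that are 100% identical on the same strand. `read_fasta` and `dedupe_exact` are hypothetical helper names. Reverse-complement duplicates and near-identical genomes are not caught. Kept records are also held in memory.

```python
import hashlib
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA lines; multi-line sequences are joined."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def dedupe_exact(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first record of each exactly identical (case-insensitive) sequence."""
    seen = set()
    kept = []
    for header, seq in records:
        # Hash the upper-cased sequence so soft-masked (lowercase) copies count as duplicates.
        digest = hashlib.sha256(seq.upper().encode()).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append((header, seq))
    return kept
```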
I have used Kraken2, but I want BAM files that I can use for statistics, and I am unsure how to get them from the Kraken output.
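Kraken2 classifies reads by k-mer matching and does not produce alignments, so its output contains no BAM to recover. One possible route is sketched below. The per-read Kraken2 output is tab-separated: C/U flag, read ID, taxonomy ID, length, k-mer LCA list. You can use it to collect the IDs of reads assigned to taxa of interest. You can then map those reads, or filter a BAM from a separate mapping run by read name. `reads_for_taxids` is a hypothetical helper. It matches the exact taxid only, so descendant taxa would need the taxonomy tree. It also accepts the `name (taxid N)` form written when Kraken2 runs with `--use-names`.

```python
import re
from typing import Iterable, Set

# Matches the "Name (taxid 562)" form of the taxonomy column produced with --use-names.
TAXID_WITH_NAME = re.compile(r"\(taxid (\d+)\)")


def reads_for_taxids(lines: Iterable[str], taxids: Set[str]) -> Set[str]:
    """Return IDs of classified reads whose assigned taxid is in `taxids`."""
    selected = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed lines and unclassified ("U") reads.
        if len(fields) < 3 or fields[0] != "C":
            continue
        taxid_field = fields[2]
        match = TAXID_WITH_NAME.search(taxid_field)
        taxid = match.group(1) if match else taxid_field.strip()
        if taxid in taxids:
            selected.add(fields[1])
    return selected
```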
Thanks!