Hello, I am Saraswati and I am new to the field of metagenomics. I have to do taxonomic classification (Archaea, Bacteria, Eukaryotes and Viruses) of a shotgun sequencing dataset, 3.5 GB in size, using the tools DIAMOND and MEGAN6.
I downloaded the .sra file from NCBI and split it into two files, forward and reverse reads in .fastq format, using the SRA Toolkit. Now I have to run blastx on the sequences using DIAMOND, but I want to know whether I should merge the forward and reverse reads into a single file or run blastx with the forward reads only. If I should merge, which tool should I use for that purpose?
I want to run blastx against the nr database. I tried doing it with the forward-read .fasta file, but it is taking a long time (more than 7 days and still running), so how can I make it faster? Also, if there are other tools or strategies that do the same taxonomic classification faster and better, please suggest them.
Yes, you should merge (assemble) the paired-end reads, assuming they overlap, with any paired-end read merger such as PEAR (this is just one among many tools available - paper) before trying to annotate them. There is also software written specifically to annotate metagenomes taxonomically, such as kraken2 (site) and centrifuge (site), among others. Depending on your objective, you can of course also assemble the metagenome into genomes (the so-called MAGs) and annotate those afterwards.

If you're blasting locally, you can provide -num_threads (depending on how many threads your computer/server has) to parallelize the work. If you're blasting remotely, this is not possible as far as I know. Blasting entire fastq files without any kind of clustering etc. will take some time, assuming that each file has just a few million reads.

IDseq is also a good option for taxonomic assignment.
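For the local route described above (merge the pairs, then run a multi-threaded protein search), here is a minimal Python sketch that only builds and prints the two command lines; it does not run anything. Note that -num_threads is the NCBI BLAST+ spelling, while DIAMOND uses --threads. The file names, the nr.dmnd database path, the thread count and the two helper functions are all hypothetical placeholders, and the tuning values are assumptions to adapt to your own machine.

```python
import shlex


def pear_command(forward, reverse, out_prefix, threads=8):
    """Build a PEAR command line that merges overlapping read pairs.

    PEAR writes the merged reads to <out_prefix>.assembled.fastq.
    """
    return [
        "pear",
        "-f", forward,        # forward reads (FASTQ)
        "-r", reverse,        # reverse reads (FASTQ)
        "-o", out_prefix,     # prefix for the output files
        "-j", str(threads),   # number of threads
    ]


def diamond_blastx_command(query, database, out_daa, threads=8, block_size=None):
    """Build a DIAMOND blastx command line that writes a DAA file for MEGAN6."""
    cmd = [
        "diamond", "blastx",
        "--query", query,            # merged reads (FASTA or FASTQ)
        "--db", database,            # DIAMOND-formatted nr database
        "--out", out_daa,
        "--outfmt", "100",           # 100 = DAA, the format MEGAN6 can import
        "--threads", str(threads),   # DIAMOND's counterpart to -num_threads
    ]
    if block_size is not None:
        # Larger blocks trade memory for speed; the value is an assumption.
        cmd += ["--block-size", str(block_size)]
    return cmd


if __name__ == "__main__":
    # Placeholder file names; replace them with your own.
    merge = pear_command("sample_1.fastq", "sample_2.fastq", "sample", threads=16)
    search = diamond_blastx_command(
        "sample.assembled.fastq", "nr.dmnd", "sample.daa", threads=16
    )
    print(shlex.join(merge))
    print(shlex.join(search))
```

The printed lines can be pasted into a shell or passed to subprocess.run once the paths are real. Reads that PEAR could not merge end up in separate "unassembled" files and would need their own search if you want to keep them.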
Among the several issues with uploading and analysing data on the IDseq server, the IDseq privacy statement on the metadata of user-uploaded data should be enough to scare people away (copy/pasted from https://chanzuckerberg.zendesk.com/hc/en-us/articles/360058195412-Preview-of-IDseq-s-New-Privacy-Policy-Terms-of-Service-Effective-April-1-2021-):
Use the ASaiM workflow, which uses open-source tools such as MetaPhlAn 3. MEGAN6 is dual-licensed (AFAIK).