shotgun metagenomics assembly
6.5 years ago
luyang1005

Hi, community,

I am working with shotgun metagenomics data consisting of 100 bp paired-end reads from a HiSeq.

I have assembled the reads with metaSPAdes, IDBA and MEGAHIT, all with default settings, but I do not know which assembly is better. Any suggestions? Are there any indicators I can use to evaluate them? I know tools like QUAST can do this, but they report so many metrics that I do not know which ones to look at. Does the percentage of raw reads that can be assembled matter? How can I find out how many reads were assembled?

Thanks in advance.

How to know how many reads can be assembled

You can map the reads back to the assembly with any read mapper. bowtie2, for example, reports directly how many reads align.
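
A minimal sketch of that check in Python, assuming the usual workflow of indexing the contigs with bowtie2-build and aligning the trimmed read pairs with bowtie2 (file names in the comments are placeholders). bowtie2 writes its alignment summary to stderr, ending with a line of the form "NN.NN% overall alignment rate"; the hypothetical helper overall_alignment_rate below only extracts that number from the saved summary. The input string in the demo is an illustrative line, not real output.

```python
import re

# Assumed workflow (placeholder file names):
#   bowtie2-build contigs.fa contigs_idx
#   bowtie2 -x contigs_idx -1 reads_R1.fq -2 reads_R2.fq -S mapped.sam 2> bowtie2.log
# The alignment summary that bowtie2 prints to stderr ends up in bowtie2.log.


def overall_alignment_rate(summary: str) -> float:
    """Return the overall alignment rate (percent) found in a bowtie2 summary."""
    match = re.search(r"([\d.]+)% overall alignment rate", summary)
    if match is None:
        raise ValueError("no 'overall alignment rate' line found")
    return float(match.group(1))


if __name__ == "__main__":
    # Illustrative summary line only; read the real text from bowtie2.log.
    print(overall_alignment_rate("50.12% overall alignment rate"))  # 50.12
```

In practice you would pass open("bowtie2.log").read() instead of the literal string.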

6.5 years ago
dllopezr

Hi

Have a look at this article, which may be helpful: van der Walt AJ, van Goethem MW, Ramond J-B, Makhalanyane TP, Reva O, Cowan DA. Assembling metagenomes, one community at a time. BMC Genomics. 2017;18:521.

Check out this one too: Papudeshi B, Haggerty JM, Doane M, Morris MM, Walsh K, Beattie DT, et al. Optimizing and evaluating the reconstruction of Metagenome-assembled microbial genomes. BMC Genomics

Yes, the first paper is helpful, thanks. But I still do not know whether my assembly is OK. My raw FASTQ reads after Trimmomatic total 35G, while the assembly is only 200M. Is anything wrong?
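
File sizes in bytes are not directly comparable here: every FASTQ record also carries a header and a quality string, and a genome region covered by many reads appears only once in the contigs. A rough sketch (not from the thread) that counts bases instead and divides read bases by assembled bases gives a crude average depth; it assumes four-line FASTQ records without wrapped sequences and ignores reads that never end up in a contig.

```python
def fastq_bases(lines):
    """Total sequence length in FASTQ records.

    Assumes the common four-line layout (header, sequence, '+', qualities)
    with no line-wrapped sequences.
    """
    total = 0
    for index, line in enumerate(lines):
        if index % 4 == 1:  # the second line of every record is the sequence
            total += len(line.strip())
    return total


def fasta_bases(lines):
    """Total sequence length in a FASTA file, skipping '>' header lines."""
    return sum(len(line.strip()) for line in lines if not line.startswith(">"))


def rough_mean_depth(read_bases, assembly_bases):
    """Read bases per assembled base; ignores reads that never made it into a contig."""
    if assembly_bases == 0:
        raise ValueError("assembly has no bases")
    return read_bases / assembly_bases


# Usage sketch (placeholder file names, not run here):
#   with open("reads_R1.fastq") as r1, open("reads_R2.fastq") as r2:
#       read_bases = fastq_bases(r1) + fastq_bases(r2)
#   with open("contigs.fasta") as fh:
#       print(rough_mean_depth(read_bases, fasta_bases(fh)))
```

A large ratio mostly says that many reads were collapsed into shared contigs; it does not by itself show that the assembly is good or bad.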

Hi luyang1005, this is close to impossible to tell from the sparse information you provided. All is probably fine if your three assemblies show similar trends, and probably not if one assembly is several times larger than another. If you expect a microbiome of moderate complexity, an assembly of this size is probably about right; if you expect a highly complex microbiome, something might have gone wrong.
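
To see whether the three assemblies "show similar trends", a minimal Python sketch can compute the basic figures that QUAST also reports (number of contigs, total length, N50) from a list of contig lengths per assembly. Extracting the lengths from each FASTA is left out, and the function names are hypothetical.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def summarize(name, lengths):
    """Basic size statistics for one assembly, given its contig lengths."""
    return {
        "assembly": name,
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "N50": n50(lengths),
    }
```

Calling summarize once per assembler and printing the dictionaries side by side is enough to spot a gross difference, such as one assembly being several times larger than the others. These numbers say little about correctness.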

Hi Carambakarocho, thanks for your reply. My goal is to find out which taxa are present and to get some functional characterization of the samples, which are anaerobic manure samples. I got 25,000 contigs from IDBA and MEGAHIT, and 44,000 contigs from metaSPAdes. Each assembler looks weak on some metrics and good on others, so I am confused: (1) Are these assemblies suitable for the next step, binning or annotation? (2) Is binning a required step, or is it OK to go straight to annotation? Any suggestions? Many thanks in advance.

Unfortunately, directly comparing fragmented assemblies is not trivial, even for single genomes, and it is even harder for metagenomes. Neither contig length nor contig number is a good measure of assembly quality.

In any case, you can go ahead with protein prediction and annotation. You can then check which proteins are present in both assemblies and see how big the difference really is. DIAMOND is a good substitute for blastp. An excellent resource for functional annotation is the eggNOG database together with eggnog-mapper, though other people might have different opinions. If you have more than one condition and assembled all reads together, you can bin your contigs with something like MetaBAT or CONCOCT.
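
One simple way to put a number on "how big the difference really is" is to compare the sets of functional labels assigned to each assembly's predicted proteins. The sketch below assumes you have already pulled one label per protein (for example an orthologous-group ID from the eggnog-mapper output) into a list for each assembly; parsing that output is not shown and the label strings are placeholders. Note that this compares annotations, not the protein sequences themselves (aligning those against each other with DIAMOND would be the sequence-level alternative), and it ignores copy number.

```python
def compare_annotations(labels_a, labels_b):
    """Compare the sets of functional labels found in two assemblies."""
    a, b = set(labels_a), set(labels_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Jaccard index: shared labels / all distinct labels (1.0 if both empty)
        "jaccard": len(shared) / len(union) if union else 1.0,
    }
```

A high Jaccard index between, say, the MEGAHIT and metaSPAdes annotations would suggest the choice of assembler matters little for the functional picture.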

Thanks so much for the suggestions. I will go ahead with protein prediction and annotation and have a look. Yesterday I binned one sample and checked the bins with CheckM: binning each sample separately, I get 30 bins (out of 76 in total) with completeness > 90% and contamination < 5%. But the lowest taxonomic level assigned is class, which I think is not a good sign, right? I have also mapped the cleaned raw reads to my assemblies with bowtie2, and 50% of the reads map. Is this enough, or do I need to go back and adjust some assembly parameters to get a better assembly? Sorry for so many questions, your help is really appreciated! Thanks a lot.
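
The completeness/contamination filter described here can be sketched in Python, assuming the CheckM results were saved as a tab-separated table with columns named "Bin Id", "Completeness" and "Contamination". Check the header of your own file, since the exact column names are an assumption here. The thresholds default to the ones quoted in the post.

```python
import csv
import io


def high_quality_bins(tsv_text, min_completeness=90.0, max_contamination=5.0):
    """Return IDs of bins with completeness > min and contamination < max.

    Assumes a tab-separated table with 'Bin Id', 'Completeness' and
    'Contamination' columns (values in percent).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    keep = []
    for row in reader:
        completeness = float(row["Completeness"])
        contamination = float(row["Contamination"])
        if completeness > min_completeness and contamination < max_contamination:
            keep.append(row["Bin Id"])
    return keep
```

Reading the file with open(path).read() and passing the text in keeps the function easy to test on a small inline table.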

the lowest level is class level

Classification is extremely dependent on how well your microbiome's composition is represented in the database: I had less than 5% classifiable sequence against the nt database, but more than 50% against a custom-built database based on a study by colleagues.

50% of reads can be mapped

This seems rather low. Without filtering the assembly, you should get more than that, especially with the metaSPAdes assembly.

I see. Thanks for your answers.
