Short contigs after megahit assembly of metagenomic samples
5.2 years ago
Biok • 0

Hi everyone,

I am working with metagenomic samples and I co-assembled my two metagenomic samples with megahit. The problem is that most of the resulting contigs are shorter than 1000 bp, and I had to remove them because my goal is to reconstruct metagenome-assembled genomes. By doing this, I lost ~93% of the contigs and ~80% of the nucleotides in my assembly, which means that I lost a lot of information. Here are the statistics from filtering the assembly to contigs of at least 1000 bp:

  1. Input: final.contigs.fa
  2. Output: contigs.1000min.fasta
  3. Minimum length: 1,000
  4. Total num contigs: 7,050,789
  5. Total num nucleotides: 3,702,128,215
  6. Contigs removed: 6,595,321 (93.54% of all)
  7. Nucleotides removed: 2,930,511,989 (79.16% of all)
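
For anyone who wants to reproduce this kind of summary, here is a minimal Python sketch of the length filter and the counts listed above. It is not the tool that produced those numbers; the function names `read_fasta` and `filter_stats` are made up for illustration, and the parser assumes a plain FASTA file small enough to stream line by line.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_stats(
    records: Iterable[Tuple[str, str]], min_len: int = 1000
) -> Tuple[List[Tuple[str, str]], Dict[str, float]]:
    """Keep contigs of at least min_len bp and count what was removed."""
    total_contigs = total_nt = removed_contigs = removed_nt = 0
    kept: List[Tuple[str, str]] = []
    for name, seq in records:
        total_contigs += 1
        total_nt += len(seq)
        if len(seq) < min_len:
            removed_contigs += 1
            removed_nt += len(seq)
        else:
            kept.append((name, seq))
    stats = {
        "total_contigs": total_contigs,
        "total_nt": total_nt,
        "removed_contigs": removed_contigs,
        "removed_nt": removed_nt,
        # Percentages are relative to the unfiltered assembly.
        "pct_contigs_removed": 100 * removed_contigs / total_contigs if total_contigs else 0.0,
        "pct_nt_removed": 100 * removed_nt / total_nt if total_nt else 0.0,
    }
    return kept, stats


if __name__ == "__main__":
    # Tiny made-up assembly: one 1200 bp contig and two 4 bp contigs.
    demo = [">a", "ACGT" * 300, ">b", "ACGT", ">c", "AC", "GT"]
    kept, stats = filter_stats(read_fasta(demo), min_len=1000)
    print(len(kept), stats)
```

On a real assembly you would pass an open file handle (for example `open("final.contigs.fa")`) instead of the demo list.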

I am new to metagenomics and I am not sure what I can do to improve the assembly. After the assembly, I mapped the reads from my samples against the filtered assembly using bowtie2, but I got a low alignment rate (~40%), which makes sense given that I removed so much sequence. Do you have any suggestions for improving the assembly?

next-gen assembly sequence • 3.4k views

Complex metagenomes are tough to assemble, but I would not consider it typical that 93% of assembled contigs are smaller than 1000 bp. I would consider simple explanations first. Are sequencing adapters removed properly? Even if you were told that they were removed, it never hurts to confirm it. I got one batch recently where the adapters were supposedly removed, but almost 1% of sequences still had them when I tested it with AdapterRemoval. Other adapter removal programs such as trimmomatic will work as well. In my case the small number of untrimmed adapters would not have caused an outcome like yours, but it certainly degraded the assembly. Next, make sure that you have data of good quality, which can be quickly assessed with seqtk.
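
As a rough first screen before running a dedicated trimmer, something like the Python sketch below can estimate how many reads still contain adapter sequence. It is only an exact-substring check and does not replace AdapterRemoval or trimmomatic. The default adapter `AGATCGGAAGAGC` is the common start of Illumina TruSeq adapters and is an assumption here, since the library kit is not stated in this thread; the function names are made up for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            return  # end of input (or a truncated final record)
        yield record[1].strip()


def adapter_fraction(lines: Iterable[str], adapter: str = "AGATCGGAAGAGC") -> float:
    """Fraction of reads containing an exact copy of the adapter.

    The default adapter is an assumption (Illumina TruSeq prefix);
    replace it with the sequence for your own library kit.
    """
    total = hits = 0
    for seq in fastq_sequences(lines):
        total += 1
        if adapter in seq.upper():
            hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Four made-up reads; only the second one carries the adapter.
    demo = [
        "@r1", "ACGTACGTACGT", "+", "IIIIIIIIIIII",
        "@r2", "ACGTAGATCGGAAGAGCACG", "+", "IIIIIIIIIIIIIIIIIIII",
        "@r3", "TTTTGGGGCCCC", "+", "IIIIIIIIIIII",
        "@r4", "GGGGAAAACCCC", "+", "IIIIIIIIIIII",
    ]
    print(adapter_fraction(demo))
```

For gzipped reads you could pass `gzip.open(path, "rt")` as the line source. A clearly non-zero fraction is a hint to re-run proper adapter trimming before assembling again.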

If everything checks out, I suggest you try re-assembling in meta-sensitive mode with megahit, or with metaSPAdes as already suggested.
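
To make the re-assembly suggestion concrete, here is a small Python sketch that builds a megahit command line for a co-assembly with the meta-sensitive preset. The file names and the output directory are placeholders, not taken from this thread, and `megahit_command` is a made-up helper. It uses megahit's `-1`/`-2` options (comma-separated lists for several samples), `--presets`, `--min-contig-len` and `-o`; the sketch only prints the command rather than running it.

```python
import shlex
from typing import List, Sequence


def megahit_command(
    r1: Sequence[str],
    r2: Sequence[str],
    out_dir: str,
    preset: str = "meta-sensitive",
    min_contig_len: int = 1000,
) -> List[str]:
    """Build a megahit co-assembly command as an argument list."""
    if not r1 or len(r1) != len(r2):
        raise ValueError("need the same, non-zero number of R1 and R2 files")
    return [
        "megahit",
        "-1", ",".join(r1),          # forward reads of all samples
        "-2", ",".join(r2),          # reverse reads, same sample order
        "--presets", preset,
        "--min-contig-len", str(min_contig_len),
        "-o", out_dir,
    ]


if __name__ == "__main__":
    # Placeholder file names for two samples.
    cmd = megahit_command(
        ["sample1_R1.fq.gz", "sample2_R1.fq.gz"],
        ["sample1_R2.fq.gz", "sample2_R2.fq.gz"],
        "megahit_meta_sensitive",
    )
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Setting `--min-contig-len` at assembly time only changes which contigs are written out; it does not by itself make the contigs longer, so the comparison that matters is how much sequence ends up above 1000 bp with each preset or assembler.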


That is typically the case with metagenomic samples and it is difficult to pinpoint the reason for such short contigs. It could be anything from degraded genetic material or sequencing library prep to poor de novo assembly or anything in between. I'd suggest that if it is not too much work then try to use a different de novo assembly pipeline e.g. metaSPAdes and check if it may lead to better sequence assembly.
