Illumina reads mapped back onto contigs have gaps
4.2 years ago

I assembled a genome from PacBio reads with the Flye assembler. I then mapped my cleaned-up Illumina reads onto the longest contigs using BWA-MEM, and there are gaps in the contigs. Why is this?

illumina • 1.2k views

there are gaps in the contigs.

If that means the Illumina reads do not completely cover those contigs, then one of the following is likely:

  1. You don't have enough Illumina sequence data to provide adequate coverage.
  2. The Illumina data comes from RNAseq, so it only covers the expressed part of the genome.
  3. Your Flye assembly is incorrect.
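
Before deciding between these explanations, it can help to list exactly where the uncovered stretches are. The following is a minimal sketch, not part of the original thread, assuming per-base depth in the three-column, tab-separated form (contig, 1-based position, depth) that `samtools depth -a` writes. The function name `zero_coverage_intervals` and the `min_depth` parameter are hypothetical, chosen for illustration.

```python
def zero_coverage_intervals(depth_lines, min_depth=1):
    """Yield (contig, start, end) for runs of consecutive positions
    whose depth is below min_depth. Coordinates are 1-based, inclusive.

    depth_lines: iterable of 'contig<TAB>position<TAB>depth' strings.
    """
    current = None  # [contig, start, end] of the run being extended
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        contig, pos, depth = line.split("\t")[:3]
        pos, depth = int(pos), int(depth)
        low = depth < min_depth

        # Close the open run if this position does not extend it.
        if current is not None and (
            not low or contig != current[0] or pos != current[2] + 1
        ):
            yield tuple(current)
            current = None

        if low:
            if current is None:
                current = [contig, pos, pos]
            else:
                current[2] = pos

    if current is not None:
        yield tuple(current)


# Toy input (made-up values) standing in for a real depth file.
example = [
    "contig_1\t1\t5",
    "contig_1\t2\t0",
    "contig_1\t3\t0",
    "contig_1\t4\t7",
    "contig_2\t1\t0",
]
for contig, start, end in zero_coverage_intervals(example):
    print(contig, start, end, end - start + 1)
```

With a real file, the same function can be fed an open file handle instead of the toy list. Raising `min_depth` reports thinly covered regions as well as completely uncovered ones.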

I trimmed the Illumina reads for quality and to remove adapters. I used Trimmomatic with these parameters:

LEADING:10 TRAILING:10 SLIDINGWINDOW:5:30 MINLEN:150

The genome size is ~200 Mb. These are the results from trimming the Illumina reads:

Input Read Pairs: 128614698 Both Surviving: 49197716 (38.25%) Forward Only Surviving: 20232226 (15.73%) Reverse Only Surviving: 13258328 (10.31%) Dropped: 45926428 (35.71%)

This is not RNAseq (but I do also have RNAseq Illumina reads). Are there supposed to be zero gaps once I map Illumina reads back onto assembly contigs? Am I trimming the Illumina reads too much? Do I need to trim the reads at all? Thanks!!


I don't know what your sequencing read length is, since even after trimming you are requiring a minimum length of 150 (MINLEN:150). You could try aligning without any trimming at all and see if things improve.

Are there supposed to be zero gaps once I map Illumina reads back onto assembly contigs?

Ideally, yes. One would expect no bases or regions to be left uncovered by at least a few reads, as long as the assembly is good and there is plenty of Illumina data to cover it. Did the Illumina reads go into the assembly, or was it a pure PacBio one?


The Illumina sequencing is 2 x 150 bp. The assembly used only PacBio reads. I could polish the PacBio assembly with the Illumina reads using Pilon and see if that helps. I will also try mapping the Illumina reads without trimming or with different trimming parameters.
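
As a rough check on whether data volume is the problem, the expected average depth can be worked out from the numbers already posted in this thread. The sketch below assumes the stated genome size of ~200 Mb is close to the assembly size, and that no read is longer than 150 bp, so every read passing MINLEN:150 is exactly 150 bp. The helper `expected_coverage` is a hypothetical name used for illustration.

```python
def expected_coverage(n_reads, read_len_bp, genome_size_bp):
    """Average depth = total sequenced bases / genome size."""
    return n_reads * read_len_bp / genome_size_bp


GENOME_SIZE_BP = 200_000_000   # "~200 Mb" from the thread (approximate)
READ_LEN_BP = 150              # 2 x 150 bp sequencing

input_pairs = 128_614_698      # Trimmomatic "Input Read Pairs"
surviving_pairs = 49_197_716   # Trimmomatic "Both Surviving"

# Two reads per pair.
raw_cov = expected_coverage(2 * input_pairs, READ_LEN_BP, GENOME_SIZE_BP)
paired_cov = expected_coverage(2 * surviving_pairs, READ_LEN_BP, GENOME_SIZE_BP)

print(f"untrimmed pairs: ~{raw_cov:.1f}x")
print(f"surviving pairs: ~{paired_cov:.1f}x")
```

Under these assumptions the untrimmed data is roughly 193x and the surviving pairs roughly 74x on average. An average says nothing about how evenly reads are distributed, so it does not by itself rule out local gaps. Because MINLEN:150 equals the full read length, any read that loses even one base to trimming is discarded, which is consistent with only 38.25% of pairs surviving.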


Would you expect contaminant contigs that have not yet been removed to have gaps in coverage or lower coverage?


Difficult to say. Were the preps for PacBio and Illumina libraries done from the same genomic DNA? If the "contaminant" contigs are in fact mis-assemblies then Illumina reads could still map to them.
