4.2 years ago
kristina.mahan
I assembled a genome from PacBio reads with the Flye assembler. I then mapped my cleaned-up Illumina reads onto the longest contigs with bwa-mem, and there are gaps in coverage along the contigs. Why is this?
If that means Illumina reads are not completely covering the said contigs then
I trimmed the Illumina reads for quality and to remove adapters. I used Trimmomatic with these parameters:
The genome size is ~200 Mb. These are the results from trimming the Illumina reads:
This is not RNA-seq (but I do also have Illumina RNA-seq reads). Are there supposed to be zero gaps once I map Illumina reads back onto the assembly contigs? Am I trimming the Illumina reads too much? Do I need to trim the reads at all? Thanks!!
I don't know what read length you sequenced, since even with trimming you are setting the minimum read length to 150. You could try aligning without any trimming at all and see if things improve.
Ideally, yes. One would think there should be no bases or regions that are not covered by at least a few reads, as long as the assembly is good and there is plenty of Illumina data to cover it. Did the Illumina reads go into the assembly, or was it a pure PacBio one?
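To see exactly where the uncovered regions are, one option is to scan per-base depth. Below is a minimal sketch, assuming the depth table comes from `samtools depth -a` (tab-separated contig, 1-based position, depth, with `-a` so that zero-depth positions are reported). The function name `uncovered_intervals` and the `min_depth` parameter are made up for illustration.

```python
from collections import defaultdict

def uncovered_intervals(depth_lines, min_depth=1):
    """Return {contig: [(start, end), ...]} for runs of consecutive
    positions whose depth is below min_depth (1-based, inclusive).

    depth_lines: iterable of 'contig<TAB>pos<TAB>depth' strings,
    assumed sorted by contig and position as samtools depth emits them.
    """
    gaps = defaultdict(list)
    run_contig = None   # contig of the currently open low-depth run
    run_start = None    # first position of the open run
    prev_pos = None     # last low-depth position seen in the open run

    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        contig, pos, depth = line.split("\t")[:3]
        pos, depth = int(pos), int(depth)
        low = depth < min_depth
        # Does this position continue the open run?
        extends = (low and run_contig == contig
                   and prev_pos is not None and pos == prev_pos + 1)
        if run_start is not None and not extends:
            # Close the open run before handling the current position.
            gaps[run_contig].append((run_start, prev_pos))
            run_contig, run_start = None, None
        if low:
            if run_start is None:
                run_contig, run_start = contig, pos
            prev_pos = pos

    if run_start is not None:
        gaps[run_contig].append((run_start, prev_pos))
    return dict(gaps)

if __name__ == "__main__":
    demo = ["c1\t1\t5", "c1\t2\t0", "c1\t3\t0", "c1\t4\t7"]
    print(uncovered_intervals(demo))
```

With a real file, the lines could be passed straight from an open file handle. Looking at whether the gaps cluster at contig ends, in repeats, or on a few whole contigs would help narrow down the cause.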
The Illumina sequencing is 2 x 150 bp. The assembly was built from PacBio reads only. I could polish the PacBio assembly with the Illumina reads using Pilon and see if that helps. I will also try mapping the Illumina reads without trimming, or with different trimming parameters.
Would you expect contaminant contigs that have not yet been removed to have gaps in coverage, or lower coverage?
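With 2 x 150 bp reads and a minimum length of 150 after trimming, any read that loses even one base to quality or adapter clipping gets discarded. A small sketch of that arithmetic follows; the function name `fraction_kept` and the example lengths are hypothetical, not taken from the actual trimming log.

```python
def fraction_kept(trimmed_lengths, min_length):
    """Fraction of reads whose post-trimming length is >= min_length."""
    if not trimmed_lengths:
        return 0.0
    kept = sum(1 for n in trimmed_lengths if n >= min_length)
    return kept / len(trimmed_lengths)

if __name__ == "__main__":
    # Hypothetical post-trimming lengths for four 150 bp reads.
    lengths = [150, 150, 149, 120]
    print(fraction_kept(lengths, 150))  # only untouched reads survive
    print(fraction_kept(lengths, 36))   # a lower cutoff keeps all four
```

If the fraction surviving in the real trimming summary is low, a lower minimum length may recover enough reads to close some of the gaps.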
Difficult to say. Were the preps for PacBio and Illumina libraries done from the same genomic DNA? If the "contaminant" contigs are in fact mis-assemblies then Illumina reads could still map to them.
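One way to look into this is to compare mean Illumina depth per contig and see whether any contigs stand out. This is only a rough sketch, assuming the same `samtools depth -a` style input (tab-separated contig, position, depth); the function name `mean_depth_per_contig` is made up for illustration, and an unusual depth does not by itself prove contamination.

```python
from collections import defaultdict

def mean_depth_per_contig(depth_lines):
    """Return {contig: mean depth} from 'contig<TAB>pos<TAB>depth' lines."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        contig, _pos, depth = line.split("\t")[:3]
        totals[contig] += int(depth)
        counts[contig] += 1
    return {contig: totals[contig] / counts[contig] for contig in totals}

if __name__ == "__main__":
    demo = ["c1\t1\t10", "c1\t2\t20", "c2\t1\t0", "c2\t2\t2"]
    print(mean_depth_per_contig(demo))
```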