Hello all,
I have a puzzle that I cannot solve. Maybe some of you have encountered this as well. I have Nanopore (SQK-LSK109) and Illumina (NextSeq 500, 2x150 bp) reads for the same organism.
I assembled the genome de novo with Flye 2.7 using:
flye --nano-raw ~/nanopore.fastq -o ~/flye_assembly/ -g 35m -t 40
Total length: 32363893
Fragments: 86
Fragments N50: 4454457
Largest frg: 9249238
Scaffolds: 0
Mean coverage: 27
Then I mapped the raw Nanopore reads back onto the assembly with minimap2:
minimap2 -ax map-ont ~/flye_assembly.fa ~/nanopore.fastq
330418 + 0 in total (QC-passed reads + QC-failed reads)
214492 + 0 primary
0 + 0 secondary
115926 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
330418 + 0 mapped (100.00% : N/A)
214492 + 0 primary mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
average coverage bam file, covered regions = 32.2968
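To make the two flagstat summaries in this post easier to compare, a minimal Python sketch like the one below can turn the text into numbers. It assumes the usual one-record-per-line flagstat layout; the function name `parse_flagstat` and the variable names are made up for illustration, and only a few lines of the Nanopore summary are pasted in.

```python
import re

def parse_flagstat(text):
    """Map each flagstat category to its QC-passed count.

    Assumes one record per line: "<passed> + <failed> <label>".
    """
    counts = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+) \+ (\d+) (.+)", line)
        if match is None:
            continue  # skip lines that are not flagstat records
        # Drop trailing annotations such as "(100.00% : N/A)".
        label = match.group(3).split(" (")[0].strip()
        # Keep the first occurrence if two labels collapse to the same text.
        counts.setdefault(label, int(match.group(1)))
    return counts

# Excerpt of the Nanopore summary from this post.
nanopore_flagstat = """\
330418 + 0 in total (QC-passed reads + QC-failed reads)
214492 + 0 primary
0 + 0 secondary
115926 + 0 supplementary
330418 + 0 mapped (100.00% : N/A)
214492 + 0 primary mapped (100.00% : N/A)
"""

counts = parse_flagstat(nanopore_flagstat)
supp_fraction = counts["supplementary"] / counts["in total"]
print(f"supplementary records: {supp_fraction:.1%} of all alignments")
```

With these numbers, roughly 35% of the Nanopore alignment records are supplementary, whereas the same calculation on the Illumina summary (15247 of 19111484) gives well under 1%.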
I then produced a sorted BAM file, keeping only alignments with mapping quality > 20.
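The post does not say which tool applied the mapping-quality filter, so the following is only a sketch of what a strict "MAPQ > 20" filter means at the SAM-record level. The function name and the toy records are hypothetical. One detail worth checking in a real pipeline: `samtools view -q 20` skips alignments with MAPQ smaller than 20, so it keeps MAPQ 20 as well, which is slightly different from a strict "> 20".

```python
def filter_by_mapq(sam_lines, threshold=20):
    """Yield header lines and alignment records whose MAPQ is strictly above threshold."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        mapq = int(fields[4])  # MAPQ is the fifth mandatory SAM column
        if mapq > threshold:
            yield line

# Hypothetical toy records (11 mandatory SAM columns each), for illustration only.
toy_sam = [
    "@SQ\tSN:contig_1\tLN:1000",
    "read_a\t0\tcontig_1\t10\t60\t4M\t*\t0\t0\tACGT\t*",
    "read_b\t0\tcontig_1\t20\t20\t4M\t*\t0\t0\tACGT\t*",
    "read_c\t16\tcontig_1\t30\t3\t4M\t*\t0\t0\tACGT\t*",
]

kept = list(filter_by_mapq(toy_sam))
print(len(kept) - 1, "alignment(s) kept")
```

Here only `read_a` survives: `read_b` sits exactly on the threshold and is dropped by the strict comparison.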
For the Illumina data, I trimmed reads using popoolation_1.2.2/basic-pipeline/trim-fastq.pl (length > 70 bp, quality > 30)
number of trimmed reads (x2): 10290402
total number of trimmed nucleotides (x2): 1476293771
I mapped the Illumina reads onto the Flye assembly using BWA-MEM (mapping quality > 20)
19111484 + 0 in total (QC-passed reads + QC-failed reads)
19096237 + 0 primary
0 + 0 secondary
15247 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
19111484 + 0 mapped (100.00% : N/A)
19096237 + 0 primary mapped (100.00% : N/A)
19096237 + 0 paired in sequencing
9558345 + 0 read1
9537892 + 0 read2
18940426 + 0 properly paired (99.18% : N/A)
19079555 + 0 with itself and mate mapped
16682 + 0 singletons (0.09% : N/A)
89147 + 0 with mate mapped to a different chr
89147 + 0 with mate mapped to a different chr (mapQ>=5)
average coverage bam file, covered regions = 84.8934
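As a sanity check on the Illumina depth, the expected coverage can be estimated from the trimming summary and the assembly length. The sketch below assumes that "(x2)" means the reported read and nucleotide counts are per mate file and must be doubled; that reading is an assumption, although it fits the roughly 19 million mapped reads. Variable names are illustrative only.

```python
# Values copied from this post.
assembly_length = 32_363_893          # Flye "Total length"
trimmed_reads_per_file = 10_290_402   # "number of trimmed reads (x2)"
trimmed_nt_per_file = 1_476_293_771   # "total number of trimmed nucleotides (x2)"
observed_depth = 84.8934              # reported average coverage of covered regions

# Assumption: "(x2)" means the totals are twice the reported per-file values.
total_nt = 2 * trimmed_nt_per_file

mean_read_length = trimmed_nt_per_file / trimmed_reads_per_file
expected_depth = total_nt / assembly_length

print(f"mean trimmed read length: {mean_read_length:.1f} bp")
print(f"expected depth if every trimmed base mapped: {expected_depth:.1f}x")
print(f"observed depth after MAPQ filtering: {observed_depth:.1f}x")
```

Under that assumption the expected depth comes out a little above 90x, so the observed ~85x after mapping-quality filtering looks plausible and does not suggest that a large share of the Illumina reads failed to map.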
I opened both BAMs in IGV and saw that a great many SNPs and indels supported by the Illumina reads at 100% frequency are completely absent from the Nanopore data (see the example in the snapshot below: top track is Nanopore, bottom track is Illumina).
It seems really strange to see such clear discrepancies. Is this normal? Does anyone have a clue what I am missing here? I'd appreciate any suggestions.
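To move from eyeballing IGV to a count of such sites, the observation can be written as a simple rule: a position is discordant when nearly all Illumina reads carry the non-assembly allele while nearly all Nanopore reads match the assembly. The sketch below is hypothetical throughout: the function names, the thresholds (0.9, 0.1, minimum depth 10), and the toy allele counts are all made up for illustration, and real per-site counts would have to come from a pileup of the two BAMs.

```python
def alt_fraction(ref_count, alt_count):
    """Fraction of reads at a site that disagree with the assembly."""
    depth = ref_count + alt_count
    return alt_count / depth if depth else 0.0

def discordant_sites(sites, high=0.9, low=0.1, min_depth=10):
    """Return (contig, pos) where Illumina supports the alt and Nanopore does not.

    sites: iterable of (contig, pos, (illumina_ref, illumina_alt),
                        (nanopore_ref, nanopore_alt)).
    """
    flagged = []
    for contig, pos, illumina, nanopore in sites:
        if sum(illumina) < min_depth or sum(nanopore) < min_depth:
            continue  # too shallow in at least one dataset to judge
        if alt_fraction(*illumina) >= high and alt_fraction(*nanopore) <= low:
            flagged.append((contig, pos))
    return flagged

# Hypothetical toy counts, not taken from the data in this post.
toy_sites = [
    ("contig_1", 1200, (0, 85), (27, 0)),  # Illumina all alt, Nanopore all ref
    ("contig_1", 3400, (84, 1), (26, 1)),  # both datasets agree with the assembly
    ("contig_2", 560, (0, 80), (1, 25)),   # both datasets support the alt allele
    ("contig_2", 900, (0, 4), (5, 0)),     # below the minimum depth
]

flagged = discordant_sites(toy_sites)
print(flagged)
```

Only the first toy site is flagged. Counting such sites genome-wide, and checking whether they are mostly SNPs or mostly indels, would give a more concrete picture than individual IGV snapshots.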
cheers, alex
How certain are you that the same organism is represented in both datasets? Like you, I think the two should match much better if it were indeed the same organism.
The DNA is from a fungus that I grew from the same spore stock.