Question

Cannot align reads to plasmid

7.4 years ago

David ▴ 240

Hi, I have sequenced a bacterial genome for which I have a reference genome (98% similarity).

I have used bwa to map reads to the reference genome: bwa mem reference.fa reads.R1.fq.gz reads.R2.fq.gz

I'm failing to recover the plasmid although I know it's there. I have run an assembly using megahit and aligned the contigs to the plasmid, and I recover 88% of the plasmid.

What I don't understand is why the reads do not map to the plasmid. Output of samtools flagstat PLASMID.sorted.bam:

1435694 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 mapped (0.00% : N/A)
1435694 + 0 paired in sequencing
717847 + 0 read1
717847 + 0 read2
0 + 0 properly paired (0.00% : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

If I check the reads against the genome assembly, I get pretty good mapping:

1122036 + 0 in total (QC-passed reads + QC-failed reads) 
0 + 0 secondary
574 + 0 supplementary
0 + 0 duplicates 
1116358 + 0 mapped (99.49% : N/A)
1121462 + 0 paired in sequencing 
560767 + 0 read1 
560695 + 0 read2 
1108556 + 0 properly paired (98.85% : N/A)
1110902 + 0 with itself and mate mapped
4882 + 0 singletons (0.44% : N/A)
1598 + 0 with mate mapped to a different chr
1210 + 0 with mate mapped to a different chr (mapQ>=5)

Any idea why I'm missing the plasmid when aligning clean reads directly to the plasmid?

sequencing bwa

updated 7.4 years ago by h.mon 35k • written 7.4 years ago by David ▴ 240

Can you try using bbsplit.sh from the BBMap suite, giving it the plasmid and genome sequences at the same time, to bin the reads? You have not said what length your reads are (or whether they are trimmed/cleaned of adapters). Pay attention to the settings for reads that multi-map (across and within the genomes provided).
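A minimal sketch of what such a run could look like, written as a small Python helper that only assembles the command line. The file name `plasmid.fa`, the output names, and the helper itself are hypothetical. The `ambiguous` (multi-mapping within one reference) and `ambiguous2` (multi-mapping across references) values are just one possible choice and should be checked against the bbsplit.sh help text for your version.

```python
import shlex
from typing import List


def build_bbsplit_command(
    genome_fasta: str,
    plasmid_fasta: str,
    reads_r1: str,
    reads_r2: str,
    ambiguous: str = "best",
    ambiguous2: str = "toss",
) -> List[str]:
    """Assemble a bbsplit.sh command that bins reads between two references.

    Assumptions: bbsplit.sh is on PATH, and '%' in basename is replaced by
    the reference name, so there is one output file per reference.
    """
    return [
        "bbsplit.sh",
        f"ref={genome_fasta},{plasmid_fasta}",  # both references at once
        f"in={reads_r1}",
        f"in2={reads_r2}",
        "basename=binned_%.fq.gz",              # hypothetical output pattern
        f"ambiguous={ambiguous}",               # multi-mappers within a reference
        f"ambiguous2={ambiguous2}",             # multi-mappers across references
        "refstats=refstats.txt",                # per-reference summary table
    ]


cmd = build_bbsplit_command(
    "reference.fa", "plasmid.fa", "reads.R1.fq.gz", "reads.R2.fq.gz"
)
# Print the command for inspection; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

With `ambiguous2=toss`, reads that match the genome and the plasmid equally well are discarded rather than assigned, which keeps the per-reference bins clean but can lower the plasmid count if the two share sequence.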

7.4 years ago by GenoMax 152k

Thanks for your response, GenoMax. It's an Illumina 2×250 bp run on a single bacterial genome. It turns out that the average insert size is 300, which is not that good, but I have quality-trimmed all sequences and removed adapters and the phiX genome.

Here is the output from bbsplit:

#name   %unambiguousReads       unambiguousMB   %ambiguousReads ambiguousMB     unambiguousReads        ambiguousReads
Reference_genome_without_plasmid    99.36129        226.48699       0.00018 0.00026 1121942 2
plasmid  0.00000 0.00000 0.00018 0.00026 0       2

The idea behind sequencing this specific strain is that it's phenotypically different from the reference, so the goal is to look at the genome and find out whether there is a genomic event that might explain this phenotypic difference.

7.4 years ago by David ▴ 240

score 0 · Answer 1 · 2018-02-17

You are failing to report some fundamental information: what is the similarity between the plasmid and the assembled contigs? How are you aligning the contigs to the plasmid? What is the coverage of the contigs mapping to the plasmid? Is it different from the contigs mapping to the bacterial genome?
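One way to get at the coverage question is to compare mean read depth per contig. Below is a minimal sketch, assuming the reads were mapped back to the contigs and `samtools depth -a` was run on that BAM, which gives three tab-separated columns: contig, position, depth. The contig names and numbers in the example are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable


def mean_depth_per_contig(depth_lines: Iterable[str]) -> Dict[str, float]:
    """Average the depth column of `samtools depth` output for each contig.

    With `samtools depth -a` every position is listed, including zero-depth
    ones; without -a the mean covers only positions that have coverage.
    """
    totals: Dict[str, int] = defaultdict(int)
    positions: Dict[str, int] = defaultdict(int)
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        contig, _pos, depth = line.split("\t")[:3]
        totals[contig] += int(depth)
        positions[contig] += 1
    return {contig: totals[contig] / positions[contig] for contig in totals}


# Invented example data: two contigs, two positions each.
example = [
    "contig_1\t1\t40",
    "contig_1\t2\t44",
    "contig_2\t1\t120",
    "contig_2\t2\t130",
]
depths = mean_depth_per_contig(example)
for contig, depth in sorted(depths.items()):
    print(f"{contig}\t{depth:.1f}")
```

In practice the lines would come from a file, for instance `mean_depth_per_contig(open("depth.tsv"))`, where `depth.tsv` is a hypothetical file name. A plasmid can be present at a different copy number than the chromosome, so a clearly different mean depth on the plasmid-matching contigs would be informative.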

Maybe the similarity is too low to map short reads to the plasmid with bwa, but high enough to align the contigs to the plasmid with whatever software you used (blast?).
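As a rough back-of-the-envelope check of this idea, the sketch below estimates the expected alignment score of a read at a given identity under bwa mem's default scoring: match +1, mismatch -4, minimum reported score 30 (`-A 1 -B 4 -T 30`). It assumes substitutions only (no indels), an end-to-end alignment rather than a local one, and the 250 bp read length mentioned above. The function names are made up for illustration.

```python
# bwa mem defaults: -A 1 (match score), -B 4 (mismatch penalty),
# -T 30 (minimum alignment score to output).
MATCH_SCORE = 1
MISMATCH_PENALTY = 4
MIN_OUTPUT_SCORE = 30


def expected_score(read_length: int, identity: float) -> float:
    """Expected end-to-end score for a read with only substitutions.

    Simplifying assumption: no gaps and no soft clipping, so this is a
    pessimistic estimate compared with a real local alignment.
    """
    matches = read_length * identity
    mismatches = read_length * (1.0 - identity)
    return matches * MATCH_SCORE - mismatches * MISMATCH_PENALTY


def break_even_identity() -> float:
    """Identity at which the expected score per base is exactly zero."""
    return MISMATCH_PENALTY / (MATCH_SCORE + MISMATCH_PENALTY)


if __name__ == "__main__":
    for identity in (0.98, 0.90, 0.85, 0.80):
        score = expected_score(250, identity)
        verdict = "above" if score >= MIN_OUTPUT_SCORE else "below"
        print(f"identity {identity:.2f}: expected score {score:.1f} ({verdict} -T)")
    print(f"break-even identity: {break_even_identity():.2f}")
```

Under this simplified model the expected score reaches zero at 80% identity. Indels and clustered differences would shift that in practice, and real reads can do better than this estimate because bwa mem aligns locally and can clip divergent ends.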