Question

bwa mem misaligns my contigs to the reference by a few bases

1

Entering edit mode

7.4 years ago

biomagician ▴ 410

Dear all,

I have indexed the C. elegans reference genome with:

bwa index output/genome/ref/seq/celegans.fa

and then aligned my de novo assembly to the reference with:

bwa mem -t 8 -x intractg output/genome/ref/seq/celegans.fa input/assembly/celegans/hgap/bristol/assembly.fa > output/alignment/pacbio/bwa/ref/bristolAssembly.sam

I then

samtools view -bS output/alignment/pacbio/bwa/ref/bristolAssembly.sam > output/alignment/pacbio/bwa/ref/bristolAssembly.bam

samtools sort output/alignment/pacbio/bwa/ref/bristolAssembly.bam -o output/alignment/pacbio/bwa/ref/bristolAssemblySorted.bam

samtools index output/alignment/pacbio/bwa/ref/bristolAssemblySorted.bam

to visualise the alignment in IGV.

This is what I get: http://cristian-riccio.ch/wp-content/uploads/2017/07/igv_snapshot.png

You can see that my assembly is the same as the reference, just shifted by 12 bases or so. Has anyone any suggestions about how to solve this problem?

Thanks.

Best,

C.

bwa bwt bwa mem contig alignment • 4.4k views

ADD COMMENT • link updated 6.4 years ago by Biostar 20 • written 7.4 years ago by biomagician ▴ 410

1

Entering edit mode

my assembly is the same as the reference

Have you tried to use the actual reference sequence you used for your own alignments (instead of going on the label of genome build) in IGV?

ADD REPLY • link 7.4 years ago by GenoMax 147k

0

Entering edit mode

Thanks, good idea. I've tried but there is no change. I went on the Genomes tab in IGV, loaded the reference I used for alignment and then loaded my BAM file. Still the same.

ADD REPLY • link 7.4 years ago by biomagician ▴ 410

1

Entering edit mode

Hmm. You are actually aligning the assembly not the original reads. Assuming the assembly is long, have you tried to use a tool meant for such alignments as blat, LAST, or LASTZ? Since this is C. elegans, even mauve may be able to handle the alignments with the added benefit of showing you any rearrangements.

ADD REPLY • link 7.4 years ago by GenoMax 147k

0

Entering edit mode

genomax, do you know if Mauve can output a BAM file?

ADD REPLY • link 7.4 years ago by biomagician ▴ 410

1

Entering edit mode

is it really shifted, or just IGV is showing you this 12 bp shift?

ADD REPLY • link 7.4 years ago by stolarek.ir ▴ 700

0

Entering edit mode

It is really shifted I think. On this image, I am only showing the mismatched bases and you can see that there are many mismatches despite the two sequences being the same. http://cristian-riccio.ch/wp-content/uploads/2017/07/igv_snapshotreallyshifted.png

ADD REPLY • link 7.4 years ago by biomagician ▴ 410

1

Entering edit mode

I'm asking about the BAM file, not the IGV visualization. If it's a true missalignment, then you can calculate from the BAM fields (like NM) the number of differences, and from the CIGAR string you can track down where these mismatches could be if truly they exist

ADD REPLY • link 7.4 years ago by stolarek.ir ▴ 700

score 3 · Accepted Answer · 2017-07-03

Solved finally!

http://cristian-riccio.ch/wp-content/uploads/2017/07/igv_snapshotSolution.png

I used this command instead, penalising gap extensions less.

bwa mem -t 8 -E0.5 -x intractg output/genome/ref/seq/celegans.fa input/assembly/celegans/hgap/bristol/assembly.fa > output/alignment/pacbio/bwa/ref/bristolAssembly.sam

The assembly comes from pacbio, and this technology is 12% indel error in the raw reads, that might be why the contigs also need a bit more lee-ways on indels for alignment.

I figured it out with a colleague because we looked at the start of the chromosomes and it started off aligned and got misaligned when there was a larger indel in my assembly.

Thanks to all for the suggestions.

Best,

C.