Question

Bwa alignment problem of single end DNA reads

0

Entering edit mode

9.1 years ago

akhan • 0

Hi,

I am trying to align Genomic DNA sequence reads (64 bases in length) to reference human genome. The sequence reads are single ended. I am using bwa to align. My question is whether is it possible to align sequence reads by giving a parameter option to zero mismatch. I want to extract reads those are perfectly matches to reference genome. I have tried with following command line, it generates bam files with no errors with zero reads for zero mismatch criteria. Also I have tried to use fraction, but in these case I get more reads mapped to reference genome than when I used more relaxed criteria such as allowing maximum of 4 mismatches. If I understand correctly, more stringent criteria will generate less number of reads mapped than relaxed criteria. I would appreciate any help with this issue explaining the parameter, what it does and why I don't get mapped reads for zero mismatch criteria. Thanks for your help.

Arshad

command for Using zero mismatch parameter

bwa aln -n 0 ./hg38bwaidx ./merged_w1_0_nM.fastq.gz >  zero_nM-0.txt.bwa
bwa samse ./hg38bwaidx zero_nM-0.txt.bwa ./merged_w1_0_nM.fastq.gz > zero_nM-0.txt.sam
samtools view -b -S -o zero_nM-0.bam zero_nM-0.txt.sam

command for using fraction parameter (presumably should be 0.1% mismatch of the reads of 64 bases are allowed)

bwa aln -n 0.001 ./hg38bwaidx ./merged_w1_0_nM.fastq.gz >  zero_nM-0.001.txt.bwa
bwa samse ./hg38bwaidx zero_nM-0.001.txt.bwa ./merged_w1_0_nM.fastq.gz > zero_nM-0.001.txt.sam
samtools view -b -S -o zero_nM-one.bam zero_nM-0.001.txt.sam

bwa alignment • 3.4k views

ADD COMMENT • link updated 2.1 years ago by Ram 44k • written 9.1 years ago by akhan • 0

Ram · Answer 1 · 2015-09-17

1

Entering edit mode

9.1 years ago

Brian Bushnell 20k

BBMap can do this with the perfectmode flag:

bbmap.sh ref=hg38.fasta in=reads.fq out=perfect.sam perfectmode

Alternately, semiperfectmode will allow reads that map perfectly but extend off the end of contigs (in other words, they overlap Ns in the reference). If you give a suffix of .fastq to the output file the reads will be output in fastq instead of sam.

ADD COMMENT • link updated 2.1 years ago by Ram 44k • written 9.1 years ago by Brian Bushnell 20k

score 0 · Answer 2 · 2015-09-17

You could parse the output SAM file to keep only the header lines and the records containing the tag "NM:i:0" (edit distance of 0):

awk '$0 ~ "^@" || $0 ~ "\tNM:i:0"' aligned.sam | samtools view ...

(Anyway, why doesn't bwa aln -n 0 ... suite you?)

However, are you sure you want to keep perfect match reads? Depending on your experiment you might bias in favour of the reference sequence and against genuine SNPs.