Bwa alignment problem of single end DNA reads
2
0
Entering edit mode
9.1 years ago
akhan • 0

Hi,

I am trying to align Genomic DNA sequence reads (64 bases in length) to reference human genome. The sequence reads are single ended. I am using bwa to align. My question is whether is it possible to align sequence reads by giving a parameter option to zero mismatch. I want to extract reads those are perfectly matches to reference genome. I have tried with following command line, it generates bam files with no errors with zero reads for zero mismatch criteria. Also I have tried to use fraction, but in these case I get more reads mapped to reference genome than when I used more relaxed criteria such as allowing maximum of 4 mismatches. If I understand correctly, more stringent criteria will generate less number of reads mapped than relaxed criteria. I would appreciate any help with this issue explaining the parameter, what it does and why I don't get mapped reads for zero mismatch criteria. Thanks for your help.

Arshad

command for Using zero mismatch parameter

bwa aln -n 0 ./hg38bwaidx ./merged_w1_0_nM.fastq.gz >  zero_nM-0.txt.bwa
bwa samse ./hg38bwaidx zero_nM-0.txt.bwa ./merged_w1_0_nM.fastq.gz > zero_nM-0.txt.sam
samtools view -b -S -o zero_nM-0.bam zero_nM-0.txt.sam

command for using fraction parameter (presumably should be 0.1% mismatch of the reads of 64 bases are allowed)

bwa aln -n 0.001 ./hg38bwaidx ./merged_w1_0_nM.fastq.gz >  zero_nM-0.001.txt.bwa
bwa samse ./hg38bwaidx zero_nM-0.001.txt.bwa ./merged_w1_0_nM.fastq.gz > zero_nM-0.001.txt.sam
samtools view -b -S -o zero_nM-one.bam zero_nM-0.001.txt.sam
bwa alignment • 3.4k views
ADD COMMENT
1
Entering edit mode
9.1 years ago

BBMap can do this with the perfectmode flag:

bbmap.sh ref=hg38.fasta in=reads.fq out=perfect.sam perfectmode

Alternately, semiperfectmode will allow reads that map perfectly but extend off the end of contigs (in other words, they overlap Ns in the reference). If you give a suffix of .fastq to the output file the reads will be output in fastq instead of sam.

ADD COMMENT
0
Entering edit mode
9.1 years ago

You could parse the output SAM file to keep only the header lines and the records containing the tag "NM:i:0" (edit distance of 0):

awk '$0 ~ "^@" || $0 ~ "\tNM:i:0"' aligned.sam | samtools view ...

(Anyway, why doesn't bwa aln -n 0 ... suite you?)

However, are you sure you want to keep perfect match reads? Depending on your experiment you might bias in favour of the reference sequence and against genuine SNPs.

ADD COMMENT

Login before adding your answer.

Traffic: 948 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6