Hi,
I am trying to align genomic DNA sequence reads (64 bases long) to the human reference genome. The reads are single-end, and I am using bwa for the alignment. My question is whether it is possible to align the reads with a parameter setting that allows zero mismatches, because I want to extract only the reads that match the reference genome perfectly.

I tried the commands below. They run without errors, but with the zero-mismatch setting the resulting BAM file contains zero reads. I also tried a fractional value, but in that case more reads map to the reference genome than with a more relaxed setting, such as allowing a maximum of 4 mismatches. If I understand correctly, a more stringent setting should produce fewer mapped reads than a relaxed one. I would appreciate any help explaining what this parameter does and why I get no mapped reads with the zero-mismatch setting. Thanks for your help.
Arshad
Commands using the zero-mismatch parameter:
bwa aln -n 0 ./hg38bwaidx ./merged_w1_0_nM.fastq.gz > zero_nM-0.txt.bwa
bwa samse ./hg38bwaidx zero_nM-0.txt.bwa ./merged_w1_0_nM.fastq.gz > zero_nM-0.txt.sam
samtools view -b -S -o zero_nM-0.bam zero_nM-0.txt.sam
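One workaround I am considering (an untested sketch, not something I know to be the recommended approach): align with a relaxed -n, then keep only the mapped reads whose NM tag is 0. In the SAM format, NM is the edit distance to the reference (mismatches plus inserted and deleted bases), and bit 0x4 of FLAG marks an unmapped read. The function name keep_perfect_matches and the file names reads.sam and perfect.bam are hypothetical placeholders.

```shell
# Sketch: keep SAM header lines plus mapped reads carrying the tag NM:i:0.
# ASSUMPTIONS: SAM text (e.g. from bwa samse) arrives on stdin;
# keep_perfect_matches is a hypothetical helper name.
keep_perfect_matches() {
    awk -F '\t' '
        /^@/ { print; next }               # pass the SAM header through unchanged
        int($2 / 4) % 2 == 1 { next }      # skip unmapped reads (FLAG bit 0x4)
        {
            for (i = 12; i <= NF; i++)     # optional tags start at column 12
                if ($i == "NM:i:0") { print; next }
        }
    '
}

# Example with placeholder file names:
# keep_perfect_matches < reads.sam | samtools view -b -o perfect.bam -
```

Recent samtools releases can also filter on tags directly with a filter expression (the -e option of samtools view); I have not checked whether the version I have installed supports it.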
Commands using a fractional value (which I presumed would allow mismatches at up to 0.1% of each 64-base read):
bwa aln -n 0.001 ./hg38bwaidx ./merged_w1_0_nM.fastq.gz > zero_nM-0.001.txt.bwa
bwa samse ./hg38bwaidx zero_nM-0.001.txt.bwa ./merged_w1_0_nM.fastq.gz > zero_nM-0.001.txt.sam
samtools view -b -S -o zero_nM-one.bam zero_nM-0.001.txt.sam
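Reading the bwa aln manual again, a fractional -n seems not to be a mismatch fraction at all: it is described as "the fraction of missing alignments given 2% uniform base error rate", and bwa then chooses a maximum edit distance for each read length. Below is my rough sketch of that conversion, ASSUMING it follows the Poisson approximation used in bwa_cal_maxdiff() in the bwa source; the helper name bwa_max_diff is my own, and I have not checked this against every bwa version. If the assumption holds, a smaller fraction means a larger allowed edit distance, which would explain why -n 0.001 maps more reads than allowing 4 mismatches.

```shell
# Sketch: maximum edit distance chosen for a fractional bwa aln -n value.
# ASSUMPTION: same Poisson approximation as bwa_cal_maxdiff(), with the
# 2% uniform base error rate from the bwa manual. bwa_max_diff is hypothetical.
# Usage: bwa_max_diff READ_LENGTH FRACTION
bwa_max_diff() {
    awk -v l="$1" -v thres="$2" 'BEGIN {
        err = 0.02                      # assumed uniform base error rate
        lambda = l * err                # expected number of errors per read
        term = exp(-lambda)             # P(exactly 0 errors)
        sum = term                      # cumulative P(at most k errors)
        for (k = 1; k < 1000; k++) {
            term = term * lambda / k    # P(exactly k errors)
            sum += term
            # first k whose tail probability drops below the threshold
            if (1.0 - sum < thres) { print k; exit }
        }
        print 2                         # fallback (assumed to match bwa source)
    }'
}

bwa_max_diff 64 0.04    # default -n; prints 4
bwa_max_diff 64 0.001   # the value I tried; prints 6
```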