I've mapped 75 bp paired-end Illumina RNA-seq data to a bacterial draft genome. We are interested in a family of genes belonging to trimeric autotransporter family. As can be appreciated in the figure, it should be quite difficult to have reads mapping uniquely to one gene because all the genes have same domains. However, we have many reads mapping with MAPQ>20 to this domains.
There could be some explanations to this; however, both of these cases should be flagged as having multiple mappings:
- Subtle SNP differences enough for BWA to discriminate these similar regions
- BWA acting randomly when assigning best match
I would like to know if it's safe to rely in BWA MAPQ>20 value as threshold.
BWA code:
#BWA index
bwa index $genome_file
#BWA mapping
bwa mem -t 8 $genome_file $fastq_file_R1 $fastq_file_R2 | gzip -3 > P_S1_L001_aln-pe.sam_inverted.gz
bwa mem -t 8 $genome_file $fastq_file_R3 $fastq_file_R4 | gzip -3 > V_S1_L001_aln-pe.sam_inverted.gz
Here is the figure showing some proteins of the family, from Pina S. 2009 et al.: http://www.ncbi.nlm.nih.gov/pubmed/19011035
Thanks,
Bernardo