RNA-seq reads mapping to paralogs with high MAPQ. How to ensure this?
1
0
Entering edit mode
10.1 years ago
biotech ▴ 570

I've mapped 75 bp paired-end Illumina RNA-seq data to a bacterial draft genome. We are interested in a family of genes belonging to trimeric autotransporter family. As can be appreciated in the figure, it should be quite difficult to have reads mapping uniquely to one gene because all the genes have same domains. However, we have many reads mapping with MAPQ>20 to this domains.

There could be some explanations to this; however, both of these cases should be flagged as having multiple mappings:

  • Subtle SNP differences enough for BWA to discriminate these similar regions
  • BWA acting randomly when assigning best match

I would like to know if it's safe to rely in BWA MAPQ>20 value as threshold.

BWA code:

#BWA index
bwa index $genome_file
#BWA mapping
bwa mem -t 8 $genome_file $fastq_file_R1 $fastq_file_R2 | gzip -3 > P_S1_L001_aln-pe.sam_inverted.gz
bwa mem -t 8 $genome_file $fastq_file_R3 $fastq_file_R4 | gzip -3 > V_S1_L001_aln-pe.sam_inverted.gz

Here is the figure showing some proteins of the family, from Pina S. 2009 et al.: http://www.ncbi.nlm.nih.gov/pubmed/19011035

Thanks,
Bernardo

paralogs BWA RNA-Seq MAPQ • 3.4k views
ADD COMMENT
1
Entering edit mode
10.1 years ago

If BWA is assigning a MAPQ of 20 then there's enough sequence divergence for it to do so. In such cases, it would be completely inappropriate for BWA to call the alignments multimapping, since they're not (a read is only multimapping if the best and second best alignment have the same score...meaning the MAPQ is 0 (probably...this ends up being aligner-dependent)).

Edit: BTW, you can just pipe to samtools, which uses blocked gzip compression. That'll give you better integration downstream as well as better performance if you ever need random access within the file.

ADD COMMENT

Login before adding your answer.

Traffic: 1674 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6