Can it be that because the read pairs are discordant the mapping quality of the first read in pair will be zero?
I've pre-filtered my exome data, and then aligned it using BWA-MEM to FASTA made of HLA alleles. Of course I get a lot of multiple mappings, so I filter for primary alignments only.
So what I'm left with is BAM containing only primary alignments aligned to the HLA FASTA. Weird thing is that MAPQ is zero for primary alignment. Can anyone help?
Here is the example of one record: D0MBKACXX120224:7:1302:15153:85170 81 hla_a_01_01_15 1234 0 76M hla_a_23_01_08 1207 0 GCGTGGACGGGCTCCGCAGATACCTGGAGAACGGGAAGGAGACGCTGCAGCGCACGGGTACCAGGGGCCACGGGGC AAFBCD?BDFGH@BBGIDGA@>JHBFFCHA@@FDHCEFIDHA@HHCIJDE@FDAAGGG=AGJCGEGHHI>@FFEEB NM:i:0 MD:Z:76 AS:i:76 XS:i:76 RG:Z:1
Here is the command line from file header also: @PG ID:bwa PN:bwa CL:/opt/bwa-0.7.13/bwa mem -P -V -a -L 1000 -R @RG\tID:1\tPL:Illumina\tSM:sample -t 1 hla.fasta pe_1.fastq pe_2.fastq VN:0.7.13-r1126 @PG ID:SAMBLASTER CL:samblaster -i /dev/stdin -o /dev/stdout --removeDups VN:0.1.22
basic grep + list of subsequences of interest, but I think it should not be of importance for the question here.
The point here is that the shown record is primary alignment but discordant. Should it's MAPQ be zero or not?
If this is paired-end sequencing and you took out only one of the reads (and did not remove its mate from second file) then you would throw the order of reads off and may get discordant alignments. If the read multi-maps then it will get a MAPQ of 0.
Your alignment score is 76 (AS). But XS tells you there is also a possible altermative alignment with the same score of 76. So MAPQ, which is a measure of how sure the aligner is, must be very low. The filtering out of alternative alignments does not improve MAPQ and rightfully so.
If the mapping quality score (MAPQ) of a read is zero, then it means that the read maps to multiple positions equally well. MAPQ is a property of single reads, and is not changed by pairing. However, the mate reads can be used to attempt to chose between the alternatives, as some will be flagged "properly paired" and some not, depending on whether the position of the mate read is consistent with the way the library was constructed. In favourable scenarios, only one paired alignment is proper, and BWA will select the primary alignment accordingly. But the MAPQ of the multimapping read will stay low.
In your case, you are aligning against HLA variants, therefore there are very similar reference sequences in your index, and as a consequence you can expect a large number of alignment with a mapping quality that is null or very low. The example read that you showed is not properly paired, so you really can not extract reliable information from it. I think that you should consider discarding it. Perhaps this alignment happened because sequencing errors tipped the balance in favour of one haplotype for read1, and another haplotype for read2. And perhaps you have reference sequences so similar that distinguishing between them becomes very hard. Depending on the goals of your analysis, I recommend that you look at where are such blind spots in your index.
Here is the example of one record:
D0MBKACXX120224:7:1302:15153:85170 81 hla_a_01_01_15 1234 0 76M hla_a_23_01_08 1207 0 GCGTGGACGGGCTCCGCAGATACCTGGAGAACGGGAAGGAGACGCTGCAGCGCACGGGTACCAGGGGCCACGGGGC AAFBCD?BDFGH@BBGIDGA@>JHBFFCHA@@FDHCEFIDHA@HHCIJDE@FDAAGGG=AGJCGEGHHI>@FFEEB NM:i:0 MD:Z:76 AS:i:76 XS:i:76 RG:Z:1
Here is the command line from file header also:
@PG ID:bwa PN:bwa CL:/opt/bwa-0.7.13/bwa mem -P -V -a -L 1000 -R @RG\tID:1\tPL:Illumina\tSM:sample -t 1 hla.fasta pe_1.fastq pe_2.fastq VN:0.7.13-r1126 @PG ID:SAMBLASTER CL:samblaster -i /dev/stdin -o /dev/stdout --removeDups VN:0.1.22
What criteria and which software?
basic grep + list of subsequences of interest, but I think it should not be of importance for the question here. The point here is that the shown record is primary alignment but discordant. Should it's MAPQ be zero or not?
If this is paired-end sequencing and you took out only one of the reads (and did not remove its mate from second file) then you would throw the order of reads off and may get discordant alignments. If the read multi-maps then it will get a MAPQ of 0.