bwa mem -T (alignment score) not doing anything
0
1
Entering edit mode
6.5 years ago
chris.bird ▴ 10

I'm trying to control which reads are mapped to a reference genome by adjusting the the -T parameter in bwa mem, which is the minimum acceptable alignment score. The only problem is that reads with an alignment score below the threshold are being mapped anyway. I can see this when using samtools view to visualize the bam file, reads with an AS value less than the threshold are there. My data is paired end, so I tried doubling the threshold to account for both reads, but that didn't work either.

What's going on here? Do some settings in bwa mem interfere with -T?

SNP genome sequence • 3.5k views
ADD COMMENT
3
Entering edit mode

For things like this, please give the version of BWA, full command line, the reference genome and an example of the problematic reads.

ADD REPLY
0
Entering edit mode

Will create a new post if it's not acceptable to comment with the same issue

I have the same issue, using bwa mem 0.7.13 as follows:

bwa mem NC_000962.fa seq_R1.trimmed.fastq.gz seq_R2.trimmed.fastq.gz -K 100000000 -R '@RG\tID:TEST\tLB:LB_TEST\tPL:illumina\tPU:TEST_RUN\tSM:NC_000962' -T 10 > read_mapping/filtered_mapq10/TEST_mq10.sam

The -K setting is for deterministic alignment, as seen in this post, and I don't think it's interfering with -T, since it says in the bwa manual that -T only affects the output.

Example problematic reads from sam file:

NC_000962.3-1176344   83      NC_000962.3     1788603 0       150M    =       1788555 -198    GTTGTCGGGGCCGCAGGCCAGGGTCAGCTCGGTGATGTCGGTGCGTCCGGTGCTGGTCCAGGCGGTGACGTGGTGGGCTTGGCTGTGGTAGGCCGGTGCGTCACAGCCGGGTTTGGTGCAGCCGCGGTCGTTGGCGAACAGCATGATCCG  GHG</GHA<H/GGGAHG0DHBGHJICHDH1HHIIFIH=BHGHHII@HHIIAGHHGHHHHIGHHIHDIIIH?HHHHHGHHHHH:IIIIHG0HHHGGHHGIGIIGHCIIH@IHHIHIGHI3IHHBBI1HFHF3HHHHHHHCD:CGGG2BCBC  NM:i:0  MD:Z:150        AS:i:150        XS:i:150        RG:Z:TEST        XA:Z:NC_000962.3,+3884609,150M,0;NC_000962.3,-103869,150M,0;
NC_000962.3-1176344   163     NC_000962.3     1788555 0       147M    =       1788603 198     GTGGCCGTGGGTGTTGTTGTGGGTGGTCCAGCCTTTTTCGGCGAGTCGGTTGTCGGGGCCGCAGGCCAGGGTCAGCTCGGTGATGTCGGTGCGTCCGGTGCTGGTCCAGGCGGTGACGTGGTGGGCTTGGCTGTGGTAGGCCGGTGC     2DBBBG?CCCE?HHHHHHHH1HHEHGFIICHI@CBHIF@FHHGHHIIFIIIIHH?@IG.HHHHHIIGH1H0IDFEHDB2@0I1HHAHGHH@DGI/DHG1H/:HDGI1I;@0@=<IHEH?1EFEICFA/.DH/GEC2DG/GH:I/CHH     NM:i:0  MD:Z:147        AS:i:147        XS:i:147        RG:Z:TEST        XA:Z:NC_000962.3,-3884660,147M,0;NC_000962.3,+103821,147M,0;

I appreciate any insights on this!

ADD REPLY
0
Entering edit mode

Out of curiosity do these reads have other alignments that satisfy the -T cut-off elsewhere in the file?

ADD REPLY
0
Entering edit mode

No, these reads don't have other alignments at all in the file.

ADD REPLY
0
Entering edit mode

I have the same issue. I set -T to 75 and I still see reads with AS:i:30, for example

ADD REPLY
0
Entering edit mode

It seems -T parameter acts upon the MD:Z field, which apparently indicates all matches/mismatches/indels (not as straightforward as AS:i). Unmapped reads still remain in the SAM file. In my case, however, when I convert SAM to BAM, sort it and then filter it with $ samtools view -bq 1 sorted.bam > filtered.bam, then I see the min. value for the AS:i field as I set it with -T.

ADD REPLY
1
Entering edit mode

perhaps what is happening is that the mapping quality is set to 0 when the score is under the threshold, so the filtering takes place with the -q 1 step

ADD REPLY

Login before adding your answer.

Traffic: 2274 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6