filter reads in BAM having a tag
3
3
Entering edit mode
4.0 years ago
aebmad ▴ 60

Anyone has a simple solution for filtering reads in a BAM/SAM file having a certain TAG? This came up trying to filter out reads from 10x without a proper CB tag defined (which is causing troubles in downstream analysis tools).

I'm surprised there is no built-in command in samtools for doing this (I opened an issue here). There is the -d TAG:VALUE argument to filter in reads having a specific tag value, but no command to simply filter reads having the tag TAG. I will keep grep-ing my way for the time being, but I wonder if I'm missing a more robust and still simple solution.

alignment samtools • 8.8k views
ADD COMMENT
1
Entering edit mode
4.0 years ago
aebmad ▴ 60

It turns out the samtools dev team was already working on this feature, which will be available in the next release (in the meantime it can be used pulling the develop branch). samtools view -d TAG does the trick.

ADD COMMENT
1
Entering edit mode
4.0 years ago
jkbonfield ★ 1.3k

The samtools view -d TAG in.bam option will fix this for the next release. More generally we are also working on a generic filtering expression language which includes the same capability. Eg `samtools view -e '[SA] && rlen>qlen' would report reads only having an SA tag and where the size of the reference sequence is larger than the query length (so has more deletions than insertions/clips).

Meanwhile, sambamba and samsift both offer this functionality too.

ADD COMMENT
0
Entering edit mode

Thanks for mentioning sambamba and samsift, I will take a look.

ADD REPLY
0
Entering edit mode

Just an update from @jkbonfield comment, I was trying to filter my sample BAM using specific TAG values. Based on this issue on SAMtools Github I could filter using the -e or --expr expression parameter. From the example:

@HD VN:1.4  SO:coordinate
@SQ SN:ref1 LN:10
r1  0   ref1    1   60  10M =   1   0   ACGTACGTAC  aaaaaaaaaa  HD:i:1
r2  0   ref1    1   60  10M =   1   0   ACGTACGTAC  aaaaaaaaaa  HD:i:2
r3  0   ref1    1   60  10M =   1   0   ACGTACGTAC  aaaaaaaaaa  HD:i:3
r4  0   ref1    1   60  10M =   1   0   ACGTACGTAC  aaaaaaaaaa

You can remove reads with specific TAG values using [HD]!=2 and including the reads without it ![HD]

./samtools view -e '![HD] || [HD]!=2' /tmp/e.sam 
r1  0   ref1    1   60  10M =   1   0   ACGTACGTAC  aaaaaaaaaa  HD:i:1
r3  0   ref1    1   60  10M =   1   0   ACGTACGTAC  aaaaaaaaaa  HD:i:3
r4  0   ref1    1   60  10M =   1   0   ACGTACGTAC  aaaaaaaaaa

In my case, I wanted to have only reads without ![vW] and with [vW]==1

./samtools view -e '![vW] || [vW]==1' /tmp/e.sam 
r1  0   ref1    1   60  10M =   1   0   ACGTACGTAC  aaaaaaaaaa  vW:i:1
r4  0   ref1    1   60  10M =   1   0   ACGTACGTAC  aaaaaaaaaa
ADD REPLY
0
Entering edit mode
4.0 years ago

using samjdk see another example here: Filter bam, EXCLUDING specific tagged reads

ADD COMMENT

Login before adding your answer.

Traffic: 2646 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6