Is there a Picard Tool to filter one or more reads through the following :
- FLAG (second column)
- Mapping Quality (MAPQ 5th Column)
- TLEN (9th Column)
??
Is there a Picard Tool to filter one or more reads through the following :
??
Hypothetical scenario: Suppose you'd want to get all reads with flag 83=paired reads|first in pair|read reverse mapped| proper pair
reads with mapping quality > 30 from a bam file and return output to a bam file:
samtools view -h input.bam | awk 'substr($0,1,1) == "@" || $2 == 83 || $5 >= 30 {print}' | samtools view -bS - > output.bam
How it works?:
samtools view -h input.bam - decodes bam and prints sam file with header to screen
awk 'substr($0,1,1) == "@" - keep all header lines
|| $2 == 83 || $5 >= 30 {print} - the actual condition you would want to check. for unmapped reads you could use GAWK with bitwise operation and($2, 4) == 4 || and($2, 16) == 16, or something like that.
samtools view -bS - pipe input S=sam to output b=bam and the "-" mediates samtools to take input from the screen
> output.bam - simple unix redirection to output file
Basically, you can use the awk line to filter anything you want.
You can use BamTools with its filter utility. You need to write a .json file like this one:
{
"isMapped" : "true",
"mapQuality" : ">50",
"isPaired" : "true"
}
And run command
bamtools filter -in in.bam -out out.bam -script filter.json
See a tutorial here.
I don't see a way to extract TLEN with it, though.
How can I filter on a string in a "subtag" ?
I have an annotated BAM file (from CRAC) with an 'XE' tag. For example, the tags are:
XE:Z:0:0:Error:Unknown:1:15.0049
or
XE:Z:0:0:chimera:32:12|-1,35194979:12|-1,34872998
I would like to get only the chimeras but I can't figure out how to use 'bamtools filter' correctly (in order to be quicker than a 'grep').
Anyone knows how to do ?
Cheers
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
(+1) haven't seen that. thanks.
Will try it out. Thanks a lot :)