Remove all reads failing in ValidateSamFile
6.9 years ago
danny • 0

My BAM file has a lot of bad reads that cause it to fail in GATK. I would like to remove them. How can I programmatically remove the reads that ValidateSamFile identifies as causing errors?

sam bam GATK • 1.8k views
6.9 years ago

Assuming you are happy to discard the failed reads rather than correct them, you could set the MAX_OUTPUT option to a large value so as to get a list of all failed records. If I'm not mistaken, you get the record position in the file, like this (example from here):

ERROR: Record 1, Read name 20FU...
ERROR: Record 3, Read name 20FU...
ERROR: Record 6, Read name 20GA...

Then pass through the file again and discard the failing records. This may require writing a little script that parses the output of ValidateSamFile to get the record numbers to discard (1, 3, 6, ... in the example above) and then reads and writes the BAM file, excluding the records at those positions. (Maybe there is an off-the-shelf tool for all this...)
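A minimal sketch of such a script, using only the Python standard library. It assumes the report lines look like the example above ("ERROR: Record N, Read name ..."), that record numbers are 1-based and count alignment records only (not header lines), and that the BAM is streamed as SAM text. The function names and the file name `validate.txt` are hypothetical.

```python
import re
import sys

# Assumed line shape, based on the example above: "ERROR: Record 3, Read name ...".
# The optional \S* also tolerates a type prefix such as "ERROR::SOME_TYPE:".
RECORD_RE = re.compile(r"^ERROR\S*\s*Record (\d+), Read name")


def failing_record_numbers(report_lines):
    """Collect the record numbers reported as errors (assumed to be 1-based)."""
    numbers = set()
    for line in report_lines:
        match = RECORD_RE.match(line)
        if match:
            numbers.add(int(match.group(1)))
    return numbers


def drop_records(sam_lines, bad_numbers):
    """Yield SAM lines, skipping alignment records whose position is in bad_numbers."""
    record_number = 0
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines pass through and are not counted as records
            continue
        record_number += 1
        if record_number not in bad_numbers:
            yield line


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage, with the validation report saved to validate.txt:
    #   samtools view -h in.bam | python drop_failed.py validate.txt \
    #       | samtools view -b -o out.bam -
    with open(sys.argv[1]) as report:
        bad = failing_record_numbers(report)
    sys.stdout.writelines(drop_records(sys.stdin, bad))
```

Only lines starting with ERROR are used, so WARNING lines and header-level errors that carry no record number are ignored. It is worth checking on a small file that the record numbering really matches before trusting the output.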

If you have paired-end reads, removing one read of a pair may leave its mate without a partner, which in turn leaves the BAM file invalid. I'm not sure if samtools fixmate can fix that.
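One way to avoid orphaned mates might be to discard by read name instead of by position, so that every record sharing a name with a failing read goes too. The sketch below assumes the report lines have the shape "ERROR: Record N, Read name NAME, message", that read names contain no commas, and that the BAM is streamed as SAM text; the function names are hypothetical.

```python
import re

# Assumed line shape: "ERROR: Record 3, Read name NAME, message".
# Read names are assumed to contain no commas or whitespace.
NAME_RE = re.compile(r"^ERROR\S*\s*Record \d+, Read name ([^,\s]+)")


def failing_read_names(report_lines):
    """Collect the names of reads that have at least one ERROR line."""
    names = set()
    for line in report_lines:
        match = NAME_RE.match(line)
        if match:
            names.add(match.group(1))
    return names


def drop_reads_by_name(sam_lines, bad_names):
    """Yield header lines and every alignment whose QNAME is not in bad_names."""
    for line in sam_lines:
        # QNAME is the first tab-separated field of an alignment line.
        if line.startswith("@") or line.split("\t", 1)[0] not in bad_names:
            yield line
```

This also drops secondary and supplementary alignments of the same read, since they share its name, which may or may not be what you want.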

But again, in practice it may be easier and better to recreate the bam files without broken records in the first place...

6.9 years ago

using samjdk: http://lindenb.github.io/jvarkit/SamJdk.html

  java -jar  samjdk.jar -e 'List<SAMValidationError> errors = record.isValid(false);return (errors==null || errors.isEmpty());' input.bam

Alternatively, you can ask GATK to be lenient with validation errors. I think the option is -S LENIENT.


Hi Pierre, this is great - can you give an example of what <samvalidationerrors> is supposed to look like? And can this tool also remove the mate of a read that is failing?

Also, with regards to another question, could one use this tool to remove reads where the read ID occurs more than twice? I have some legacy bams with bad formatting I am trying to work with. Thanks!


Can you give an example of what <samvalidationerrors> is supposed to look like?

<samvalidationerrors> is not a placeholder but a concrete Java class: https://github.com/samtools/htsjdk/blob/master/src/main/java/htsjdk/samtools/SAMValidationError.java

Also, with regards to another question,

Ask this as a new question. Search Biostars first to see if it has been asked before.
