Exclude Reads With Xa Tag
2
0
Entering edit mode
12.2 years ago
fo3c ▴ 450

Hello there,

Does anyone know of a tool to remove reads that have alternative alignments? Similarly to filtering for specific flags, I would like to filter out all reads that have more than one alignment.

I am aware that depending on the aligner the XA tag might not be set. I used bwa, which by default does not set an XA tag if there are more than 3 alternative alignments. However, this should not be a problem since multiple alignments would result in a lower mapping quality (and can thus be removed on this basis).

A simple grep would work, but that could leave me with broken flags (filter out one read in a pair and not the other without adjusting the flag).

Any suggestions?

filtering samtools sam bam bwa • 8.3k views
ADD COMMENT
0
Entering edit mode

I don't remember if BWA reports it, but the NH tag reports the total hits, so NH:i:1 means unique hits.

ADD REPLY
4
Entering edit mode
12.2 years ago
matted 7.8k

It's a bit ugly as well, but for these types of tasks I usually identify the read names I want to filter via grep, then use grep again to remove both read pairs that match the name. This assumes that paired reads have the same name, which I think is usually the case.

For example:

samtools view in.bam | fgrep XA | cut -f 1 > bad_names.txt
samtools view -h in.bam | fgrep -vf bad_names.txt | samtools view -Sb - > out.bam

This would filter out a complete pair if either (or both) read has an XA tag, and leave the BAM consistent. If you'd prefer to keep a read whose mate has an XA tag but fix the flags, I think you'd have to do it yourself with something like pysam or Bio-Samtools (or just hack the flags and text on your own).

ADD COMMENT
1
Entering edit mode
12.2 years ago

The following java program will filter a BAM using javascript (rhino) and the picard library. It reads a BAM file/stdin and applies the javascript script to accept/reject each SamRecord. The samRecord is injected as "record" and the SamFileHeader is injected as "header" in the js context. To answer your question you can use the script:

record.getAttribute("XA")==null;

The java program, how to compile and a test file :

/**
* Author: Pierre Lindenbaum PhD. @yokofakun
* http://www.biostars.org/p/66319/
*/
import java.io.File;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.InputStream;
import java.util.Iterator;
import javax.script.Bindings;
import javax.script.Compilable;
import javax.script.CompiledScript;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import net.sf.samtools.SAMFileHeader;
import net.sf.samtools.SAMFileReader;
import net.sf.samtools.SAMFileReader.ValidationStringency;
import net.sf.samtools.SAMFileWriter;
import net.sf.samtools.SAMFileWriterFactory;
import net.sf.samtools.SAMRecord;
public class Biostar63016
{
private CompiledScript script=null;
private ScriptEngine engine=null;
private Biostar63016()
{
}
private void scan(InputStream in) throws Exception
{
SAMFileReader samFileReader=new SAMFileReader(in);
samFileReader.setValidationStringency(ValidationStringency.SILENT);
SAMFileHeader header=samFileReader.getFileHeader();
SAMFileWriterFactory sf=new SAMFileWriterFactory();
sf.setCreateIndex(false);
SAMFileWriter sw=sf.makeSAMWriter(header,false,System.out);
Bindings bindings = this.engine.createBindings();
bindings.put("header", header);
for(Iterator<SAMRecord> iter=samFileReader.iterator();
iter.hasNext(); )
{
SAMRecord record=iter.next();
bindings.put("record", record);
Object result = script.eval(bindings);
if(result==null) continue;
if(result instanceof Boolean)
{
if(Boolean.FALSE.equals(result)) continue;
}
else if(result instanceof Integer)
{
if(((Integer)result).intValue()!=1) continue;
}
else
{
continue;
}
sw.addAlignment(record);
}
samFileReader.close();
sw.close();
}
private void run(String[] args)
throws Exception
{
String scriptStr=null;
File scriptFile=null;
int optind=0;
while(optind< args.length)
{
if(args[optind].equals("-h") ||
args[optind].equals("-help") ||
args[optind].equals("--help"))
{
System.err.println("Pierre Lindenbaum PhD. 2013");
System.err.println("Options:");
System.err.println(" -h help; This screen.");
System.err.println(" -e (script).");
System.err.println(" -f (file).");
System.err.println("the script puts 'record' a SamRecord (http://picard.sourceforge.net/javadoc/net/sf/samtools/SAMRecord.html) " +
" and 'header' ( http://picard.sourceforge.net/javadoc/net/sf/samtools/SAMFileHeader.html) in the script context .");
return;
}
else if(args[optind].equals("-e") && optind+1< args.length)
{
scriptStr=args[++optind];
}
else if(args[optind].equals("-f") && optind+1< args.length)
{
scriptFile=new File(args[++optind]);
}
else if(args[optind].equals("--"))
{
optind++;
break;
}
else if(args[optind].startsWith("-"))
{
System.err.println("Unknown option "+args[optind]);
return;
}
else
{
break;
}
++optind;
}
if(scriptStr==null && scriptFile==null)
{
System.err.println("undefined script");
System.exit(-1);
}
ScriptEngineManager manager = new ScriptEngineManager();
this.engine = manager.getEngineByName("js");
if(this.engine==null)
{
System.err.println("not available: javascript. Use the SUN/Oracle JDK ?");
System.exit(-1);
}
Compilable compilingEngine = (Compilable)this.engine;
this.script = null;
if(scriptFile!=null)
{
FileReader r=new FileReader(scriptFile);
this.script=compilingEngine.compile(r);
r.close();
}
else
{
this.script=compilingEngine.compile(scriptStr);
}
if(optind==args.length)
{
scan(System.in);
}
else if(optind+1==args.length)
{
FileInputStream fin=new FileInputStream(args[optind]);
scan(fin);
fin.close();
}
else
{
System.err.println("illegal number of arguments.");
System.exit(-1);
}
}
public static void main(String[] args) throws Exception
{
new Biostar63016().run(args);
}
}
view raw gistfile1.txt hosted with ❤ by GitHub
record.referenceName=="seq2" &&
record.getAttribute("H0")==null;
view raw test.js hosted with ❤ by GitHub
javac -cp /path/to/picard-1.62.jar:/path/to/sam-1.62.jar Biostar63016.java
java -cp /path/to/picard-1.62.jar:/path/to/sam-1.62.jar:. Biostar63016 \
-f script.js /path/to/ex1.bam
view raw test.txt hosted with ❤ by GitHub

ADD COMMENT

Login before adding your answer.

Traffic: 1720 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6