Filter Reads In The Sam/Bam Files By Direction Using Samtools
2
1
Entering edit mode
11.3 years ago
Shaohua ▴ 20

Hi all,

I am wondering whether there is a way to filter the reads in the sam/bam files by the alignment direction using samtools or other tools? for example, I want to filter out all the reads facing to each other (-> <-) in the mapping result of a mate pair library.

Thanks a lot!

Shaohua

samtools • 8.5k views
ADD COMMENT
0
Entering edit mode

you mean, face each other and they are pairs ? Is your bam file sorted by read name ?

ADD REPLY
0
Entering edit mode

yes, since in principle, the reads in the mate pair library should be <- ->. However, some paired end reads with direction of -> <- were sequenced. For my research, I just want to use the 'true mate pair reads'. I want to filter out those reads are PE reads based on their direction

ADD REPLY
0
Entering edit mode

have they been sorted on a per name basis ?

ADD REPLY
0
Entering edit mode

I am not sure what you mean here. I used bwa sampe for the mapping and the results from bwa were first converted by bam format and sorted by default parameters of samtools

ADD REPLY
0
Entering edit mode

Well you cannot rely on the flags, that we know. not if you do samtools sort on read names, not coordinate, you will have every pair with its mate immediately after it. Then you can figure out which are within reasonable distance plus opposite strands.

ADD REPLY
0
Entering edit mode

There's no need to resort for that. Just check the flag (for orientation), rnext (to ensure the same chromosome) and pnext (mapping position) fields. This would be trivial to script.

ADD REPLY
0
Entering edit mode

How do you get the orientation of the mate ?

ADD REPLY
0
Entering edit mode

The 0x20 flag status.

ADD REPLY
0
Entering edit mode

yes ! forgot that one !

ADD REPLY
0
Entering edit mode

@Shaohua can you script in pysam or code in bamtools ?

ADD REPLY
0
Entering edit mode

If the aligner that you used sets the "properly paired" flag correctly, then just filter by that "samtools -f 2 ...".

ADD REPLY
0
Entering edit mode

that depends on the size of the insert in between. I do not know how an aligner will do insert estimate for a mate pair lib.

ADD REPLY
0
Entering edit mode

yes, it is quite tricky to estimate the insertion size, for some libraries, the insertion size estimated by bwa is around 3 kb which fits our expectation. However, for some libraries, the range of the insertion size is quite big, from few hundreds bp to 5 kb. This may indicate the mate pair library has lots of PE reads.

ADD REPLY
0
Entering edit mode
11.3 years ago

The following program is an example of an efficient approach to use, using the samtools C API. The program itself is just a quick hack, but you can use it to get started. The program takes a SAM or BAM input and will write accepted reads to a file called "filtered.bam". Reads are accepted according to the following criterion: (1) a read and its mate must map to the same chromosome/contig, (2) the mates must have opposite orientations, and (3) the mates must point away from each other. I've added some comments, in case you're not very familiar with the API or C. You'd want to make this more robust if you were to actually use it!

#include "sam.h"

int main(int argc, char* argv[]) {
    samfile_t *ifile = NULL, *ofile = NULL;
    bam1_t *read = bam_init1();
    int keep = 0;
    char *p = NULL;

    //Open input file, either SAM or BAM
    p = strrchr(argv[1], '.');
    if(strcmp(p, ".bam") == 0) {
        ifile = samopen(argv[1], "rb", NULL);
    } else {
        ifile = samopen(argv[1], "r", NULL);
    }

    //Open output file
    ofile = samopen("filtered.bam", "wb", ifile->header);

    //Iterate through the lines
    while(samread(ifile, read) > 1) {
        keep = 0;
        //Is the read's mate on the same chromosome/contig?
        if(read->core.tid == read->core.mtid) {
            //Are the mates on opposite strands?
            if(read->core.flag & BAM_FREVERSE && !(read->core.flag & BAM_FMREVERSE)) {
                if(read->core.pos < read->core.mpos) keep=1;
            } else if(!(read->core.flag & BAM_FREVERSE) && read->core.flag & BAM_FMREVERSE) {
                if(read->core.mpos < read->core.pos) keep=1;
            }
        }
        if(keep) samwrite(ofile, read);
    }
    bam_destroy1(read);
    samclose(ifile);
    samclose(ofile);
    return 0;
}
ADD COMMENT
0
Entering edit mode

Thank you for the script. I think this will serve my purpose to filter the reads from bam file. But I am not able to run the script. Can you tell me what are the command line arguments we need to pass to run this program.

thank you

ADD REPLY
0
Entering edit mode

The only argument is a BAM file. It writes a new file called "filtered.bam".

ADD REPLY
0
Entering edit mode
11.3 years ago

using my tool for filtering a SAM/BAM with javascript: https://github.com/lindenb/jvarkit#samjs-filtering-a-sambam-file-with-javascript

 java -jar dist/samjs.jar \
   SE='(record.referenceIndex==record.mateReferenceIndex && record.referenceIndex>=0 && record.readNegativeStrandFlag!=record.mateNegativeStrandFlag &&  ((record.mateNegativeStrandFlag && record.alignmentStart < record.mateAlignmentStart ) || (record.readNegativeStrandFlag && record.mateAlignmentStart < record.alignmentStart ) ))' \
    I=input.bam \
    SAM_OUTPUT=true \
    VALIDATION_STRINGENCY=SILENT
ADD COMMENT

Login before adding your answer.

Traffic: 2395 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6