How do I filter out reads where Read 2 doesn't map to the end of a fasta sequence
2
0
Entering edit mode
2.3 years ago
cag104 ▴ 10

For simplicity let us say I have one sequence that is 400bp in length. I used STAR to map reads to this sequence and now I am interested in only keeping the read pairs where read 2 maps to the end of the sequence (position ~440-450 of 450). What is the best way to do this? I've been trying to see if its possible using PySAM but am having some issues.

Thanks!

rna fasta • 656 views
ADD COMMENT
0
Entering edit mode
2.3 years ago

I'd do this in two steps. 1, as above, use samtools with a bed filter to get the reads where read2 do map to the region you want.

Then pull out the read names of those reads. Then use picard filterSamReads to get the read1 and read2 data belonging to those reads.

ADD COMMENT
0
Entering edit mode
2.2 years ago

I'd definitely do this with pysam. What issues are you having? I'd do something like:

import pysam
inbam = pysam.AlignmentFile("my.bam")
outbam = pysam.AlignmentFile("my.filtered.bam", "w", template=inbam)

for read in inbam.fetch("sequence_name"):

    if read.reference_end >= 440:
        outbam.write(read)

outbam.close()
ADD COMMENT

Login before adding your answer.

Traffic: 1758 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6