Extracting Reads from paired end data in SAM file that map to a list of contig IDs
1
1
Entering edit mode
6.0 years ago
yp19 ▴ 70

I have paired end data in a SAM file and I want to extract all the reads which correspond to specific contig ids. My contig ids are in a text file with one ID per line.

Output of SAM/BAM is preferred.

Thanks for your help.

bam reads sequencing • 2.3k views
ADD COMMENT
1
0
Entering edit mode

would the solution in that link also work for paired data? Thanks!

ADD REPLY
0
Entering edit mode

@max_19 why wouldn't it work for paired data ?

ADD REPLY
0
Entering edit mode

If one read in a pair matches the contig but the other read in that pair does not. Is there a way to include that pair even if only one read matches?

ADD REPLY
2
Entering edit mode
6.0 years ago

ah, I see. Using awk, matching chromosomes RF01 or RF02

/samtools view -h S1.bam | awk -F '\t' 'function fun(C) { return C=="RF01" || C=="RF02";} /^@/ {print;next;} {if(fun($3) || fun($7)) print;}'

or using samjdk: http://lindenb.github.io/jvarkit/SamJdk.html

 java -jar dist/samjdk.jar -e 'Predicate<String> f=C->C.equals("RF01") || C.equals("RF02"); return (!record.getReadUnmappedFlag() && f.test(record.getReferenceName())) || (!record.getMateUnmappedFlag() && f.test(record.getMateReferenceName()));' in.bam
ADD COMMENT
0
Entering edit mode

That does what I need, thank you!

ADD REPLY

Login before adding your answer.

Traffic: 1756 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6