Extract overlapping sequences from a bam file
0
0
Entering edit mode
3.0 years ago
rubic ▴ 270

Might be something that's already posted and solved, but I couldn't find it searching online so my apologies if I missed it.

I have an RNA-seq bam file and I want to extract sequences from regions interesting certain intervals.

Here's an example.

This is a read mapping from the bam file (it's from mapping PacBio reads with minimap2):

molecule/55 0   chr6    41543367    60  821S44M3319N1375M1D246M1S   *   0   0   GGGAGCCCACAGTAGGGTATTGTCTGTGGAACCAGATATAGTGATGGCTCCGGAGACCCAGAACAGAGCAGGGAATGACATCATATTTGCAAAACCACTGTGCATCAGCAAAGGTGGCTTTAAACTGTAGTTTTGGCAATCTAAAACCATCTAGAGCTAGGAAAAGGGAGGAAAGCAACTTCCTCAGAGAAAAACATGAAGGGGATTTCCATCCCCAGCCCTATTTATTCAGTACAGTCATCTCTCCATCCACTAAGCAGGCTGAGCCTAAAGCACAGAAGGGCATAGCCAACACTGTATACCCAAGAGTACAGAGAGCCCATGGGAGTCAGCAAGATCGCTGACCTCAGAACCTGACATCATAGGCAGTAAGATGAAAGGAGGAAAGAGAAAGGAGGAGCCGACTCTCACTTTCTCACCAAGAGGACCAGCATCCTGAGAAGAAGCATGTCTAACACTGTCCTCGCTGATTCTGCCTGGGGCATCACCCTGCTATCTTGGGTTACTGTCTTTCTCTTGGGAACAAGTTCAGCAGATTCTGGGGTTGTCCAGTCTCCAAGACACATAATCAAAGAAAAGGGAGGAAGGTCCGTTCTGACGTGTATTCCCATCTCTGGACATAGCAATGTGGTCTGGTACCAGCAGACTCTGGGGAAGGAATTAAAGTTCCTTATTCAGCATTATGAAAAGGTGGAGAGAGACAAAGGATTCCTACCCAGCAGATTCTCAGTCCAACAGTTTGATGACTATCACTCTGAAATGAACATGAGTGCCTTGGAACTGGAGGACTCTGCTATGTACTTCTGTGCCAGCTCTAGGAGAAACACCTTGTACTTTGGTGCGGGCACCCGACTATCGGTGCTAGAGGATCTGAGAAATGTGACTCCACCCAAGGTCTCCTTGTTTGAGCCATCAAAAGCAGAGATTGCAAACAAACAAAAGGCTACCCTCGTGTGCTTGGCCAGGGGCTTCTTCCCTGACCACGTGGAGCTGAGCTGGTGGGTGAATGGCAAGGAGGTCCACAGTGGGGTCAGCACGGACCCTCAGGCCTACAAGGAGAGCAATTATAGCTACTGCCTGAGCAGCCGCCTGAGGGTCTCTGCTACCTTCTGGCACAATCCTCGAAACCACTTCCGCTGCCAAGTGCAGTTCCATGGGCTTTCAGAGGAGGACAAGTGGCCAGAGGGCTCACCCAAACCTGTCACACAGAACATCAGTGCAGAGGCCTGGGGCCGAGCAGGTAAGTGCGGAGCTCATGAGGAAAGTAAACAGCACTAGTACTTCAAAAAATATGAAACAATCCATGTAGAAGTGAAGAATAGACCCAGGAAAAGGCCAGAGTGGTGGGACAGATGATCAAGCTCATGGTGTCAGAAAACCATAGCCTATGCTTCCTCCCAAGGAGTATGTATGTAAACTCAGTGGGGCAGCTCAGGCCAATTGGCTTCCCAGTTCTTTAGTGTCTCAGAGCTGTGCTTAAGAGTTCTCCCATACCTCCCTCACAACCTAGCATCGCTCATCCCCATCCCTGCTGCTAGATTTTTAAGGTCACTCTGACAGTGTCTTATAACTTTCCCAGCACCATCAGAAAGACAGTGTGAGGACTATAAGAGGAGAGTGCTTACACCATCTGACATGCATGTGTCTGTGGCCTTTACATTCGGCTTTAAGTTTTGTTGTTGGGTGTTCAATGTCCCCCAAAGTGGCTTTCTTTCACCCAATTTTCTCCCTCTCCTTTCTTTCAGACTGTGGAATCACTTCAGGTGAGTAGATCTTCCGACTTTCTCTACGTCTTCTGTGGTCTTGGCTTTGAAAACAGGACACAAATATCCTATAGACATGAGGGTCGGGGCTGCCCAGAGGAACTAGCTCACAAACACCTACTAACTCTCCTTTCCTGTCAACAGCATCCTATCATCAGGGGGTTCTGTCTGCAACCATCCTCTATGAGATCCTACTGGGGAAGGCCACCCTATATGCTGTGCTGGTCAGTGGCCTGGTGCTGATGGCCATGGTAAGAATGGTAGGATGGACAAATGGTTGGAGGTATAGACTTCAGTGTATGGATATAAAGGGATCTCAGAGGAGGACCCAGCCCTGATATCCTGCCATTTCAAAAAAGACCATAAAAAACACAGTGCAAAGCAAAAACACAGGAATGCTTATGTTTGTACTCCTGGAAGATGAAGAGAACCAAGGAGCTCTCTTCAAGATCATACTTGAAAATCTCCTTTTTGTATCCCTTCCCTCTGCTCCATGGATTCTGGAGGTCTAACAATGTCTTCTCTTTCTCAGGTCAAGAAAAAAAATTCCTGAGACAAACTTTTATGCATCCTGAGCCGTTCTTCACCCTGGCCATAGATTTTCCTGCACCTTCTCTAATTCCTGTTCCTAAGAACTTGTCTCTTCTTCCTCCATGGATATCCATCCTTCCTCGTTGACACCTTGACTCTGAAACAGACTAAATCAATAAAAACATGGAGTTAG *   NM:i:1  ms:i:1662   AS:i:1630   nn:i:0  ts:A:+  tp:A:P  cm:i:547    s1:i:1648   s2:i:415    de:f:0.0006 SA:Z:chr6,41113120,+,1S815M130D1671S,60,1;  rl:i:0

And here's the interval I'm intersecting it with:

chr6    41546730    41548352    Trbc2   100 +

My understanding according to the cigar field is that the intersection between the read mapping and interval is in the second portion of the read.

I ran bedtools:

intersectBed -bed -wb -split -abam <reads_bam_file> -b <intervals_bed_file>

Which gives:

chr6    41546730    41548351    molecule/55 60  +   41543366    41548351    0,0,0   2   44,1622,    0,3363,

So my question is how to extract the read sequence part that corresponds to the intersection

chr6    41546730    41548351
bam RNA-Seq split splicing cigar • 475 views
ADD COMMENT

Login before adding your answer.

Traffic: 2316 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6