Hi,
I have RNA-seq reads mapped onto a genome using tophat2, which gave me a BAM file that I later converted into a BED file. I extracted only the relevant columns from the BED file: scaffold, start, end, orientation and CIGAR. For a spliced read like the following
scaffold8075 66972 68644 - 48M22N2M
The CIGAR string says that the first 48 bp of the read match the genome, followed by a 22 bp intron, and then the last 2 bp of the read match. Now I would like to split the above read into its matched regions, like the following
scaffold8075 66972 67020 - 48M22N2M
scaffold8075 68642 68644 - 48M22N2M
In simple words, I would like to split the spliced reads based on their exon-exon alignment. Any guidance would be appreciated. Thanks in advance.
Have you read the options of bedtools bamtobed? It looks like -split, which reports spliced ("split") alignments as separate BED entries, is what you need.
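If you only have the extracted table and not the BAM file, the same splitting can be sketched in a few lines of Python. This is a minimal sketch, not what bedtools does internally. It assumes BED-style 0-based, half-open coordinates and starts a new block only at N operations, so deletions (D) stay inside a block. The function name split_spliced_read is hypothetical. Note also that walking 48M22N2M from 66972 ends at 67044, not 68644; the posted coordinates correspond to an intron of 1622 bp, so the example uses 48M1622N2M, which is my assumption about the intended CIGAR.

```python
import re

# CIGAR operations that advance the position on the reference (SAM spec)
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def split_spliced_read(scaffold, start, strand, cigar):
    """Return one (scaffold, start, end, strand, cigar) tuple per aligned block.

    Assumes `start` is 0-based (BED style) and that blocks are separated
    only by N (skipped region / intron) operations.
    """
    blocks = []
    pos = start          # current position on the reference
    block_start = start  # start of the block currently being extended
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "N":
            # An intron closes the current block and opens a new one after it
            if pos > block_start:
                blocks.append((scaffold, block_start, pos, strand, cigar))
            pos += length
            block_start = pos
        elif op in REF_CONSUMING:
            pos += length
        # I, S, H and P do not consume reference bases, so they are skipped
    if pos > block_start:
        blocks.append((scaffold, block_start, pos, strand, cigar))
    return blocks


if __name__ == "__main__":
    # 48M1622N2M is an assumed CIGAR that is consistent with the posted end
    for row in split_spliced_read("scaffold8075", 66972, "-", "48M1622N2M"):
        print("\t".join(map(str, row)))
```

With the assumed CIGAR this prints the two lines asked for in the question (66972-67020 and 68642-68644). With the CIGAR exactly as posted, the second block would be 67042-67044 instead.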