Hi Everyone,
I am trying to extract paired (PE) reads where the single (SE) reads align to different genes in a BAM file generated by Hisat2.
Though I have seen and used part of a similar question answered on How to extract reads/fragments aligned to intergenic regions from bam/sam file? ,it works slightly differently (intergenic regions).
I have a gene_list with chromosome, start co-ordinate and end coordinate ~ 68000.
In the BAM file I would like to compare the start position ($4) of a read to the start position of the mate ($8).
If either read mate aligns outside of the selected gene coordinates while one read aligns inside those coordinates, I would like to identify and write both of these reads to a new BAM (e.g transcript_fusion.bam), and remove both reads from the original BAM file. I would like to use a for loop iterating through the gene_list to perform this task. How long would such a task approx take on ~ 10M reads ~68K genes?
Or maybe I should do the inverse and write a new BAM selecting only the reads where both mates map within the same gene?
Thank you for any pointers or open source software you may know of which can perform this.
My advise would be to use a dedicated fusion gene caller. There are (like in any analysis) caveats and heuristics to be applied compared to a naive approach, and expert software is almost certain to perform better.
Hi ATpoint. Thanks for reading and replying with your advice.
I have been looking online for gene fusion software. I see a lot of Chimeric programs but no programs specific to gene/transcript fusion. Could you recommend any papers or links ?
Thanks again.
I know of Arriba which I saw to be recommended here and on Twitter. It is downstream of STAR.
Thanks again ATpoint. I will investigate Arriba.
I also should have stated that I wish to discard the reads aligned as gene fusion. An unwanted feature in my downstream analysis, which will focus only on PE reads aligning to single genes.
So I just want potential (Hisat2 uniquely mapped) gene fusion reads removed from my bam file.
As Always Thanks!