How to extract paired (PE) reads where the single (SE) reads align to different genes from a BAM file generated by Hisat2? - Transcript fusion?
0
0
Entering edit mode
2.1 years ago
chrisk • 0

Hi Everyone,

I am trying to extract paired (PE) reads where the single (SE) reads align to different genes in a BAM file generated by Hisat2.

Though I have seen and used part of a similar question answered on How to extract reads/fragments aligned to intergenic regions from bam/sam file? ,it works slightly differently (intergenic regions).

I have a gene_list with chromosome, start co-ordinate and end coordinate ~ 68000.

In the BAM file I would like to compare the start position ($4) of a read to the start position of the mate ($8).

If either read mate aligns outside of the selected gene coordinates while one read aligns inside those coordinates, I would like to identify and write both of these reads to a new BAM (e.g transcript_fusion.bam), and remove both reads from the original BAM file. I would like to use a for loop iterating through the gene_list to perform this task. How long would such a task approx take on ~ 10M reads ~68K genes?

Or maybe I should do the inverse and write a new BAM selecting only the reads where both mates map within the same gene?

Thank you for any pointers or open source software you may know of which can perform this.

RNA-Seq Alignment Bam • 1.2k views
ADD COMMENT
0
Entering edit mode

My advise would be to use a dedicated fusion gene caller. There are (like in any analysis) caveats and heuristics to be applied compared to a naive approach, and expert software is almost certain to perform better.

ADD REPLY
0
Entering edit mode

Hi ATpoint. Thanks for reading and replying with your advice.

I have been looking online for gene fusion software. I see a lot of Chimeric programs but no programs specific to gene/transcript fusion. Could you recommend any papers or links ?

Thanks again.

ADD REPLY
0
Entering edit mode

I know of Arriba which I saw to be recommended here and on Twitter. It is downstream of STAR.

ADD REPLY
0
Entering edit mode

Thanks again ATpoint. I will investigate Arriba.

I also should have stated that I wish to discard the reads aligned as gene fusion. An unwanted feature in my downstream analysis, which will focus only on PE reads aligning to single genes.

So I just want potential (Hisat2 uniquely mapped) gene fusion reads removed from my bam file.

As Always Thanks!

ADD REPLY

Login before adding your answer.

Traffic: 1804 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6