Question

How To Extract Spliced Rnaseq Reads

11.4 years ago

Chirag Nepal ★ 2.4k

Hey all,

From acceptedhits.bam, I want to count only those reads that are spliced across two exons. How do we extract this information from a BAM file?

I think one way would be to use the "-split" option of coverageBed from bedtools, though I am not 100% sure.

cheers
Chirag

RNA-seq • 12k views

updated 2.4 years ago by Ram 45k • written 11.4 years ago by Chirag Nepal ★ 2.4k

Ram · Answer 1 · 2013-12-26

samtools view acceptedhits.bam | awk '$6 ~ /N/' | cut -f1

will give you the read IDs of the spliced reads.
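
Since the question asks for a count rather than a list, here is a minimal sketch along the same lines. The function name `count_spliced_reads` is made up for illustration. It assumes SAM text on stdin (for example from `samtools view`) and counts each read name once, so two mates of a pair, or several alignments of a multi-mapped read, are not counted twice.

```shell
# Count distinct read names whose CIGAR string (SAM column 6) contains an N operation.
# Reads SAM text on stdin; header lines starting with "@" are skipped.
# count_spliced_reads is a hypothetical helper name, not part of samtools.
count_spliced_reads() {
  awk -F'\t' '!/^@/ && $6 ~ /N/ { print $1 }' | sort -u | wc -l | tr -d ' '
}

# Hypothetical usage (assumes samtools is installed; file name taken from the question):
#   samtools view acceptedhits.bam | count_spliced_reads
```

Drop the `sort -u` step if you want the number of spliced alignments instead of the number of distinct read names.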

The 'N' operation in the CIGAR string (the sixth column of a SAM/BAM record) represents a region skipped from the reference. So if a read does not align contiguously and a large stretch of the reference is skipped within the alignment, that stretch appears as 'N' in the sixth column. This does not guarantee that both parts of the read align to exons: they could also span exon-intron, intron-exon, or intron-intron regions. But since it is an RNA-seq read, they are most likely exon-exon. It is fairly easy to check this against the GTF annotation file you have.
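
To check spliced reads against an annotation you need the coordinates of each skipped region, not just the read IDs. The sketch below walks the CIGAR string and prints one line per N operation. The function name `spliced_junctions` and the output layout (chromosome, first and last skipped base as 1-based positions, read name) are my own choices for illustration. Comparing these intervals with the introns implied by your GTF is left out.

```shell
# Print one line per N operation: chrom, first skipped base, last skipped base, read name.
# Coordinates are 1-based and inclusive. Reads SAM text on stdin.
# spliced_junctions is a hypothetical helper name, not part of samtools.
spliced_junctions() {
  awk -F'\t' -v OFS='\t' '
    !/^@/ && $6 ~ /N/ {
      pos = $4      # 1-based leftmost reference position of the alignment
      cigar = $6
      # Peel off one "<length><operation>" pair at a time.
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        len = substr(cigar, 1, RLENGTH - 1) + 0
        op  = substr(cigar, RLENGTH, 1)
        if (op == "N") print $3, pos, pos + len - 1, $1
        # Only M, D, N, = and X advance the position on the reference.
        if (op ~ /[MDN=X]/) pos += len
        cigar = substr(cigar, RLENGTH + 1)
      }
    }'
}

# Hypothetical usage (assumes samtools is installed; file name taken from the question):
#   samtools view acceptedhits.bam | spliced_junctions | cut -f1-3 | sort | uniq -c
```

The commented pipeline would give the number of alignments supporting each distinct skipped region, which is usually what you want to compare against annotated introns.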

Ram · Answer 2 · 2013-12-26

11.4 years ago

Charles Warden 8.3k

There are lots of ways to do this. I think the easiest solution is to use knowledge of known exon junctions.

1) Providing TopHat with a .gtf file should produce a junctions output file (technically, it already produces this file, but I think it should be empty unless you provide a reference list of transcript locations).

2) Use software to predict splicing events. This will give you a prioritized list (in addition to providing counts for all relevant splicing junctions). MATS is my favorite tool for this, and MISO is another popular option.

MATS: http://rnaseq-mats.sourceforge.net/

MISO: http://genes.mit.edu/burgelab/miso/

3) If it is relevant, there are also gene fusion programs. I have a slight preference for chimerascan, but I have tried all of the following programs:

TopHat-fusion: http://tophat.cbcb.umd.edu/fusion_tutorial.html

chimerascan: https://code.google.com/p/chimerascan/

deFuse: http://compbio.bccrc.ca/software/defuse/
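
For option 1, if you already have TopHat's junctions file, a rough summary can be pulled from it directly. This is only a sketch: it assumes the file is tab-separated BED with an optional leading track line, and that column 5 (the BED score) holds the number of alignments spanning each junction, which is how I understand the TopHat manual. The function name `junction_read_support` is made up.

```shell
# Summarise a TopHat-style junctions BED file read from stdin.
# Prints: number of junctions, then the sum of column 5 (assumed to be the
# number of alignments spanning each junction), separated by a tab.
# junction_read_support is a hypothetical helper name.
junction_read_support() {
  awk -F'\t' '$1 !~ /^(track|browser|#)/ && NF >= 5 { n++; total += $5 }
       END { printf "%d\t%d\n", n, total }'
}

# Hypothetical usage (the path is an assumption about your TopHat output directory):
#   junction_read_support < tophat_out/junctions.bed
```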

I'm sure there are also options for generically splicing the .bed file for split reads, but I would typically focus on looking for software that also assists with the downstream analysis (for whatever specific application I am interested in). So, I don't really have recommendations on this end.

updated 2.4 years ago by Ram 45k • written 11.4 years ago by Charles Warden 8.3k

After RNA-seq alignment with TopHat or STAR, only one BAM file is output. But MATS requires two BAM files as input. Do I have to split the BAM file in two according to first and second reads? Is there a better way to deal with this?

updated 2.4 years ago by Ram 45k • written 9.9 years ago by zju.whw ▴ 70

MATS detects differential alternative splicing events between two conditions, for example before and after treatment, or normal vs. tumor samples. That is why it requires two BAM files (one for each condition). It has nothing to do with the first and second reads of a pair.

updated 2.4 years ago by Ram 45k • written 9.9 years ago by Ashutosh Pandey 12k

Thank you very much. Your comment is very helpful. I mistook "sample_1" and "sample_2" for "read_1" and "read_2".

updated 2.4 years ago by Ram 45k • written 9.9 years ago by zju.whw ▴ 70