How To Extract Spliced Rnaseq Reads
11.0 years ago
Chirag Nepal ★ 2.4k

Hey all,

From acceptedhits.bam, I want to count only the reads that are spliced across two exons. How can this information be extracted from a BAM file?

I think one way would be to use the "-split" option of coverageBed from bedtools, though I am not 100% sure.

cheers
Chirag

RNA-seq • 11k views
11.0 years ago
samtools view acceptedhits.bam | awk '($6 ~ /N/)' | cut -f1

will give you the read ids of the spliced reads.
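Since the original question asks for a count rather than a list of ids, here is a minimal sketch that counts distinct read names with at least one spliced alignment. The function name count_spliced_reads is made up for illustration. It reads SAM text on stdin, so both mates of a pair and any multi-mapped copies of a read collapse into a single count; whether that is the count you want is an assumption to check against your own data.

```shell
# Count distinct read names that have at least one alignment whose CIGAR
# (column 6) contains an N operation. Reads SAM text from stdin.
# Usage (assumes samtools is installed):
#   samtools view acceptedhits.bam | count_spliced_reads
count_spliced_reads() {
  awk -F'\t' '
    /^@/ { next }                          # skip header lines if present
    $6 ~ /N/ && !seen[$1]++ { n++ }        # count each read name only once
    END { print n + 0 }                    # print 0 when nothing matched
  '
}
```

Counting names instead of lines avoids counting a paired-end fragment twice when both mates cross a junction.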

In a SAM/BAM record, the 'N' operation in the CIGAR string (the sixth column) represents a region skipped from the reference. So if a read does not align contiguously, i.e. a large stretch of the reference is skipped within the alignment, that stretch appears as an 'N' in the CIGAR. This does not guarantee that both portions of the read align to exons: they could also cover exon-intron, intron-exon or intron-intron regions. But since these are RNA-seq reads, most of them should be exon-exon. It is fairly easy to check this against your GTF annotation file.
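To check against an annotation, it helps to turn each 'N' into explicit intron coordinates. The sketch below walks the CIGAR string, tracks the reference position, and tallies how many alignments support each skipped interval. The function name junction_counts is hypothetical, and the output layout (chrom, 0-based start, end, count) is just one BED-like choice that can be compared with intron coordinates derived from a GTF.

```shell
# For every N operation in the CIGAR, print the skipped reference interval
# as BED-like columns (chrom, 0-based start, exclusive end) plus the number
# of alignments that contain it. Reads SAM text from stdin.
# Usage (assumes samtools is installed):
#   samtools view acceptedhits.bam | junction_counts
junction_counts() {
  awk -F'\t' '
    /^@/ { next }
    {
      cigar = $6
      pos = $4                                   # 1-based leftmost position
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        len = substr(cigar, 1, RLENGTH - 1) + 0
        op  = substr(cigar, RLENGTH, 1)
        if (op == "N") {
          # skipped region covers pos .. pos+len-1 in 1-based coordinates
          key = $3 "\t" (pos - 1) "\t" (pos + len - 1)
          count[key]++
        }
        if (op ~ /[MDN=X]/) pos += len           # only these consume reference
        cigar = substr(cigar, RLENGTH + 1)
      }
    }
    END { for (k in count) print k "\t" count[k] }
  ' | sort -k1,1 -k2,2n
}
```

A read with two 'N' operations contributes to two intervals, and this version counts alignments, not distinct read names.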


Is it possible to count the spliced reads spanning each intron listed in intron.bed?

11.0 years ago

There are lots of ways to do this. I think the easiest solution is to use knowledge of known exon junctions.

1) Providing TopHat with a .gtf file lets it use known junctions when mapping. TopHat produces a junctions output file (junctions.bed) either way, and the score column holds the number of alignments spanning each junction.

2) Use software to predict splicing events. This gives you a prioritized list, in addition to counts for all relevant splice junctions. MATS is my favorite tool for this, and MISO is another popular option.

MATS: http://rnaseq-mats.sourceforge.net/

MISO: http://genes.mit.edu/burgelab/miso/

3) If it is relevant, there are also gene fusion programs. I have a slight preference for chimerascan, but I have tried all of the following programs:

TopHat-fusion: http://tophat.cbcb.umd.edu/fusion_tutorial.html

chimerascan: https://code.google.com/p/chimerascan/

deFuse: http://compbio.bccrc.ca/software/defuse/

I'm sure there are also generic options for pulling split reads out of the .bed file, but I would typically focus on looking for software that also assists with the downstream analysis (for whatever specific application I am interested in). So, I don't really have recommendations on this end.
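For the TopHat junctions file mentioned in option 1, a rough sketch of turning each entry into intron coordinates with its read support is shown below. It assumes the BED12 layout described in the TopHat manual: two blocks per junction, with the score column holding the number of spanning alignments. The function name junctions_to_introns is made up for illustration.

```shell
# Convert TopHat-style junctions.bed (BED12, two blocks per junction) into
# BED6 lines covering just the intron: the intron starts where the first
# block ends and stops where the second block begins.
# Usage: junctions_to_introns < junctions.bed > introns_with_counts.bed
junctions_to_introns() {
  awk -F'\t' '
    /^track/ { next }                       # skip the UCSC track header line
    {
      split($11, size, ",")                 # blockSizes
      split($12, start, ",")                # blockStarts, relative to column 2
      print $1 "\t" ($2 + size[1]) "\t" ($2 + start[2]) "\t" $4 "\t" $5 "\t" $6
    }
  '
}
```

The fifth output column is then the per-junction count, which can be compared directly with a list of annotated introns.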


After RNA-seq alignment with TopHat or STAR, only one BAM file is produced, but MATS needs two BAM files as input. Is it necessary to split the BAM file into two files, one for the first reads and one for the second reads? Or is there a better way to deal with this?


MATS detects differential alternative splicing events between two conditions, for example before vs. after treatment, or normal vs. tumor samples. That is why it requires two BAM files (one for each condition). It has nothing to do with the first and second reads of a pair.


Thank you very much. Your comment is very helpful. I mistook "sample_1" and "sample_2" for "read_1" and "read_2".
