Bam to read counts for differential expression analysis
2
1
Entering edit mode
7.0 years ago
Vasu ▴ 790

Hello,

I have a bam file for the samples of RNA-Seq data. I need to get read counts for these to perform differential expression analysis. Which tools should I use for this and how?

Thanq

RNA-Seq bam expression analysis • 8.7k views
ADD COMMENT
0
Entering edit mode

There are many tools for that such as featureCounts, htseq, bedtools (multicov) etc... Also they are well documented.

ADD REPLY
3
Entering edit mode
7.0 years ago
GenoMax 147k

featureCounts is the easiest option.

ADD COMMENT
1
Entering edit mode

...as well as multithreaded and tailored to RNA-seq. Go with featureCounts. You will need a GFF/GTF file that fits your organism/the assembly you aligned to and the BAM files.

### Assign reads to exons based on the GTF annotation:
# --a = the GTF/GFF file
# --F = specify the format of -a
# --p = data are paired-end (unset it if you have single-end)
# --T = set number of threads
# -P  = only consider pairs with ISIZE defined by -d & -D, default 50-600bp
# -o  = output file
$FCOUNTS -a $GTF -F GTF -p -T 8 -P -o countMatrix_all.txt *.bam
ADD REPLY
0
Entering edit mode

the bam file I have is not sorted one. And I'm also not sure whether this is paired end or single end. Any idea how to check this and sort the bam file before getting the counts data.

ADD REPLY
0
Entering edit mode

Sorted BAMs are not necessary, FC will sort internally if you have paired-end data (still, sorting a file is as easy as samtools sort -o sorted.bam unsorted.bam). I edited my post in that regard. For the checking, simply use samtools view -f 1 in.bam | head and then check if some reads pop up. -f 1 means only keep paired reads. If there are none, your data are not PE. Alternatively, use samtools view -H in.bam to get the header of the file, and then check the @PG line at the bottom where the alignment command is saved. If you need help, please post the output of both commands, so we can have a look.

ADD REPLY
0
Entering edit mode

It is single end. After giving the command [samtools view -f 1 in.bam | head] I didnt see anything pop up. This means no need to give -p in the feature counts command. Am I right? And may I know from where I can get the gtf file? [hg19 genome]

ADD REPLY
0
Entering edit mode

No need for -p. You need to get a GTF file that matches your genome (important because UCSC uses chr1 where as Ensembl just 1). What kind of chromosome names do you have?

ADD REPLY
0
Entering edit mode

I actually gave it like this samtools view -H file.bam

I see this

@HD VN:1.4  SO:coordinate   GO:none
@SQ SN:NR_039978    LN:984
@SQ SN:NM_130786    LN:1766
@SQ SN:NM_138932    LN:9293
@SQ SN:NM_033110    LN:2678
@SQ SN:NM_000014    LN:4678
@SQ SN:NM_144670    LN:5192
@SQ SN:NR_040112    LN:1201
ADD REPLY
0
Entering edit mode

You appear to have aligned your data to a set of mRNA. This is going to cause problems for counting since you are going to have to make a GTF file up yourself for these. Why did you do this instead of aligning against the genome?

ADD REPLY
0
Entering edit mode

The bam files are from IonTorrent server

ADD REPLY
0
Entering edit mode

Then Torrent Server should have also produced a file with counts.

ADD REPLY
0
Entering edit mode

Ok. So I see an option - Download absolute reads table [This is read counts data] Am I right?

ADD REPLY
0
Entering edit mode

Likely. I have not used Torrent Server recently but should be easy to check. Make sure the reads counts are raw (not normalized), if you intend to use them with DESeq2/edgeR.

ADD REPLY
0
Entering edit mode

Yes, I see both. Absolute reads table and normalized table. I got the absolute reads table ans using edgeR for DE analysis now.

ADD REPLY
0
Entering edit mode

Is there any other way to get the counts from bam files. When I did the DE analysis with the absolute reads which I got from Ion torrent server with edgeR, I dont see the gene which I need in the differential expressed genes list.

ADD REPLY
0
Entering edit mode

I dont see the gene which I need in the differential expressed genes list.

While we like to get an expected result for our experiments, biology does not always work that way. Perhaps the difference is subtle (or variation is too high) and is not supported by number of samples/replicates etc.

ADD REPLY
0
Entering edit mode

Ok. Could you please check this link [http://10.16.32.45/output/Home/Re-analysis_full_report_2_108/plugin_out/RNASeqAnalysis_out.144/RNASeqAnalysis.html]. Go down and you see - Download absolute reads table. Could you please check whether they are read counts or not.

ADD REPLY
0
Entering edit mode
7.0 years ago

I use StringTie... Another option is Cufflinks

ADD COMMENT

Login before adding your answer.

Traffic: 2152 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6