Hello,
I have a bam file for the samples of RNA-Seq data. I need to get read counts for these to perform differential expression analysis. Which tools should I use for this and how?
Thanq
Hello,
I have a bam file for the samples of RNA-Seq data. I need to get read counts for these to perform differential expression analysis. Which tools should I use for this and how?
Thanq
featureCounts is the easiest option.
...as well as multithreaded and tailored to RNA-seq. Go with featureCounts. You will need a GFF/GTF file that fits your organism/the assembly you aligned to and the BAM files.
### Assign reads to exons based on the GTF annotation:
# --a = the GTF/GFF file
# --F = specify the format of -a
# --p = data are paired-end (unset it if you have single-end)
# --T = set number of threads
# -P = only consider pairs with ISIZE defined by -d & -D, default 50-600bp
# -o = output file
$FCOUNTS -a $GTF -F GTF -p -T 8 -P -o countMatrix_all.txt *.bam
Sorted BAMs are not necessary, FC will sort internally if you have paired-end data (still, sorting a file is as easy as samtools sort -o sorted.bam unsorted.bam
). I edited my post in that regard. For the checking, simply use samtools view -f 1 in.bam | head
and then check if some reads pop up. -f 1 means only keep paired reads. If there are none, your data are not PE. Alternatively, use samtools view -H in.bam
to get the header of the file, and then check the @PG line at the bottom where the alignment command is saved. If you need help, please post the output of both commands, so we can have a look.
I actually gave it like this samtools view -H file.bam
I see this
@HD VN:1.4 SO:coordinate GO:none
@SQ SN:NR_039978 LN:984
@SQ SN:NM_130786 LN:1766
@SQ SN:NM_138932 LN:9293
@SQ SN:NM_033110 LN:2678
@SQ SN:NM_000014 LN:4678
@SQ SN:NM_144670 LN:5192
@SQ SN:NR_040112 LN:1201
I dont see the gene which I need in the differential expressed genes list.
While we like to get an expected result for our experiments, biology does not always work that way. Perhaps the difference is subtle (or variation is too high) and is not supported by number of samples/replicates etc.
I use StringTie... Another option is Cufflinks
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
There are many tools for that such as featureCounts, htseq, bedtools (multicov) etc... Also they are well documented.