Hi everyone, I am running cufflinks and cuffdiff to get differential expressed genes for two cells. But after cuffdiff, the output from gene.exp.diff is weird. There are several gene ids for the same gene and they have the same locus. I tried to use two different annotation genome, both hg19 and the one from Ensembl. Here is an example of my output.
XLOC_023143 XLOC_023143 NANOG 12:7940389-7949298 H9_0 H9_2 OK 4.17005 6.76886
XLOC_026123 XLOC_026123 NANOG 12:7940389-7949298 H9_0 H9_2 OK 1.30734 1.4116
XLOC_026124 XLOC_026124 NANOG 12:7940389-7949298 H9_0 H9_2 OK 43.3278 17.506
XLOC_026125 XLOC_026125 NANOG 12:7940389-7949298 H9_0 H9_2 OK 22.7351 14.8569
Here is my scripts. cufflinks
module load anaconda/2.2.0 boost/1.47.0/gcc447 samtools/0.1.19 cufflinks/2.2.1
time cufflinks -p 8 --library-type fr-firststrand -o H9_0_1_clout ./H9_0_1/accepted_hits.bam
cuffdiff
module load anaconda/2.2.0 boost/1.47.0/gcc447 samtools/0.1.19 cufflinks/2.2.1
time cuffdiff -o diff_D0_D2 -p 4 \
-b /lustre1/tw30282/annotation/GRCh37_iGenome/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome.fa \
-L H9_0,H9_2 -u ./merged_asm/merged.gtf \
./H9_0_rep1_en/accepted_hits.bam,./H9_0_rep2_en/accepted_hits.bam \
./H9_2_rep1_en/accepted_hits.bam,./H9_0_rep2_en/accepted_hits.bam
Can anyone give me a hint about the problems? Thanks.
Thank you for your quick response. I'd like to try it out. Do you know how I can extract a small set of sequencing data from my whole RNA-sequencing dataset? I want to try it with s subset of my data first.
See: C: How can I create a dummy dataset from my NGS whole-exome data? Use the method described and take a random sample(s) to prevent any bias.
Hi I runed my data with Stringtie and Ballgown again but I still got the same problem. That's weird. Before I move to DESeq2, do you think there might be something possibly wrong with the previous steps?
It's not a "problem", as much as a feature of the software. This only occurs when performing novel discovery, so it might be worth just running Stringtie with novel discovery turned off.
I used -e argument when I runned stringtie which should only map the transcripts which have references, right?
Ok, in that case maybe it's a feature of Ballgown. When
-e
is used, it should only quantify against the reference provided with-G
. Have you tried the DESeq2 approach from Stringtie described here? - I've used the DESeq2 approach recently and it provided the annotation I gave it with-G
as the output fromprepDE.py
.Thank you. So prepDE.py is a stringtie built-in package. I should run it after I generated ballgown files. Should I write my scripts like this?
module load stringtie/1.3.3b prepDE.py -l 101 -c -i sample_lst.txt -g annotation.gff