RNA_seq analysis cuffdiff output
1
1
Entering edit mode
7.3 years ago

Hi everyone, I am running cufflinks and cuffdiff to get differential expressed genes for two cells. But after cuffdiff, the output from gene.exp.diff is weird. There are several gene ids for the same gene and they have the same locus. I tried to use two different annotation genome, both hg19 and the one from Ensembl. Here is an example of my output.

XLOC_023143 XLOC_023143 NANOG   12:7940389-7949298  H9_0    H9_2    OK  4.17005 6.76886
XLOC_026123 XLOC_026123 NANOG   12:7940389-7949298  H9_0    H9_2    OK  1.30734 1.4116
XLOC_026124 XLOC_026124 NANOG   12:7940389-7949298  H9_0    H9_2    OK  43.3278 17.506
XLOC_026125 XLOC_026125 NANOG   12:7940389-7949298  H9_0    H9_2    OK  22.7351 14.8569

Here is my scripts. cufflinks

module load anaconda/2.2.0 boost/1.47.0/gcc447 samtools/0.1.19 cufflinks/2.2.1
time cufflinks -p 8 --library-type fr-firststrand -o H9_0_1_clout ./H9_0_1/accepted_hits.bam

cuffdiff

module load anaconda/2.2.0 boost/1.47.0/gcc447 samtools/0.1.19 cufflinks/2.2.1
time cuffdiff -o diff_D0_D2 -p 4 \
-b /lustre1/tw30282/annotation/GRCh37_iGenome/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome.fa \
-L H9_0,H9_2 -u ./merged_asm/merged.gtf \
./H9_0_rep1_en/accepted_hits.bam,./H9_0_rep2_en/accepted_hits.bam \
./H9_2_rep1_en/accepted_hits.bam,./H9_0_rep2_en/accepted_hits.bam

Can anyone give me a hint about the problems? Thanks.

RNA-Seq • 2.2k views
ADD COMMENT
2
Entering edit mode
7.3 years ago

The XLOC ID is the arbitrary window that Cuffdiff spans when performing the differential expression test, hence the two different XLOC codes are either overlapping or close to NANOG. As far as I know Cuffdiff has been deprecated / is in low maintenance. You should be using the replacement protocol with Stringtie and Ballgown, or alternative methods such as Salmon, tximport and DESeq2.

ADD COMMENT
0
Entering edit mode

Thank you for your quick response. I'd like to try it out. Do you know how I can extract a small set of sequencing data from my whole RNA-sequencing dataset? I want to try it with s subset of my data first.

ADD REPLY
0
Entering edit mode

See: C: How can I create a dummy dataset from my NGS whole-exome data? Use the method described and take a random sample(s) to prevent any bias.

ADD REPLY
0
Entering edit mode

Hi I runed my data with Stringtie and Ballgown again but I still got the same problem. That's weird. Before I move to DESeq2, do you think there might be something possibly wrong with the previous steps?

ADD REPLY
1
Entering edit mode

It's not a "problem", as much as a feature of the software. This only occurs when performing novel discovery, so it might be worth just running Stringtie with novel discovery turned off.

ADD REPLY
0
Entering edit mode

I used -e argument when I runned stringtie which should only map the transcripts which have references, right?

ADD REPLY
1
Entering edit mode

Ok, in that case maybe it's a feature of Ballgown. When -e is used, it should only quantify against the reference provided with -G. Have you tried the DESeq2 approach from Stringtie described here? - I've used the DESeq2 approach recently and it provided the annotation I gave it with -G as the output from prepDE.py.

ADD REPLY
0
Entering edit mode

Thank you. So prepDE.py is a stringtie built-in package. I should run it after I generated ballgown files. Should I write my scripts like this?

module load stringtie/1.3.3b prepDE.py -l 101 -c -i sample_lst.txt -g annotation.gff

ADD REPLY

Login before adding your answer.

Traffic: 2859 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6