How can I edit the output from Cufflinks to do my own normalization?
9.3 years ago

Hey everyone,

I am running an experiment with 4 paired-end samples across 2 conditions (Control vs Mutation), with 2 replicates each (C1, C2, MUT1, MUT2).

After mapping with segemehl, I built the transcripts with Cufflinks, so I now have transcripts.gtf, genes.fpkm_tracking and isoforms.fpkm_tracking. Next I need to take the FPKM of each gene, divide it by a value corresponding to the count of the plasmid that was inserted in each sample, and then continue with the pipeline (cuffmerge and cuffdiff).

These values can be found in the table below.

Sample   Value
C1       445.188/0.296
C2       137.217/0.196
MUT1     340.072/0.143
MUT2     643.493/0.271

But how can I do that? I already tried editing the Cufflinks output and dividing the counts in all 3 files, but when I merge the transcripts, all the values disappear. I can't do it after running cuffmerge either, because by then the samples are merged and I can no longer tell them apart.

Is there a way to do it?

Cufflinks RNA-Seq Normalization • 2.8k views

The Cufflinks package also outputs estimated raw counts. You could use them to normalise again.
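
For illustration only, here is a rough sketch of what renormalising could look like once the per-sample estimated counts have been pulled into a gene-by-sample table. The gene names, counts and scaling factors below are all made up (they are not taken from the table in the question), and `rescale` is a hypothetical helper, so substitute your own values.

```python
# Hypothetical per-sample estimated counts: gene -> sample -> count.
counts = {
    "HOXA1": {"C1": 120.4, "C2": 98.7, "MUT1": 310.2, "MUT2": 295.5},
    "HOXB4": {"C1": 15.0, "C2": 12.5, "MUT1": 14.1, "MUT2": 16.8},
}

# Hypothetical per-sample scaling factors (placeholders for plasmid-derived values).
factors = {"C1": 2.0, "C2": 1.0, "MUT1": 4.0, "MUT2": 0.5}


def rescale(counts, factors):
    """Divide every count by the scaling factor of its sample."""
    samples = {sample for row in counts.values() for sample in row}
    missing = samples - set(factors)
    if missing:
        raise KeyError(f"no scaling factor for samples: {sorted(missing)}")
    return {
        gene: {sample: value / factors[sample] for sample, value in row.items()}
        for gene, row in counts.items()
    }


scaled = rescale(counts, factors)
```

Note that this only rescales numbers in a table. It does not by itself tell you whether a downstream tool will treat the result sensibly.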

Yes, but how can I feed this information back into cufflinks/cuffdiff? My goal is to find differentially expressed HOX genes.

You don't. Cuffdiff is only designed to be used in a few predefined ways, and what you're trying to do isn't one of them.


OK, I will try to use DESeq2 with the raw counts from cuffdiff, but the values are not integers. Can DESeq2 accept this kind of value?


No, you'll either need to round them (not ideal) or instead use either edgeR or limma/voom.
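
If you do go the rounding route, a minimal sketch might look like this. The input values are made up and `round_counts` is a hypothetical helper. It rounds halves up rather than using Python's built-in `round()`, which rounds halves to the nearest even integer.

```python
import math


def round_counts(values):
    """Round non-negative estimated counts to the nearest integer, halves up."""
    rounded = []
    for value in values:
        if value < 0:
            raise ValueError(f"counts must be non-negative, got {value}")
        rounded.append(math.floor(value + 0.5))
    return rounded


# Made-up estimated counts for one gene across four samples.
example = round_counts([10.5, 3.2, 0.49, 7.5])
```

Rounding throws away the fractional part of the estimate, which is part of why it isn't ideal.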


Either that or use something like htseq-count.

9.2 years ago

Mucking around with data produced by one suite and putting it into another, unrelated one is a favorite pastime of those who, as they say, "just want to use the tool everyone is using". A hair-raising example was someone telling me how they took FPKM values produced by Cuffdiff and wanted to use DESeq with them, but because these values were too small and non-integer they just ended up multiplying everything by 1000, and then "DESeq worked"... (bioinformatics, man: everything is possible, and probably published as well)

My advice: if you can't use the Cuffdiff pipeline, use something else that takes your specifics into account, and don't try to make it work by rescaling after the fact. Your rescaling will very likely be all wrong.
