5.9 years ago
HZZ0036
Hi,
I have 8 transcriptome samples. After mapping the transcriptome reads to the genome and running Cufflinks, I get FPKM results like:
gene_id trans_255FPKM
arahy.Tifrunner.gnm1.ann1.0002EG 0
arahy.Tifrunner.gnm1.ann1.0008XH 0.0693565
arahy.Tifrunner.gnm1.ann1.0046K4 0.173405
arahy.Tifrunner.gnm1.ann1.005S9P 0.0491059
arahy.Tifrunner.gnm1.ann1.0067NC 0.385907
arahy.Tifrunner.gnm1.ann1.006X4N 0.256161
arahy.Tifrunner.gnm1.ann1.008EVY 0
arahy.Tifrunner.gnm1.ann1.0093F6 0.282329
arahy.Tifrunner.gnm1.ann1.009U6B 0.564484
.......
gene_id trans_256FPKM
arahy.Tifrunner.gnm1.ann1.0002EG 0
arahy.Tifrunner.gnm1.ann1.0008XH 0.0983234
arahy.Tifrunner.gnm1.ann1.004CIH 2.31641
arahy.Tifrunner.gnm1.ann1.005S9P 0.0493107
arahy.Tifrunner.gnm1.ann1.0067NC 0.457726
arahy.Tifrunner.gnm1.ann1.006X4N 0.36012
arahy.Tifrunner.gnm1.ann1.006XS9 0.709858
arahy.Tifrunner.gnm1.ann1.008EVY 0
arahy.Tifrunner.gnm1.ann1.0093F6 0.475885
.......
gene_id trans_262FPKM
............
.........
How can I combine these files into one? If a gene is missing from a file, its FPKM should be 0. I want a file like this:
gene_id trans_255FPKM trans_256FPKM ........ trans_262FPKM
arahy.Tifrunner.gnm1.ann1.0002EG 0 0 ............
Thanks in advance.
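A minimal stdlib-only Python sketch of the merge, assuming each Cufflinks output has already been reduced to a two-column, whitespace-separated file with one header line (`gene_id` plus the sample column), like the excerpts above. The function names `read_fpkm`, `merge_fpkm` and `main` are hypothetical. The key point is that the gene list is the union over all samples (an outer join), and a gene absent from a sample is written as 0.

```python
import sys
from typing import Dict, Iterable, List, Tuple


def read_fpkm(lines: Iterable[str]) -> Tuple[str, Dict[str, str]]:
    """Parse one table: a header line ('gene_id <sample>'), then 'gene_id FPKM' rows.

    Assumes whitespace-separated columns (tabs or spaces) and no spaces inside
    gene IDs. FPKM values are kept as the original strings so nothing is rounded.
    """
    it = iter(lines)
    sample = next(it).split()[1]  # e.g. "trans_255FPKM"
    values: Dict[str, str] = {}
    for line in it:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or truncated lines
            values[fields[0]] = fields[1]
    return sample, values


def merge_fpkm(tables: List[Tuple[str, Dict[str, str]]]) -> List[List[str]]:
    """Outer-join (sample, {gene_id: FPKM}) tables on gene_id; absent genes get "0"."""
    # Union of gene IDs across every sample, not just the first or last file.
    genes = sorted(set().union(*(values.keys() for _, values in tables)))
    rows = [["gene_id"] + [sample for sample, _ in tables]]
    for gene in genes:
        rows.append([gene] + [values.get(gene, "0") for _, values in tables])
    return rows


def main(paths: List[str]) -> None:
    tables = []
    for path in paths:
        with open(path) as handle:
            tables.append(read_fpkm(handle))
    for row in merge_fpkm(tables):
        print("\t".join(row))  # tab-separated output, one gene per line


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it with all eight files as arguments in the column order you want, redirecting stdout to the combined file (script and file names are up to you). Keeping the FPKM values as strings avoids any reformatting of the numbers Cufflinks wrote.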
What have you tried already? Which programming or scripting language are you most comfortable with?
I have tried Python:
But the results are not correct. The output only lists the gene IDs found in trans_256. Could you please tell me how to improve the script? Thanks.
The results look like this:
Some gene IDs from trans_255 didn't show up.
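The script itself is not shown, so this is only a guess at the cause: output that contains just one file's gene IDs is what you get when the row keys come from a single table (a one-sided join) instead of from the union of all tables. The toy comparison below uses hypothetical dictionaries `fpkm_255` and `fpkm_256`, with a few values from the excerpts above and gene IDs shortened to their suffixes. If the script uses pandas, passing `how="outer"` to `DataFrame.merge` is the equivalent fix.

```python
# Hypothetical per-sample tables {gene_id suffix: FPKM}, values from the excerpts above.
fpkm_255 = {"0002EG": 0.0, "0046K4": 0.173405, "009U6B": 0.564484}
fpkm_256 = {"0002EG": 0.0, "004CIH": 2.31641, "006XS9": 0.709858}

# One-sided join: keys come from trans_256 only, so 0046K4 and 009U6B disappear.
# None marks "no record in trans_255" (pandas would show NaN here).
one_sided = {g: (fpkm_255.get(g), fpkm_256[g]) for g in fpkm_256}

# Outer join: keys are the union of both samples; missing values become 0.0.
outer = {
    g: (fpkm_255.get(g, 0.0), fpkm_256.get(g, 0.0))
    for g in sorted(fpkm_255.keys() | fpkm_256.keys())
}

print(sorted(one_sided))  # ['0002EG', '004CIH', '006XS9']
print(sorted(outer))      # ['0002EG', '0046K4', '004CIH', '006XS9', '009U6B']
```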
There may simply be no record of them in that sample, as you have implied. If you are confident that your Python script is doing the correct thing, then just convert the NaN values to 0, or leave them as NA: the downstream program that you use may be able to tolerate NA values.
Both Cufflinks and FPKM are NOT recommended anymore. What is the aim of your analysis?
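A small sketch of the "convert NaN to 0 or NA" step when writing the merged table, assuming missing entries are stored as float NaN. The helper name `fill_missing` is hypothetical; if the merged table is a pandas DataFrame, `DataFrame.fillna(0)` does the same replacement in one call.

```python
import math
from typing import List


def fill_missing(values: List[float], fill: str = "0") -> List[str]:
    """Render one row of FPKM values, replacing NaN (no record in that sample).

    fill="0" treats absent genes as unexpressed; fill="NA" keeps them
    explicitly missing for downstream tools that tolerate NA.
    """
    return [fill if math.isnan(v) else str(v) for v in values]


row = [0.0693565, float("nan"), 0.0983234]
print(fill_missing(row))        # ['0.0693565', '0', '0.0983234']
print(fill_missing(row, "NA"))  # ['0.0693565', 'NA', '0.0983234']
```

Which fill value is right depends on the downstream tool: 0 asserts "not expressed", while NA says "unknown".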
Could you please tell me why? I need to get FPKM values to do gene co-expression analysis.
Please read:
For co-expression analysis with, for example, WGCNA, FPKM is okay, because WGCNA is based on correlation. I assume that you are aiming to use WGCNA (every person who starts in bioinformatics uses it, even though there are better tools available).
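One way to see why correlation tolerates FPKM: FPKM divides each gene's counts by that gene's length, a constant that is the same in every sample, and Pearson correlation is unchanged when a profile is multiplied by a positive constant. The pure-Python check below uses made-up values (hypothetical, not from the samples above) and a hand-written `pearson` helper; it only illustrates the gene-length factor, not every normalisation issue with FPKM.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length expression profiles across samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Made-up depth-normalised counts for two genes across 8 samples.
gene_a = [10.0, 12.0, 30.0, 25.0, 8.0, 40.0, 22.0, 15.0]
gene_b = [5.0, 7.0, 14.0, 13.0, 3.0, 21.0, 10.0, 9.0]

# Dividing by a per-gene length (kb) rescales each whole profile by a constant...
len_a_kb, len_b_kb = 2.5, 0.8
fpkm_a = [v / len_a_kb for v in gene_a]
fpkm_b = [v / len_b_kb for v in gene_b]

# ...so the gene-gene correlation is the same up to floating-point rounding.
r_counts = pearson(gene_a, gene_b)
r_fpkm = pearson(fpkm_a, fpkm_b)
print(round(r_counts, 6), round(r_fpkm, 6))
```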