I have a large tab-separated matrix containing FPKM (expression) values for known and novel genes and transcripts. The code first needs to calculate the overall FPKM for each gene (the sum of its transcript FPKMs) and then divide each isoform's FPKM by that gene total. In the example below, gene MSTRG.1 contains three transcripts (AT1G01010.1, MSTRG.1.2, MSTRG.1.3), with the transcript FPKM values in the sample columns:
gene_id trans Sample1 Sample2
MSTRG.1 AT1G01010.1 3.217145 5.362317
MSTRG.1 MSTRG.1.2 0 0
MSTRG.1 MSTRG.1.3 0 1.265547
AT3G04280 AT3G06460.1 0 4.852563
AT3G04280 MSTRG.12548.1 0.099178 0.480905
AT3G04280 AT3G06470.1 4.548129 6.963614
So the overall gene expression in Sample1 is 3.217145 for MSTRG.1 and 4.647307 for AT3G04280. Similarly, in Sample2 it is 6.627864 for MSTRG.1 and 12.297082 for AT3G04280. When we divide each transcript's expression by the gene expression, the output matrix will be something like this:
gene_id trans Sample1 Sample2
MSTRG.1 AT1G01010.1 1 0.809056582935317
MSTRG.1 MSTRG.1.2 0 0
MSTRG.1 MSTRG.1.3 0 0.190943417064683
AT3G04280 AT3G06460.1 0 0.3946
AT3G04280 MSTRG.12548.1 0.02134 0.039
AT3G04280 AT3G06470.1 0.9786 0.566
Any help will be highly appreciated.
This reads as if you want someone to write the code for you... What have you tried?
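For reference, the calculation described in the question (sum the transcript FPKMs per gene and sample, then divide each transcript by that sum) could be sketched roughly as below. This is only a minimal illustration using the Python standard library, not a tested pipeline. The function names `read_table` and `isoform_fractions` are made up for this example, the input is assumed to be tab-separated with the `gene_id` and `trans` columns shown in the question, and returning 0 for a gene whose total is 0 is an assumption (you may prefer NaN).

```python
import csv
import io
from collections import defaultdict


def read_table(handle):
    """Read a tab-separated FPKM table; every column except gene_id/trans is a sample."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = list(reader)
    sample_cols = [c for c in reader.fieldnames if c not in ("gene_id", "trans")]
    return rows, sample_cols


def isoform_fractions(rows, sample_cols):
    """Divide each transcript FPKM by the summed FPKM of its gene, per sample."""
    # First pass: total FPKM per gene and sample.
    totals = defaultdict(lambda: [0.0] * len(sample_cols))
    for row in rows:
        for i, col in enumerate(sample_cols):
            totals[row["gene_id"]][i] += float(row[col])

    # Second pass: transcript FPKM / gene FPKM.
    out = []
    for row in rows:
        new_row = {"gene_id": row["gene_id"], "trans": row["trans"]}
        for i, col in enumerate(sample_cols):
            gene_total = totals[row["gene_id"]][i]
            # Assumption: a gene with zero total expression gives 0, not NaN.
            new_row[col] = float(row[col]) / gene_total if gene_total else 0.0
        out.append(new_row)
    return out


# The example data from the question, written out as tab-separated text.
EXAMPLE = (
    "gene_id\ttrans\tSample1\tSample2\n"
    "MSTRG.1\tAT1G01010.1\t3.217145\t5.362317\n"
    "MSTRG.1\tMSTRG.1.2\t0\t0\n"
    "MSTRG.1\tMSTRG.1.3\t0\t1.265547\n"
    "AT3G04280\tAT3G06460.1\t0\t4.852563\n"
    "AT3G04280\tMSTRG.12548.1\t0.099178\t0.480905\n"
    "AT3G04280\tAT3G06470.1\t4.548129\t6.963614\n"
)

rows, sample_cols = read_table(io.StringIO(EXAMPLE))
fractions = isoform_fractions(rows, sample_cols)

# Print the result in the same layout as the input.
print("\t".join(["gene_id", "trans"] + sample_cols))
for r in fractions:
    values = [f"{r[c]:.6f}" for c in sample_cols]
    print("\t".join([r["gene_id"], r["trans"]] + values))
```

For a real file, replace `io.StringIO(EXAMPLE)` with an open file handle. If pandas is available, the same idea is usually written as dividing the sample columns by `df.groupby("gene_id")[sample_cols].transform("sum")`, which is likely to be faster on a large matrix.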