I am not sure whether anyone else has seen this. When I run Cufflinks with standard options (e.g. -G $Gencode_GTF -M $Mask_GTF --compatible-hits-norm --multi-read-correct), I expect the tracking_id column of the output file isoforms.fpkm_tracking to contain only transcript IDs (i.e. ENSTxxxxxx for the human Gencode GTF). Instead, every gene also has a row with an ENSGxxxxx ID there (see the first row after the header in the grep output below), and its length looks abnormally long. Does anyone have a clue? I am still waiting for a reply from the Cufflinks group.
$ grep -w -P "tracking_id|XPR1" isoforms.fpkm_tracking
tracking_id class_code nearest_ref_id gene_id gene_short_name tss_id locus length coverage FPKM FPKM_conf_lo FPKM_conf_hi FPKM_status
ENSG00000143324.9 - - ENSG00000143324.9 XPR1 - chr1:180601139-180859387 258248 0.209926 0.20419 0.188323 0.220131 OK
ENST00000367590.4 - - ENSG00000143324.9 XPR1 - chr1:180601139-180859387 8474 3.30152 3.2113 2.81811 3.61095 OK
ENST00000367589.3 - - ENSG00000143324.9 XPR1 - chr1:180601167-180855262 4126 2.83119 2.75383 2.21059 3.2843 OK
ENST00000498177.1 - - ENSG00000143324.9 XPR1 - chr1:180805787-180843191 485 2.0379e-07 1.98221e-07 0 0.268656 OK
ENST00000464817.1 - - ENSG00000143324.9 XPR1 - chr1:180832899-180847426 567 0.54371 0.528852 0 1.03411 OK
ENST00000467345.1 - - ENSG00000143324.9 XPR1 - chr1:180856707-180857584 736 3.28404e-06 3.1943e-06 0 0.23211 OK
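One thing visible in the output above: for the ENSG row, the reported length (258248) is exactly the end minus the start of the locus (180859387 - 180601139), i.e. the whole genomic span of the gene and not a spliced transcript length. To check whether that holds for every such row, something along these lines might help. It is only a sketch: it assumes the tracking file is tab-delimited with the header shown above, and the function name flag_gene_level_rows is made up for illustration.

```python
import csv
import io


def flag_gene_level_rows(tracking_file):
    """Yield (tracking_id, length, locus_span) for rows whose tracking_id equals gene_id.

    Assumes a tab-delimited file with the isoforms.fpkm_tracking header
    (tracking_id, gene_id, locus, length, ...).
    """
    reader = csv.DictReader(tracking_file, delimiter="\t")
    for row in reader:
        if row["tracking_id"] != row["gene_id"]:
            continue  # ordinary transcript row
        # locus looks like chr1:180601139-180859387
        coords = row["locus"].rsplit(":", 1)[1]
        start, end = (int(x) for x in coords.split("-"))
        yield row["tracking_id"], int(row["length"]), end - start


# Two rows copied from the grep output above, re-joined with tabs.
header = ["tracking_id", "class_code", "nearest_ref_id", "gene_id",
          "gene_short_name", "tss_id", "locus", "length", "coverage",
          "FPKM", "FPKM_conf_lo", "FPKM_conf_hi", "FPKM_status"]
gene_row = ["ENSG00000143324.9", "-", "-", "ENSG00000143324.9", "XPR1", "-",
            "chr1:180601139-180859387", "258248", "0.209926", "0.20419",
            "0.188323", "0.220131", "OK"]
tx_row = ["ENST00000367590.4", "-", "-", "ENSG00000143324.9", "XPR1", "-",
          "chr1:180601139-180859387", "8474", "3.30152", "3.2113",
          "2.81811", "3.61095", "OK"]
sample = "\n".join("\t".join(r) for r in (header, gene_row, tx_row)) + "\n"

flagged = list(flag_gene_level_rows(io.StringIO(sample)))
for tracking_id, length, span in flagged:
    print(tracking_id, length, span, "length == span" if length == span else "differs")
```

On the real file you would pass open("isoforms.fpkm_tracking") in place of the io.StringIO sample.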
What does your Gencode_GTF file contain?
It's a typical GTF file downloaded from gencode.org.
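To answer "what does the GTF contain" a bit more concretely, it may be worth counting the feature types in column 3 and checking whether any lines carry a transcript_id that is identical to their gene_id. This is only a guess at what to look for, not a confirmed cause: I am assuming the annotation may contain gene-level lines of that kind. The function name summarize_gtf and the three demo lines are made up for illustration and are not taken from the real Gencode file.

```python
import re
from collections import Counter

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def summarize_gtf(lines):
    """Return (feature_counts, n_lines_where_transcript_id_equals_gene_id)."""
    feature_counts = Counter()
    gene_id_as_transcript = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        feature_counts[fields[2]] += 1
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id and gene_id == attrs.get("transcript_id"):
            gene_id_as_transcript += 1
    return feature_counts, gene_id_as_transcript


# Hypothetical demo records (invented IDs), only to show the shape of the input.
demo = [
    'chr1\tDEMO\tgene\t100\t500\t.\t+\t.\tgene_id "G1"; transcript_id "G1"; gene_name "DEMO";',
    'chr1\tDEMO\ttranscript\t100\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_name "DEMO";',
    'chr1\tDEMO\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_name "DEMO";',
]
counts, same_id = summarize_gtf(demo)
print(dict(counts), same_id)
```

For the real annotation, pass an open file handle for the GTF instead of the demo list. A non-zero second value would mean some records use the gene ID as their transcript ID, which would be worth mentioning to the Cufflinks group.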