Differences between TCGA and ENCODE RNA-seq data files

Question by dao, 2.4 years ago

Hi All,

I'm new to RNA-seq, so I'm confused by a couple of terms. I have some questions about RNA-seq data files from different databases.

The first is from TCGA; its columns are:

gene_id gene_name gene_type unstranded stranded_first stranded_second tpm_unstranded fpkm_unstranded fpkm_uq_unstranded

The second is from ENCODE; its columns are:

gene_id transcript_id(s) length effective_length expected_count TPM FPKM posterior_mean_count posterior_standard_deviation_of_count pme_TPM pme_FPKM TPM_ci_lower_bound TPM_ci_upper_bound FPKM_ci_lower_bound FPKM_ci_upper_bound

My question is: do the columns in the two files correspond to each other, i.e., which column names are equivalent?

I am really confused, so any help is appreciated. Thanks in advance.

Best regards,

Dao

Tags: RNA-seq



TCGA and ENCODE are two different projects (databases), and they follow different naming conventions.

The counts given in TCGA (V32) are based on the STAR pipeline.
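To see which columns line up, it helps to recall how the normalized values are usually derived from counts. The formulas below are the textbook definitions, a sketch rather than either pipeline's exact implementation. $C_g$ is the read count for gene $g$, $L_g$ its length, $N$ the total number of mapped reads used for normalization, and $Q_{75}$ the 75th-percentile gene count in the sample. The ENCODE columns match RSEM's gene-level output, where $C_g$ is `expected_count` and $L_g$ is `effective_length`; which reads and lengths the TCGA side uses for $N$ and $L_g$ is a pipeline detail worth checking in the GDC documentation.

$$
\mathrm{FPKM}_g = \frac{C_g \cdot 10^9}{L_g \cdot N}, \qquad
\mathrm{TPM}_g = \frac{C_g / L_g}{\sum_j C_j / L_j} \cdot 10^6, \qquad
\mathrm{FPKM\text{-}UQ}_g = \frac{C_g \cdot 10^9}{L_g \cdot Q_{75}}
$$

$$
\mathrm{TPM}_g = \frac{\mathrm{FPKM}_g}{\sum_j \mathrm{FPKM}_j} \cdot 10^6
$$

The last identity only holds when both measures use the same $C_g$ and $L_g$. So TPM and FPKM within one file are interconvertible, but TPM from TCGA and TPM from ENCODE should not be expected to agree numerically: STAR gene counts only include uniquely mapped reads, whereas RSEM's `expected_count` is a fractional estimate that also distributes multi-mapping reads, and the annotation versions may differ.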

Reply by DareDevil, 2.4 years ago


Hi, thanks for your answer. Is there a tool that can convert between the two formats?
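Because both files are plain tab-separated tables, a short script can rename the columns into a shared layout. Below is a minimal Python sketch using only the standard library. The common schema (`gene_id`, `count`, `tpm`, `fpkm`), the function names, and the column choices are assumptions for illustration, not an official TCGA or ENCODE format. It pairs `expected_count` with `unstranded` (use `stranded_first` or `stranded_second` instead for a stranded library), `TPM` with `tpm_unstranded`, and `FPKM` with `fpkm_unstranded`. `fpkm_uq_unstranded` has no ENCODE counterpart, and the RSEM posterior-mean (`pme_*`) and interval columns have no TCGA counterpart. Gene IDs are matched after stripping the Ensembl version suffix, since the two projects may use different GENCODE releases; the script also skips `#` comment lines and STAR summary rows (`N_unmapped`, `N_multimapping`, ...) if present.

```python
import csv
import io
import re

# Hypothetical common schema: gene_id, count, tpm, fpkm.
# Assumption: 'unstranded' is the right TCGA count column; switch to
# 'stranded_first' or 'stranded_second' for a stranded library.
TCGA_COLUMNS = {"gene_id": "gene_id", "count": "unstranded",
                "tpm": "tpm_unstranded", "fpkm": "fpkm_unstranded"}
# ENCODE gene quantifications follow RSEM's gene-level (*.genes.results) layout.
ENCODE_COLUMNS = {"gene_id": "gene_id", "count": "expected_count",
                  "tpm": "TPM", "fpkm": "FPKM"}


def strip_version(gene_id):
    """Drop the Ensembl version: ENSG00000000003.15 -> ENSG00000000003.

    Suffixes such as '_PAR_Y' are kept so PAR copies do not collide.
    """
    return re.sub(r"\.\d+", "", gene_id)


def to_common(handle, columns):
    """Read a tab-separated quantification file into {gene_id: record}."""
    lines = (line for line in handle if not line.startswith("#"))  # skip comment lines
    table = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        gene = row[columns["gene_id"]]
        if gene.startswith("N_"):  # STAR summary rows carry no TPM/FPKM
            continue
        table[strip_version(gene)] = {
            "count": float(row[columns["count"]]),
            "tpm": float(row[columns["tpm"]]),
            "fpkm": float(row[columns["fpkm"]]),
        }
    return table


def join_tpm(tcga, encode):
    """Genes present in both tables, with each source's TPM side by side."""
    shared = sorted(tcga.keys() & encode.keys())
    return [(g, tcga[g]["tpm"], encode[g]["tpm"]) for g in shared]


# Toy inputs with made-up numbers (not real TCGA/ENCODE values) to show the shapes.
TCGA_TOY = (
    "# comment line\n"
    "gene_id\tgene_name\tgene_type\tunstranded\tstranded_first\tstranded_second\t"
    "tpm_unstranded\tfpkm_unstranded\tfpkm_uq_unstranded\n"
    "N_unmapped\t\t\t100\t100\t100\t\t\t\n"
    "ENSG00000000003.15\tTSPAN6\tprotein_coding\t20\t0\t20\t30.0\t10.0\t250.0\n"
)
ENCODE_TOY = (
    "gene_id\ttranscript_id(s)\tlength\teffective_length\texpected_count\tTPM\tFPKM\n"
    "ENSG00000000003.14\tENST00000373020.8\t2000.00\t1850.00\t18.00\t28.50\t9.70\n"
)

tcga = to_common(io.StringIO(TCGA_TOY), TCGA_COLUMNS)
encode = to_common(io.StringIO(ENCODE_TOY), ENCODE_COLUMNS)
print(join_tpm(tcga, encode))  # [('ENSG00000000003', 30.0, 28.5)]
```

For real files, pass `open(path)` instead of `io.StringIO(...)`. Renaming columns does not make the values comparable (the toy TPMs differ on purpose, as real ones will); for cross-project comparisons, starting from counts and re-normalizing, or reprocessing the raw reads with a single pipeline, avoids mixing two quantification methods.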