I'm working with some GTEx portal data right now, and I've noticed that GTEx's downloads page has both a "Gene TPMs" and a "Transcript TPMs" file. How exactly do these files differ in terms of the steps used to produce them? Put another way, why are there two files if RNA-Seq is supposed to output reads for transcripts in general? I would expect a single file with all the transcripts from GTEx rather than one that distinguishes gene from transcript. I'm obviously missing something rather elementary here, but I don't know what it is.
Another minor question: is it safe to assume that the data in these files is normalized? As I understand it, the values being TPMs implies that the read counts were normalized in the process of converting them to TPMs, but I'm not 100% sure about this. Thanks for any help.
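For reference, here is a minimal sketch of the usual TPM calculation, just to show where the normalization happens. The function name, the counts, and the lengths are made up for illustration; this is not taken from the GTEx pipeline.

```python
def counts_to_tpm(counts, lengths_kb):
    """Convert raw read counts to TPM (hypothetical helper, toy inputs).

    counts     -- reads assigned to each feature
    lengths_kb -- feature lengths in kilobases
    """
    # Step 1: divide each count by the feature length (reads per kilobase).
    rates = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rates)
    if total == 0:
        raise ValueError("no reads assigned to any feature")
    # Step 2: rescale so that the values in one sample sum to one million.
    return [r / total * 1e6 for r in rates]


# Toy data: three features with different lengths.
counts = [100, 200, 300]
lengths_kb = [1.0, 2.0, 3.0]
tpm = counts_to_tpm(counts, lengths_kb)
```

So TPM values are already adjusted for feature length and sequencing depth within a sample. Whether that is enough for comparisons between samples depends on the analysis.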
What exactly is being measured in "gene counts", though?
Mike, I am very sorry if I am being pedantic and what I say below is too simplistic.
Here, the word "transcript" does not mean the mRNA product of the gene. The "Gene" and the "Transcript" are those defined in the gene definition file (GTF, or GFF/GFF3). For example, the human HOXA1 gene has two transcripts according to Ensembl. So if GTEx used the Ensembl gene definitions, the "Transcript TPMs" file will have two values for that gene, while the "Gene TPMs" file will have only one.
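To make the one-value-versus-several point concrete, here is a small sketch that collapses transcript-level TPMs into a single value per gene by summing them. Summing is one common convention (RSEM, for instance, reports gene-level TPM as the sum over a gene's isoforms); I am not claiming this is how GTEx builds its gene file. The IDs and TPM values below are invented.

```python
from collections import defaultdict


def gene_tpm_from_transcripts(rows):
    """Sum transcript TPMs per gene (hypothetical helper).

    rows -- iterable of (transcript_id, gene_id, tpm) tuples
    """
    totals = defaultdict(float)
    for _transcript_id, gene_id, tpm in rows:
        totals[gene_id] += tpm
    return dict(totals)


# Toy annotation: GENE_A has two transcripts, GENE_B has one.
transcript_rows = [
    ("GENE_A.tx1", "GENE_A", 12.5),
    ("GENE_A.tx2", "GENE_A", 7.5),
    ("GENE_B.tx1", "GENE_B", 3.0),
]

gene_tpm = gene_tpm_from_transcripts(transcript_rows)
```

The transcript-level table has three rows here, while the gene-level result has two entries, one per gene.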
Not pedantic at all, this actually makes a lot of sense. Thank you
This is super helpful and exactly the answer I was looking for, thank you!!
Reads mapped to the gene.