2.4 years ago
LayneSadler
I have TPM values from 36 participants, but all of them are diseased.
- aligned to GRCh37.75
- Ensembl gene annotation
Are there any good sources of "healthy" control TPM values?
The only good source would have been matched control tissue included in your own study. RNA-seq is a relative assay, and batch effects between unrelated studies make it close to impossible to compare them meaningfully unless the samples were prepared in the same batch, with the same kit, the same procedures, the same everything. Are you aware of that? What information exactly are you looking for?
That's right, you cannot do differential expression (DE) analysis without cases and controls from the same experiment. At most, you can compare your top expressed genes against the GTEx data to get an idea of whether they are also highly expressed in the same tissue under normal conditions. There are a few methods for meta-analysis of RNA-seq data, for example metaseq; could the user apply one of these?
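A minimal sketch of that top-gene comparison, assuming you have already loaded your sample's TPMs and a reference profile (for example per-gene median TPMs for the matching tissue exported from GTEx) into plain dicts keyed by gene ID. The function names and toy values are hypothetical. It only measures how much the two top-n lists overlap; it says nothing about differential expression.

```python
def top_n_genes(tpm, genes, n):
    """Return the n genes (restricted to `genes`) with the highest TPM; ties broken by gene ID."""
    ranked = sorted(genes, key=lambda g: (-tpm[g], g))
    return set(ranked[:n])


def top_n_overlap(sample_tpm, reference_tpm, n):
    """Jaccard overlap of the top-n genes of two TPM profiles.

    Only genes present in both dicts are ranked, because the two sources
    may use different annotations.
    """
    shared = sample_tpm.keys() & reference_tpm.keys()
    a = top_n_genes(sample_tpm, shared, n)
    b = top_n_genes(reference_tpm, shared, n)
    return len(a & b) / len(a | b)


# Hypothetical toy values, not real data.
sample = {"GENE_A": 500.0, "GENE_B": 300.0, "GENE_C": 10.0, "GENE_D": 5.0}
reference = {"GENE_A": 450.0, "GENE_B": 20.0, "GENE_C": 200.0, "GENE_D": 1.0}
print(top_n_overlap(sample, reference, 2))  # top-2 sets {A, B} vs {A, C}: Jaccard = 1/3
```

Comparing ranks rather than raw values sidesteps some, but not all, of the scaling differences between sources.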
That said, there is no standard for how TPM is calculated. Correctly, one would use length information based on the length of the transcripts actually being expressed, as e.g. the salmon-tximport pipeline provides. But some people use the entire length of the annotated genes, the average of all transcripts, the union of exons, etc. Hence, on top of the experimental batch effect, TPMs from different sources may carry in silico batch effects that make even comparisons of the top genes difficult. Sure, if something is zero in one and skyrocketing in the other, the difference could be real, but everything else must be treated with great care (or not at all).
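To illustrate how much the length definition alone matters, here is a minimal sketch of the usual TPM formula, TPM_i = 10^6 · (c_i / l_i) / Σ_j (c_j / l_j), applied to the same hypothetical read counts with two different length choices. The numbers are made up, and real effective lengths (as estimated by salmon) also account for the fragment length distribution, which this toy ignores.

```python
def tpm(counts, lengths):
    """TPM from read counts and per-feature lengths (same order; lengths in bp)."""
    rates = [c / l for c, l in zip(counts, lengths)]  # reads per base
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


counts = [100, 100]  # hypothetical reads for gene X and gene Y

# Length of the transcripts actually expressed (hypothetical).
expressed_len = [1000, 1000]
# Union of all annotated exons: gene Y has long unexpressed isoforms (hypothetical).
union_exon_len = [1000, 4000]

print(tpm(counts, expressed_len))   # [500000.0, 500000.0]
print(tpm(counts, union_exon_len))  # gene X ~800000, gene Y ~200000
```

Same reads, yet gene Y goes from equal to four-fold lower than gene X purely because of the length definition. TPMs also always sum to 10^6 within a sample, so a change in one gene shifts all the others.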
Thank you. I am constructing a diagnostic algorithm that uses gene TPMs to predict the presence of the disease. Then I want to permute the TPMs to figure out the most important genes. So I need TPMs from both cases and controls.
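For reference, the approach described here is usually called permutation feature importance: shuffle one feature (gene) across samples, re-score the trained model, and record how much performance drops. Below is a minimal, dependency-free sketch assuming the model is any callable mapping a row of TPMs to a 0/1 label and that accuracy is the score; the function names and toy data are hypothetical. scikit-learn provides a fuller version as `sklearn.inspection.permutation_importance`. Ideally the importances are computed on held-out samples rather than the training set.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where the model's predicted label matches y."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled across samples."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between gene j and the labels
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: rows are samples, columns are TPMs of two genes.
X = [[10.0, 5.0], [20.0, 7.0], [80.0, 6.0], [90.0, 4.0]]
y = [0, 0, 1, 1]  # 1 = case, 0 = control


def model(row):
    """Hypothetical 'trained' rule that only looks at gene 0."""
    return int(row[0] > 50.0)


print(permutation_importance(model, X, y))  # gene 1 scores 0.0: the model ignores it
```

If cases and controls come from different studies, the genes that score highest here may simply be the ones carrying the batch difference.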
Without knowing the details, it sounds like the "ground truth" you would be using is confounded, so in all likelihood the performance of that algorithm will suffer. The arguments have been made above, and it is up to you whether to follow them, but I personally would find a collaborator and generate cases and controls in a matched study to have a solid ground truth.
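A toy simulation of the confounding problem, with all numbers hypothetical: one gene with no disease effect at all, its log2 TPM drawn from the same normal distribution for everyone, except that the 18 "cases" come from a study whose processing adds a systematic offset (standing in for any batch or TPM-definition difference). A single threshold on that gene then separates cases from controls perfectly, while in a matched design the same gene carries no real signal.

```python
import random


def simulate_gene(n, batch_shift, rng):
    """log2 TPM of one gene with NO disease effect, plus a per-study offset."""
    return [rng.gauss(5.0, 0.3) + batch_shift for _ in range(n)]


def best_threshold_accuracy(cases, controls):
    """Best accuracy of any single-threshold rule on this gene, in either direction."""
    values = cases + controls
    labels = [1] * len(cases) + [0] * len(controls)
    best = 0.0
    for t in values:
        acc = sum((v > t) == (lab == 1) for v, lab in zip(values, labels)) / len(values)
        best = max(best, acc, 1.0 - acc)  # 1 - acc covers the rule "v <= t means case"
    return best


rng = random.Random(1)
# Confounded: cases from study A (offset 3.0), controls from study B (no offset).
confounded = best_threshold_accuracy(simulate_gene(18, 3.0, rng), simulate_gene(18, 0.0, rng))
# Matched: cases and controls processed together, so no offset between groups.
matched = best_threshold_accuracy(simulate_gene(18, 0.0, rng), simulate_gene(18, 0.0, rng))
print(confounded, matched)
```

The matched accuracy still lands above 0.5 only because the threshold is picked on the same data it is scored on; with proper cross-validation it would be expected to hover around chance.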