TLDR:
- The minimum value in a transformed count matrix is -2.57. How can I infer which transformation was applied?
- Any advice on performing differential expression analysis on data transformed this way?
Details:
- I would like to analyze RNA-seq data from this study, but the raw data are under controlled access, so I downloaded the processed data provided with the original paper.
- According to the authors, the data were processed as follows: "The R-packages, tximport and edgeR, were used to respectively summarize the expression at gene-level and normalize the data."
- The maximum value is around 15, far too small for raw counts or CPM, so I suspect the data were log-transformed.
- In addition, the minimum value is -2.57. Negative values rule out untransformed counts, and this exact value occurs 310,861 times in the 20,453 × 96 matrix, i.e. in 15.8% of all entries.
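One way to test the log-CPM hypothesis: my understanding of edgeR's cpm(..., log = TRUE) is that the prior count is scaled in proportion to each sample's library size (prior.count * L / mean(L)) and twice that scaled prior is added to the library size. If that is right, an exact zero count maps to log2(prior.count * 1e6 / (M + 2 * prior.count)), where M is the mean library size (the effective, normalized one if TMM factors were applied). The sample's own library size cancels, so every zero in every sample gets the same value, which would explain a single minimum of -2.57 recurring 310,861 times. The Python sketch below is a reimplementation of that assumed formula, not edgeR's code; the function names are my own, and it assumes the default prior.count = 2 and no gene-level offsets from tximport.

```python
import math


def edger_style_log_cpm(counts, lib_sizes, prior_count=2.0):
    """Log2-CPM for a genes x samples matrix (list of rows).

    Assumed formula (a reimplementation, not edgeR's code): the prior count is
    scaled in proportion to each library size so that its average equals
    prior_count, and twice the scaled prior is added to each library size.
    """
    mean_lib = sum(lib_sizes) / len(lib_sizes)
    scaled_prior = [prior_count * lib / mean_lib for lib in lib_sizes]
    adj_lib = [lib + 2 * p for lib, p in zip(lib_sizes, scaled_prior)]
    return [
        [math.log2((y + p) / a * 1e6) for y, p, a in zip(row, scaled_prior, adj_lib)]
        for row in counts
    ]


def zero_count_value(mean_lib, prior_count=2.0):
    # For y = 0, the sample's own library size L cancels:
    # log2((p*L/M) / (L + 2*p*L/M) * 1e6) = log2(p * 1e6 / (M + 2*p))
    return math.log2(prior_count * 1e6 / (mean_lib + 2 * prior_count))


def mean_lib_from_zero_value(x0, prior_count=2.0):
    # Invert zero_count_value: M = p * 1e6 / 2**x0 - 2*p
    return prior_count * 1e6 / 2 ** x0 - 2 * prior_count


# Toy data with made-up numbers: 3 genes x 3 samples, unequal library sizes.
counts = [[0, 0, 0], [10, 50, 200], [500, 30, 0]]
lib_sizes = [5e6, 12e6, 20e6]
logcpm = edger_style_log_cpm(counts, lib_sizes)

print([round(v, 6) for v in logcpm[0]])  # the all-zero gene: identical in every sample
print(round(zero_count_value(sum(lib_sizes) / len(lib_sizes)), 6))  # same value, closed form

# Back-solve the mean library size implied by the observed minimum of -2.57.
print(f"{mean_lib_from_zero_value(-2.57):,.0f}")  # about 11.9 million
```

If a mean library size of roughly 11.9 million reads is plausible for this study's sequencing depth, that supports log2-CPM with the default prior; a different prior.count would imply a different M for the same minimum.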
FYI:
- Here is the paper: https://www.nature.com/articles/s41467-020-18640-0
- With log=TRUE, the cpm function in edgeR returns log2 values and uses a default prior.count of 2.
- A snapshot of the data:
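If the log2-CPM hypothesis holds, the transformation can be partly undone. Under the same assumed formula (prior scaled to p * L / M, library size increased by 2 * p * L / M), solving for counts per million makes each sample's library size L cancel, so plain CPM can be recovered from a log value x and the estimated mean library size M: CPM = 2^x * (1 + 2p/M) - p * 1e6 / M, with p = prior.count. This is a hedged sketch; the function names and the example row are hypothetical.

```python
import math


def mean_lib_from_zero_value(x0, prior_count=2.0):
    # Mean (effective) library size M implied by the value that exact zeros map to,
    # assuming zeros were transformed to log2(p * 1e6 / (M + 2*p)).
    return prior_count * 1e6 / 2 ** x0 - 2 * prior_count


def log_cpm_to_cpm(x, mean_lib, prior_count=2.0):
    # Assumed forward transform: 2**x = (y + p*L/M) / (L + 2*p*L/M) * 1e6.
    # Solving for y/L * 1e6 makes the per-sample library size L cancel:
    # CPM = 2**x * (1 + 2*p/M) - p * 1e6 / M
    p, m = prior_count, mean_lib
    return 2 ** x * (1 + 2 * p / m) - p * 1e6 / m


M = mean_lib_from_zero_value(-2.57)
gene_row = [-2.57, 0.0, 5.3, 15.0]  # hypothetical log2-CPM values for one gene
# Clamp at 0: rounding of the stored values can push recovered zeros slightly negative.
cpm = [max(0.0, log_cpm_to_cpm(v, M)) for v in gene_row]
print([round(c, 2) for c in cpm])
```

The result is CPM relative to each sample's effective (normalized) library size, not raw counts; a count-based method would still need the per-sample library sizes, which this matrix does not contain. If the stored values are rounded, the recovered CPM inherits that rounding error.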
Email the corresponding author of the paper.
I have already emailed the original author to request the raw data, which they cannot share because of EU regulations. Out of courtesy, I would prefer not to contact them again unless absolutely necessary. Thank you for your attention and guidance.