Hello everyone,
I am trying to use your tool to impute cell fractions from RNA-seq data. CIBERSORTx does the job; however, I'm having a bit of a problem. Some of the genes in the signature matrix (34, to be precise) are absent from the annotation I used to quantify gene abundance. To work around this, I tried, to no avail, to find synonyms for these genes.
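For anyone checking the same thing, here is a minimal sketch of how to list which signature genes are missing from a mixture file and what fraction overlap. It assumes you already have both gene lists as plain strings, and `signature_overlap` is a hypothetical helper, not part of CIBERSORTx. It only catches exact symbol matches, so renamed genes or synonyms will still show up as missing.

```python
def signature_overlap(signature_genes, mixture_genes):
    """Hypothetical helper: fraction of signature genes present in the mixture,
    plus the sorted list of the ones that are missing (exact symbol match only)."""
    sig = set(signature_genes)
    present = sig & set(mixture_genes)
    missing = sorted(sig - present)
    # Guard against an empty signature list to avoid dividing by zero.
    frac = len(present) / len(sig) if sig else 0.0
    return frac, missing
```

Comparing `frac` against 0.5 mirrors the threshold at which the tutorial says CIBERSORTx issues a warning.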
I noticed that the CIBERSORTx tutorials state the following: "CIBERSORTx performs a feature selection and therefore typically does not use all genes in the signature matrix. It is generally ok if some genes are missing from the user’s mixture file. If less than 50% of signature matrix genes overlap, CIBERSORTx will issue a warning." With that in mind, I carried out a small analysis to assess how missing genes affect the imputed cell fractions. I subset my data into quantification matrices containing 150, 300, 450, 500 and 513 randomly selected genes, fed these matrices to CIBERSORTx as mixtures, and then compared the resulting fraction tables pairwise using Pearson's correlation coefficient. The results are shown as a heatmap.
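A rough sketch of this kind of comparison is below. The post doesn't say exactly how two fraction tables were reduced to a single Pearson coefficient, so flattening each table (samples x cell types) into one vector is an assumption here. `subset_genes`, `table_correlation` and `pairwise_matrix` are hypothetical helpers, and the fraction tables would come from separate CIBERSORTx runs, one per gene subset.

```python
import numpy as np

def subset_genes(genes, n, seed=0):
    """Hypothetical helper: pick n genes at random, without replacement."""
    rng = np.random.default_rng(seed)
    return sorted(rng.choice(genes, size=n, replace=False).tolist())

def table_correlation(a, b):
    """Pearson r between two fraction tables (samples x cell types).
    Assumption: both tables share row/column order and are flattened."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return float(np.corrcoef(a, b)[0, 1])

def pairwise_matrix(tables):
    """Symmetric matrix of Pearson r for every pair of fraction tables,
    e.g. keyed by gene-subset size; suitable for plotting as a heatmap."""
    names = list(tables)
    m = np.ones((len(names), len(names)))
    for i, x in enumerate(names):
        for j, y in enumerate(names):
            if i < j:
                r = table_correlation(tables[x], tables[y])
                m[i, j] = m[j, i] = r
    return names, m
```

Flattening mixes between-cell-type and between-sample variation. Correlating each cell type separately across samples is an alternative that is less dominated by differences in mean abundance between cell types.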
As you can see, there are substantial differences in the estimated cell fractions, even between the datasets with 500 and 513 genes. For this reason, I suspect the statement in the tutorial is not accurate. I would like to hear your thoughts on this; there may be flaws in my analysis or interpretation.
Thanks!
Hello, may I ask a question? Do you know what the denominator of the output proportions is? Thank you very much.
Hello. The denominator would be the sum of the regression coefficients obtained through Support Vector Regression. More on that in the original publication.
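A minimal sketch of that normalization, assuming the behavior described for the original CIBERSORT (negative SVR coefficients are set to zero, and the remaining ones are divided by their sum). Whether CIBERSORTx does exactly this in every mode is an assumption worth checking against the publication. `coefficients_to_fractions` is a hypothetical name.

```python
import numpy as np

def coefficients_to_fractions(coefs):
    """Hypothetical helper: turn raw SVR coefficients into relative fractions.
    Assumption: negatives are zeroed first, as in the original CIBERSORT;
    the denominator is the sum of the remaining (non-negative) coefficients."""
    w = np.clip(np.asarray(coefs, dtype=float), 0.0, None)
    total = w.sum()
    if total == 0:
        raise ValueError("all coefficients are <= 0; fractions are undefined")
    return w / total
```

For example, coefficients of 3, -1 and 1 give fractions of 0.75, 0 and 0.25.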