Question

Observation about CIBERSORTx variations in results due to missing genes


3.6 years ago

argonvibio • 0

Hello everyone,

I am trying to use your tool to impute cell fractions from RNA-seq data. CIBERSORTx does the job, but I have run into a problem: 34 of the genes in the signature matrix are absent from the annotation I used to quantify gene abundance. To work around this, I tried to find synonyms for these genes, to no avail.
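In case it helps others reproduce the check, this is roughly how the missing genes can be listed. It is only a minimal sketch: the gene names are made-up placeholders, `missing_signature_genes` is a hypothetical helper of my own (not part of CIBERSORTx), and matching symbols case-insensitively is my assumption.

```python
def missing_signature_genes(signature_genes, mixture_genes):
    """Return the signature genes that are absent from the mixture.

    Symbols are compared case-insensitively (an assumption of this sketch).
    """
    present = {gene.upper() for gene in mixture_genes}
    return sorted(g for g in signature_genes if g.upper() not in present)


# Toy placeholder symbols, not real signature matrix content.
signature = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
mixture = ["GENE_A", "gene_b", "GENE_X"]

missing = missing_signature_genes(signature, mixture)
overlap = 1 - len(missing) / len(signature)  # fraction of signature genes found

print("missing:", missing)
print(f"overlap: {overlap:.0%}")
```

With real data, `signature` would be the gene column of the signature matrix and `mixture` the gene column of the mixture file.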

I noticed that the CIBERSORTx tutorials mention the following: "CIBERSORTx performs a feature selection and therefore typically does not use all genes in the signature matrix. It is generally ok if some genes are missing from the user’s mixture file. If less than 50% of signature matrix genes overlap, CIBERSORTx will issue a warning." Knowing that, I carried out a small analysis to identify the effect that missing genes have on the results of the cell fraction imputation. I subset my data into quantification matrices containing 150, 300, 450, 500 and 513 randomly selected genes. I fed these matrices as mixtures to CIBERSORTx and then compared the resulting fraction tables pairwise using Pearson's correlation coefficient. The results are shown as a heatmap.

[Figure: heatmap of pairwise Pearson correlation coefficients between the estimated cell fraction tables]

As you can see, there are important differences in the estimated cell fractions, even between the datasets with 500 and 513 genes. For this reason, I feel that the statement on the tutorial page is not accurate. I would like to know what your thoughts are on this; there may be flaws in my analysis or interpretation.
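For anyone who wants to check the comparison step, it can be sketched roughly as below. This is not my exact script: the fraction tables here are random toy data rather than CIBERSORTx output, `pairwise_pearson` is a hypothetical helper, and flattening each samples-by-cell-types table into one vector before correlating is an assumption about how the tables are compared.

```python
import numpy as np


def pairwise_pearson(tables):
    """Pearson correlation between every pair of fraction tables.

    tables: dict mapping a run name to a 2D array (samples x cell types).
    All tables must have the same shape and the same row/column order.
    """
    names = list(tables)
    # One flattened row per run; np.corrcoef treats each row as a variable.
    flat = np.vstack([np.asarray(tables[n], dtype=float).ravel() for n in names])
    return names, np.corrcoef(flat)


# Toy data only: 5 samples x 4 cell types, each row summing to 1.
rng = np.random.default_rng(0)
run_a = rng.dirichlet(np.ones(4), size=5)
run_b = run_a + rng.normal(0.0, 0.05, size=run_a.shape)  # perturbed copy

names, corr = pairwise_pearson({"run_a": run_a, "run_b": run_b})
print(names)
print(np.round(corr, 3))
```

The resulting matrix is what would be drawn as the heatmap, with one row and column per gene subset.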

Thanks!

CIBERSORTx deconvolution RNA-seq type Cell • 2.3k views

ADD COMMENT • link 3.1 years ago by argonvibio • 0


Hello, may I ask a question? Do you know what the denominator of the output proportion is? Thank you very much.

ADD REPLY • link 3.2 years ago by Wakala ▴ 20


Hello. The denominator is the sum of the regression coefficients obtained through support vector regression (SVR), after the negative coefficients have been set to zero. The original publication describes it as follows:

Negative SVR regression coefficients are subsequently set to zero (as done for LLSR), and the remaining regression coefficients are normalized to sum to 1, yielding a final vector of estimated cell type fractions.
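As a small illustration of that normalization step, here is a sketch of the arithmetic only. The coefficients are made-up numbers, `coefficients_to_fractions` is a hypothetical name, and this is not the CIBERSORTx source code.

```python
import numpy as np


def coefficients_to_fractions(coefs):
    """Set negative coefficients to zero, then scale the rest to sum to 1."""
    coefs = np.asarray(coefs, dtype=float)
    clipped = np.where(coefs < 0, 0.0, coefs)
    total = clipped.sum()  # this sum is the denominator of the output fractions
    if total == 0:
        raise ValueError("no positive coefficients to normalize")
    return clipped / total


# Made-up regression coefficients for four cell types.
fractions = coefficients_to_fractions([1.2, -0.2, 0.6, 0.2])
print(fractions)
```

Here the negative coefficient becomes 0 and the remaining ones are divided by their sum, 2.0.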

ADD REPLY • link 3.1 years ago by argonvibio • 0