Question

CIBERSORT deconvolution advice

10

4.7 years ago

joker33 ▴ 150

Dear community,

I was wondering if any of you could give me some advice with the following issues:

1. All results differ when performing "absolute" deconvolution with the legacy CIBERSORT R source code (no.sumto1 and sig.score), the legacy CIBERSORT online tool, and the online CIBERSORTx tool (see example copied below). Has the code changed between versions?
2. I have very large matrices to deconvolve (up to 780 MB). Is it possible to split up the matrices for deconvolution if I want to compare the samples afterwards?
3. How does CIBERSORT exclude non-hematopoietic genes during signature matrix generation? Is there an internal database of "hematopoietic" and "non-hematopoietic" genes? If so, which gene annotation tool, and which snapshot of it, was used to define "non-hematopoietic" genes? I am asking because I am concerned that signature matrix generation will depend on which annotation tool and version I use to annotate the reference file: tool-specific differences in gene names (alternative gene names, novel transcripts, etc.) may not match the CIBERSORT internal record.
4. For the imputation of cell fractions, how important is it to match the gene annotation of the signature and the mixture, given the differences caused by different annotation tools/versions? Do I have to assume that the results are always penalised when the mixture and signature/reference files were annotated in different labs (with a different annotation tool/version)? How would an annotation difference of more than 15% of genes between signature and mixture affect deconvolution performance?
5. Error during the imputation of cell fractions using CIBERSORTx: "WARNING: reaching max number of iterations". What is the cause and how can I solve it?

CIBERSORT absolute deconvolution results

Thank you!!

CIBERSORT CIBERSORTx deconvolution • 15k views

ADD COMMENT • link updated 2.3 years ago by Riley.Risteen • 0 • written 4.7 years ago by joker33 ▴ 150

1

Hi, I'm using CIBERSORTx to check cell types in RNA-seq data, and I'm not sure which count normalization is better for the input: CPM or TPM. I'm using LM22 as the signature matrix. The tutorial recommends normalizing the mixture file the same way as the signature matrix. However, LM22 is microarray-based and I don't know how it was normalized. I also only have access to the count matrix, so using CPM would be easier, but I'm not sure if this is the best way. What signature matrix did you use? How did you normalize your counts?

ADD REPLY • link 4.1 years ago by starick.marick ▴ 10

4

Hi, I'm pretty sure LM22 is RMA-normalised; I read that somewhere in the CIBERSORTx paper, possibly in the supplementary information. CIBERSORTx has a batch correction option that removes technical differences between a signature matrix and a mixture file derived from different platforms (as in your case: a signature matrix derived from microarray [LM22] and a mixture file from RNA-seq data), so I think either CPM or TPM is fine. However, I would personally go for TPM because the authors of CIBERSORTx mainly used TPM files for their analyses, as reported in their paper. As you can see, I am only following the protocols set by the CIBERSORTx team; if anyone has a better explanation of which normalisation method to use, I would love to hear it!
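For anyone starting from a raw count matrix, here is a minimal Python sketch of how the two normalisations differ. It is only an illustration: the counts and gene lengths are made up, and `cpm` and `tpm` are hypothetical helper names, not part of CIBERSORTx. It assumes one sample with a non-zero library size. Note that TPM needs a length for every gene, which a count matrix alone does not provide.

```python
def cpm(counts):
    """Counts per million for one sample: scale by library size only."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}


def tpm(counts, lengths_bp):
    """Transcripts per million: normalise by gene length first, then scale to 1e6."""
    # Reads per kilobase of gene length.
    rate = {gene: c / (lengths_bp[gene] / 1000) for gene, c in counts.items()}
    total_rate = sum(rate.values())
    return {gene: r / total_rate * 1e6 for gene, r in rate.items()}


# Hypothetical toy sample: raw counts and gene lengths in base pairs.
counts = {"CD3E": 500, "CD19": 300, "MS4A1": 200}
lengths_bp = {"CD3E": 1000, "CD19": 2000, "MS4A1": 4000}

sample_cpm = cpm(counts)
sample_tpm = tpm(counts, lengths_bp)
```

Both results sum to one million per sample, but TPM down-weights long genes, so the relative ranking of genes within a sample can change between the two.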

ADD REPLY • link 3.9 years ago by jill.syx ▴ 70

0

Hi! I was wondering whether there have been any follow-ups to the questions posted? I have the same questions in mind, especially the second one. I'd be really grateful if you could share any updates/findings with me. If not, have you tried contacting the CIBERSORTx team? (I did, but haven't heard anything back from them...)

ADD REPLY • link 3.9 years ago by jill.syx ▴ 70

0

Hi, I'm wondering the same thing about matrices being too large: the site does not allow me to upload them. Just to try CIBERSORTx, I reduced the size of my matrices by leaving out some samples, but then it gives an error after a long run: "cannot allocate memory". I contacted the team but have had no response so far. If somebody has already tackled this problem, it would be very helpful if you could share how.

ADD REPLY • link 3.5 years ago by berry ▴ 40

0

Hi berry, I have run into the same error before too and then realised it could mean that the quota limit has been exceeded (>1GB) - basically you have to take the size of the results file into consideration too. What I did was keep reducing the size of the matrices until there is also enough space for my results file. I have asked the team about using large matrices as input and they advised running CIBERSORTx via Docker. Hope this helps!

ADD REPLY • link 3.5 years ago by jill.syx ▴ 70

0

Hi jill.syx, thank you very much for your answer, it's very helpful. I have one last question: how did you keep reducing the size of your files? Is there a way to compress these txt-formatted matrices? Or did you also have to remove some samples? (My single-cell reference matrix is too large even though I reduced it to three 10X samples.)

ADD REPLY • link 3.5 years ago by berry ▴ 40

3

Hi berry, I presume you're talking about the first module of CIBERSORTx (creating the signature matrix). I also presume that when you say a sample you mean a certain cell type rather than a single cell.

Yes, I did remove some samples, but what I removed most were cells within a sample. For example, if I have 1000 cells within sample A (or cell type labelled A), I remove half of them by random selection so that only 500 cells are left. I also kept this consistent across the other samples (e.g. if I remove 50% of the cells in sample A, I also remove 50% in samples B, C, etc.). This shouldn't affect the results too much because, based on what I read and understood from the CIBERSORTx paper, it does much the same thing: by default it takes only 50% of the cells from a sample to build the signature matrix (also by random selection without replacement; you can change this to any percentage you want via "sampling" under the "single cell input options" tab). If you're afraid that this will increase the variability of the results, I suggest doing several repeats to see whether you still get the same deconvolution results in the end, though this is extremely time-consuming...
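The downsampling described above could be sketched roughly like this in Python. This is only an illustration under assumptions: `downsample_cells` is a hypothetical helper, the labels are made up, and it assumes you have one cell-type label per column (cell) of the single-cell reference matrix.

```python
import random
from collections import defaultdict


def downsample_cells(cell_labels, fraction=0.5, seed=0):
    """Return sorted indices of the cells to keep, sampling the same fraction
    of every cell type at random without replacement."""
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for idx, label in enumerate(cell_labels):
        by_type[label].append(idx)
    keep = []
    for indices in by_type.values():
        # Keep at least one cell so that no cell type disappears entirely.
        n_keep = max(1, round(len(indices) * fraction))
        keep.extend(rng.sample(indices, n_keep))
    return sorted(keep)


# Hypothetical labels: one entry per column (cell) of the reference matrix.
labels = ["A"] * 1000 + ["B"] * 400 + ["C"] * 10
kept = downsample_cells(labels, fraction=0.5, seed=1)
```

The returned indices can then be used to select columns from the matrix. Re-running with different `seed` values is one way to do the repeats mentioned above and check how stable the deconvolution results are.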

Another thing I did to reduce the file size was filter out genes with no expression across all cells.
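As a rough sketch, the gene filter can be done line by line so that the whole matrix never has to sit in memory. The assumptions here are mine: a tab-delimited text file with one header row, gene names in the first column and numeric values in the rest. `drop_unexpressed_genes` is a hypothetical helper name, and the toy matrix is made up.

```python
import csv
import io


def drop_unexpressed_genes(src, dst):
    """Copy a tab-delimited matrix from src to dst, skipping genes (rows)
    whose values are zero in every cell. Returns (kept, dropped) row counts."""
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    writer.writerow(next(reader))  # copy the header row unchanged
    kept = dropped = 0
    for row in reader:
        if any(float(value) != 0 for value in row[1:]):
            writer.writerow(row)
            kept += 1
        else:
            dropped += 1
    return kept, dropped


# Hypothetical toy matrix; GENE_X has no expression in any cell.
raw = "gene\tcell1\tcell2\tcell3\nCD3E\t0\t5\t2\nGENE_X\t0\t0\t0\nCD19\t1\t0\t0\n"
out = io.StringIO()
kept, dropped = drop_unexpressed_genes(io.StringIO(raw), out)
filtered_text = out.getvalue()
```

For real files, the same function should work with handles from `open(path, newline="")` in place of the `io.StringIO` objects.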

I hope all this makes sense to you! Happy to answer any further questions that you might have!

ADD REPLY • link 3.5 years ago by jill.syx ▴ 70

0

Thank you very much!

ADD REPLY • link 3.4 years ago by berry ▴ 40

score 5 · Answer 1 · 2021-06-29

Hi! I've come across some new information and wanted to post the updates:

The results from absolute deconvolution may differ slightly between CIBERSORT versions because there has been an update to how the absolute score is calculated (scaling to either the mean or the median). It is recommended to always use the latest version.
I don't know whether it is possible to split large matrices, but I would guess it is best to analyse all the samples you want to compare in one go. An alternative to the online web platform is to apply for Docker access. You can then run samples of unlimited size directly via Docker on your own computer.
CIBERSORT is apparently quite robust and does not rely on all signature genes being present in the mixture matrix. The two CIBERSORT papers include some benchmarks, and it looks like the method can cope when only a fraction of the signature genes is present. In addition, CIBERSORT still delivers reliable results on noisy data sets with up to 90% noise (e.g. cancer cell fraction).
The error during computation on the website seemed to be associated with file size limits. Running CIBERSORT via Docker seemed to solve this issue.
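To see how many signature genes are actually present in a mixture file before running anything, a quick check along these lines may help. It is a sketch under assumptions: `signature_coverage` is a hypothetical helper, the gene lists are made up, and matching is done on upper-cased gene symbols, which will not resolve aliases or different identifier types.

```python
def signature_coverage(signature_genes, mixture_genes):
    """Return (fraction of signature genes found in the mixture, missing genes)."""
    # Assumption: compare symbols case-insensitively after trimming whitespace.
    sig = {g.strip().upper() for g in signature_genes}
    mix = {g.strip().upper() for g in mixture_genes}
    if not sig:
        raise ValueError("signature gene list is empty")
    missing = sorted(sig - mix)
    return 1 - len(missing) / len(sig), missing


# Hypothetical example: the mixture uses the alias CD20 instead of MS4A1.
signature = ["CD3E", "CD19", "MS4A1", "FCGR3A"]
mixture = ["CD3E", "cd19", "CD20", "FCGR3A", "ACTB"]

coverage, missing = signature_coverage(signature, mixture)
```

Here `coverage` is 0.75 and `missing` is `["MS4A1"]`, which shows how an alias alone can make a signature gene look absent. What coverage is acceptable is not something this check can tell you; the benchmarks in the papers are the place to look for that.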

Hope this update helps