(1) It is possible to create a signature matrix with CIBERSORTx from single-cell data, or alternatively from RNA-seq of sorted cell populations.
Is there an expectation, at least a general one, as to which type of data would give the more precise signature matrix?
If scRNA-seq data are used to build a signature matrix, it is straightforward to characterize its performance using synthetic tissues created from single-cell transcriptomes. To ensure an unbiased assessment, these source scRNA-seq transcriptomes used for the creation of a synthetic tissue should be held out from the creation of the signature matrix.
Does splitting the dataset into two, building the signature matrix from one half, and then validating the proportions on the other half of the dataset sound like a reasonable procedure?
(1) It is possible to create a signature matrix with CIBERSORTx from single-cell data, or alternatively from RNA-seq of sorted cell populations. Is there an expectation, at least a general one, as to which type of data would give the more precise signature matrix?
Pretty much impossible to say, since it depends on the accuracy of the reference annotations (for single-cell data) or of the sorting (for sorted populations). From experience, both seem to work well.
Does splitting the dataset into two, building the signature matrix from one half, and then validating the proportions on the other half of the dataset sound like a reasonable procedure?
That seems fine, though it's still somewhat double dipping, since both halves come from the same dataset. I'd mix the cells in your hold-out set at several different proportions to see how well those proportions are recovered. I'd also try to find a separate, unrelated dataset with the same cell types to make mixtures from, to help validate.
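For what it's worth, the hold-out idea could be sketched roughly like this. It is only a minimal illustration using the Python standard library, not anything CIBERSORTx provides: the data layout (each cell as a dict with a `type` label and a `counts` list) and the helper names `split_cells`, `random_proportions`, `make_mixture`, and `rmse` are all assumptions made up for the example. Building the signature matrix from the training half and estimating proportions for each mixture would still be done in CIBERSORTx itself and is not shown.

```python
import random
from collections import defaultdict


def split_cells(cells, seed=0):
    """Split cells into two disjoint halves, stratified by cell type."""
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for cell in cells:
        by_type[cell["type"]].append(cell)
    train, held_out = [], []
    for group in by_type.values():
        rng.shuffle(group)
        mid = len(group) // 2
        train.extend(group[:mid])      # used to build the signature matrix
        held_out.extend(group[mid:])   # used only to build synthetic mixtures
    return train, held_out


def random_proportions(cell_types, rng):
    """Draw proportions that sum to 1 (flat Dirichlet via gamma draws)."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in cell_types]
    total = sum(draws)
    return {t: d / total for t, d in zip(cell_types, draws)}


def make_mixture(held_out, proportions, n_cells, rng):
    """Sum the counts of held-out cells sampled at the target proportions.

    Returns the pseudobulk profile and the proportions actually realized
    after rounding to whole cells.
    """
    by_type = defaultdict(list)
    for cell in held_out:
        by_type[cell["type"]].append(cell)
    bulk = [0] * len(held_out[0]["counts"])
    n_sampled = {}
    for cell_type, p in proportions.items():
        k = round(p * n_cells)
        n_sampled[cell_type] = k
        # Sampling with replacement, so k may exceed the cells available.
        for cell in rng.choices(by_type[cell_type], k=k):
            for gene, count in enumerate(cell["counts"]):
                bulk[gene] += count
    total = sum(n_sampled.values())
    return bulk, {t: k / total for t, k in n_sampled.items()}


def rmse(truth, estimate):
    """Root-mean-square error between true and estimated proportions."""
    return (sum((truth[t] - estimate[t]) ** 2 for t in truth) / len(truth)) ** 0.5


# Toy data: 3 hypothetical cell types, 40 cells each, 5 genes per cell.
rng = random.Random(1)
types = ["T cell", "B cell", "Monocyte"]
cells = [
    {"type": t, "counts": [rng.randint(0, 20) for _ in range(5)]}
    for t in types
    for _ in range(40)
]

train, held_out = split_cells(cells)

# Several mixtures with different known proportions, as ground truth.
mixtures = []
for _ in range(10):
    target = random_proportions(types, rng)
    bulk, truth = make_mixture(held_out, target, n_cells=500, rng=rng)
    mixtures.append((bulk, truth))
```

Each `bulk` profile would then be deconvolved with the signature matrix built from `train`, and the estimated proportions compared against `truth`, for example with `rmse`. The same `make_mixture` step could be pointed at cells from an independent dataset to get around the double-dipping concern.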