Sample-wise raw count matrix from 10x multiplexed data
7 days ago
AB ▴ 360

Hi all,

I want to run SoupX on my multiplexed single-cell RNA-seq data. 10X gives per-sample counts for the filtered data, but the raw data for all samples are combined into one big file in the raw_feature_bc_matrix folder. I figured I should be able to separate the raw counts per sample using the hashing info or the sample_molecule_info.h5 file. However, I can't find the hashing info when I read in the raw_feature_bc_matrix.h5 file, and the sample_molecule_info.h5 file in the per_sample_outs directory still has the barcodes for all samples combined. Is there a good way to get per-sample raw counts from the 10x output?

cellranger soupx singlecellRNASeq 10X
7 days ago
ATpoint 86k

It feels like you're mixing up quite a few things here, so let's clarify:

First of all, CellRanger (both count and multi) produces the count matrix per sample, that is, per GEM well. You get one output folder per sample containing the raw and the filtered feature-barcode matrix. The only difference between the two is that "raw" contains all barcodes (including noisy ones), whereas "filtered" contains only the "reliable" ones, selected by CellRanger's knee detection method or by the forced number of cells (if you tell CellRanger how many cells / barcodes to call); see https://kb.10xgenomics.com/hc/en-us/articles/360001892491-What-is-the-difference-between-the-filtered-and-raw-gene-barcode-matrix.

Both of these are "raw" data in the sense that it's the raw UMI counts and no actual analysis has been done on them, e.g. QC via mitochondrial fractions, expressed genes etc.

If you have hashtags (HTOs) then you have to demultiplex your data first. By default, CellRanger puts HTO counts at the end of the count matrix, so use tail() to see them. Tools like DropletUtils or CiteFuse (both Bioconductor) can do the demultiplexing for you, which essentially means that they decide whether a single HTO has enough counts and sufficiently more than the 2nd best HTO to assign the cell to a certain HTO with confidence.
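To make the "enough counts and sufficiently more than the 2nd best HTO" idea concrete, here is a toy sketch in plain Python. It is not the algorithm used by DropletUtils or CiteFuse; the function name `assign_hto`, the thresholds `min_count` and `min_ratio`, and the example barcodes are all made up for illustration, and it assumes every barcode has at least one HTO count entry.

```python
def assign_hto(hto_counts, min_count=10, min_ratio=3.0):
    """Assign one barcode to a hashtag, or return None if the call is not confident.

    hto_counts: dict mapping HTO name -> UMI count for a single barcode.
    min_count, min_ratio: hypothetical thresholds, chosen for illustration only.
    """
    # Rank HTOs from highest to lowest count.
    ranked = sorted(hto_counts.items(), key=lambda kv: kv[1], reverse=True)
    best_name, best = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else 0

    # The best HTO must have enough counts on its own ...
    if best < min_count:
        return None
    # ... and be sufficiently higher than the runner-up.
    # The +1 pseudocount avoids division by zero.
    if (best + 1) / (second + 1) < min_ratio:
        return None
    return best_name


# Made-up example barcodes with made-up HTO counts.
cells = {
    "AAAC-1": {"HTO1": 250, "HTO2": 4, "HTO3": 2},    # one dominant HTO
    "AAAG-1": {"HTO1": 120, "HTO2": 110, "HTO3": 3},  # two strong HTOs, ambiguous
    "AAAT-1": {"HTO1": 2, "HTO2": 1, "HTO3": 0},      # too few counts overall
}
calls = {barcode: assign_hto(counts) for barcode, counts in cells.items()}
```

Here only the first barcode gets a confident assignment; the other two are left unassigned. The real tools model ambient HTO contamination and doublets far more carefully, so treat this purely as an illustration of the decision rule.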

Is there a good way to get per-sample raw counts from the 10x output?

The filtered barcodes folder in each of the CellRanger output directories per sample.
