Question

Examination of data matrix from CITE-seq-counts

5.1 years ago

cook.675 ▴ 230

I'm trying to understand a little better how the data is organized in these matrices so that I can trim or manipulate them for our needs. But I can't really do that if I don't understand what I'm cutting out yet. I'm still pretty new to this, so bear with me here:

We ran some ADT data through CITE-seq-Count and got matrix, features, and barcodes files. I loaded these into R as an object called Data.adt. Here are its specs. I don't know how to copy this information out of R, so I've included a screenshot:

Image showing matrix specs

You can see that it's 25 x 737280. The 25 row names are antibody names and sequences, but I don't understand the column tags exactly. How do the column tags relate to the number of cells, and how would you organize this information by cell? Specifically, how would you extract and organize the data by cell from this matrix?
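In this layout each column is one cell barcode and each row is one antibody, so a single cell's ADT profile is one column, and "organizing by cell" amounts to transposing into a cells x antibodies table. A minimal Python sketch of that idea on a made-up 3 x 3 matrix (the antibody names, barcodes, and counts are all hypothetical); in R the analogous steps would be indexing one column, e.g. `Data.adt[, "barcode"]`, or transposing with `t(Data.adt)`.

```python
# Toy example: rows are antibodies (features), columns are cell barcodes,
# mirroring the layout of the matrix loaded into R.
# All names and counts below are made up for illustration.
antibodies = ["CD3", "CD4", "CD8"]
barcodes = ["AAACCTGA", "AAACCTGC", "AAACCTGG"]
counts = [
    [5, 0, 12],  # CD3
    [3, 0, 0],   # CD4
    [0, 0, 7],   # CD8
]


def cell_profile(barcode):
    """Return {antibody: count} for one cell, i.e. one column of the matrix."""
    j = barcodes.index(barcode)
    return {ab: counts[i][j] for i, ab in enumerate(antibodies)}


def cells_by_antibodies():
    """Transpose to a cells x antibodies table: one entry per barcode."""
    return {bc: cell_profile(bc) for bc in barcodes}


table = cells_by_antibodies()
print(table["AAACCTGG"])  # {'CD3': 12, 'CD4': 0, 'CD8': 7}
```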

Also, what do the slots "i", "p", and "x" mean, specifically the 1,158,178 data points in "x"? What does the data in each of these slots represent?
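Going by the slot names, Data.adt appears to be a sparse `dgCMatrix` from the R Matrix package, which stores only the non-zero counts in compressed sparse column (CSC) form. `x` holds the non-zero values, so the 1,158,178 entries are the non-zero counts across the whole matrix. `i` holds the 0-based row (antibody) index of each value, and `p` has one entry per column plus one, with `p[j+1] - p[j]` giving the number of non-zeros in column j. A minimal Python sketch decoding those three slots for a toy matrix (values made up):

```python
# Minimal decoder for the compressed sparse column (CSC) layout used by
# R's dgCMatrix: x = non-zero values, i = 0-based row index of each value,
# p = column pointers, where x[p[j]:p[j + 1]] are column j's values.


def decode_csc(i, p, x, nrow):
    """Expand CSC slots into a dense list of columns (one list per cell)."""
    ncol = len(p) - 1
    columns = []
    for j in range(ncol):
        col = [0] * nrow
        for k in range(p[j], p[j + 1]):
            col[i[k]] = x[k]  # value x[k] sits at row i[k] of column j
        columns.append(col)
    return columns


# Toy matrix: 3 antibodies (rows) x 3 barcodes (columns).
#   column 0: rows 0,1 -> 5,3 ; column 1: empty ; column 2: rows 0,2 -> 12,7
x = [5, 3, 12, 7]
i = [0, 1, 0, 2]
p = [0, 2, 2, 4]

cols = decode_csc(i, p, x, nrow=3)
print(cols)  # [[5, 3, 0], [0, 0, 0], [12, 0, 7]]
print("non-zeros per column:", [p[j + 1] - p[j] for j in range(len(p) - 1)])
# non-zeros per column: [2, 0, 2]
```

Note that an empty column (a barcode with no counts) costs nothing in `i` or `x`; it only shows up as two equal consecutive entries in `p`.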

What I'm trying to do is find a way to shrink this matrix so I can run some commands that are taking up too much space. For instance, if you had a matrix with 15,000 cells and took a random sample of 3,000 cells from it, you would expect the sample to be representative of the entire population. There are tests and plots you could make to confirm this before moving on, but this is in essence what I want to do. I'm not sure where to begin with this data set because I'm still having trouble figuring out how the data is organized.
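Random subsampling of cells means sampling columns without replacement. A minimal Python sketch using hypothetical cell names, with a fixed seed so the same subset comes back on every run:

```python
import random


def subsample_columns(barcodes, n, seed=0):
    """Pick n distinct barcodes uniformly at random, without replacement.

    A fixed seed makes the subsample reproducible across runs.
    """
    if n > len(barcodes):
        raise ValueError("cannot sample more cells than the matrix contains")
    rng = random.Random(seed)
    return rng.sample(barcodes, n)


# Hypothetical 15,000-cell matrix reduced to a 3,000-cell subsample.
all_cells = [f"cell_{k}" for k in range(15000)]
subset = subsample_columns(all_cells, 3000, seed=42)
print(len(subset), len(set(subset)))  # 3000 3000
```

In R the same idea would look something like `set.seed(42); Data.adt[, sample(ncol(Data.adt), 3000)]`. On this particular matrix it only makes sense after restricting to real cells; otherwise the sample is drawn from every barcode column, not from the cells that were sequenced.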

RNA-Seq seurat • 2.2k views

Updated 5.0 years ago by Biostar 20 • written 5.1 years ago by cook.675 ▴ 230

Are the columns individual cell identifiers for this data set, meaning 737,280 single cells were sequenced? That seems like a lot, right?

5.1 years ago by cook.675 ▴ 230

I think it depends on how you generate the ADT matrix. I use CITE-seq-Count for ADT counting, and its output can easily be converted to a cells X antibodies matrix.

5.1 years ago by shoujun.gu ▴ 350

Yes, this is interesting. We used CITE-seq-Count and loaded the umi_count folder into R for this data set, and this is what we got. I just checked our records, and we only sent 6,000 cells for sequencing. Why is this matrix so big? We had other data with many, many more cells that we put through cellranger count and clustered on the ADT data alone, and that object wasn't nearly as big as this one.

Edit: I just loaded the read_counts into R and looked at the matrix, and it's exactly the same as umi_counts. Would this indicate there was an error in the input parameters for CITE-seq-Count, so that I only got the reads and not the UMI counts, and that's why it's so big?

5.1 years ago by cook.675 ▴ 230
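A quick way to check whether most of the 737,280 columns are barcodes with no counts, rather than real cells, is to count the columns that hold any non-zero value. With the sparse layout that only needs the `p` slot; in R, `diff(Data.adt@p)` gives the number of non-zeros per column. A minimal Python sketch on a toy pointer vector (values made up):

```python
def nonempty_columns(p):
    """Indices of columns (barcodes) with at least one non-zero count.

    In CSC storage, column j is empty exactly when p[j] == p[j + 1].
    """
    return [j for j in range(len(p) - 1) if p[j + 1] > p[j]]


# Toy pointer vector for 6 barcodes, only 2 of which have any counts.
p = [0, 0, 3, 3, 3, 5, 5]
hits = nonempty_columns(p)
print(len(p) - 1, "barcodes,", len(hits), "non-empty:", hits)
# 6 barcodes, 2 non-empty: [1, 4]
```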

Just take a small slice of the matrix and check what the row names and column names are.

5.1 years ago by shoujun.gu ▴ 350

If I truncate it or create a subset, I get, say, 6,000 columns (or whatever number I choose) instead of the 737,280.

The row names stay the same; they are the antibodies. The column names are still the tags, as they were originally.

5.1 years ago by cook.675 ▴ 230

I think we figured out the problem. In the CITE-seq-Count input parameters, we had set the expected number of cells to 0 and instead used a whitelist, which (as you might guess...) contained 737,280 cell barcodes.

When we ran it without a whitelist and set the expected number of cells to the number we loaded on the chip, we got good results.

5.1 years ago by cook.675 ▴ 230
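Rerunning with the expected cell number fixed the problem here. If a whitelist-sized matrix that has already been generated ever needs trimming after the fact, one rough analogue is to keep only the N barcodes with the highest total ADT counts. This is a swapped-in heuristic sketched in Python with hypothetical barcodes and values. It is not a description of how CITE-seq-Count selects cells internally. N = 6000 would match the number of cells sent for sequencing in this thread.

```python
def top_n_barcodes(p, x, barcodes, n):
    """Return the n barcodes with the largest total counts, highest first.

    Totals come straight from the CSC slots: column j's values are
    x[p[j]:p[j + 1]]. Ties keep the original column order (sorted is stable).
    """
    totals = [sum(x[p[j]:p[j + 1]]) for j in range(len(p) - 1)]
    order = sorted(range(len(totals)), key=lambda j: totals[j], reverse=True)
    return [barcodes[j] for j in order[:n]]


# Hypothetical 4-barcode matrix; column totals are A=8, B=0, C=12, D=2.
barcodes = ["A", "B", "C", "D"]
p = [0, 2, 2, 3, 5]
x = [5, 3, 12, 1, 1]
print(top_n_barcodes(p, x, barcodes, 2))  # ['C', 'A']
```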