Question

subsetting matrix elements and creating a histogram

5.6 years ago

cook.675 ▴ 240

I have some data from a CITE-seq experiment (Umi.adt) that I want to look at. The matrix is 24 x 200000, where the rows are antibody names and the columns are barcodes (cells). Many of the cells have few or no UMI counts.

I want to make a histogram with frequency on the y axis, and UMI count on the x-axis.

I can sum the UMIs for each cell with x <- colSums(Umi.adt), but how do I take this data and plot the frequency of each total UMI count across the data set?

Plotting hist(x) gives one large column of frequency 250000.

RNA-Seq • 2.0k views

ADD COMMENT • link updated 5.6 years ago by Ram 45k • written 5.6 years ago by cook.675 ▴ 240

It seems I'm having some trouble understanding this data structure.

When I run x <- colSums(Umi.adt), x is a vector of type double, but it seems to have two dimensions: one is the tags, and the other is the sums we calculated previously. When I run attributes(x) I get:

$names
[1] "CGATGGCTCGCACTCT" "CTCATGCGTCACCACG" "NTAGGTTCACAGTCGC" "CCTCAACAGCGGGTAT" "TGCATTGTAGGATATA" etc.

When I run head(x) I get:

CGATGGCTCGCACTCT CTCATGCGTCACCACG NTAGGTTCACAGTCGC CCTCAACAGCGGGTAT TGCATTGTAGGATATA TGTTACTGTATCGAAA
               1               14                1                7                2               96

When I run hist(x), I think the program is using the tags? I'm not really sure what's happening.

ADD REPLY • link 5.6 years ago by cook.675 ▴ 240

Here, if we just look at the first data point in the vector with x[1], it has:

CGATGGCTCGCACTCT
               1

How would you separate the number from the string?

ADD REPLY • link 5.6 years ago by cook.675 ▴ 240

Each data point in this case has both a name and a value. You don't need to separate the value from the name. You can create a nameless vector, for example with y=as.numeric(x), but it's not required. What does summary(x) show?
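To illustrate the named-vector point, here is a minimal sketch that uses the first three barcodes and values from the head(x) output above as a toy stand-in for the real colSums(Umi.adt) result. The variable z and the use of unname() are additions for illustration only.

```r
# Toy stand-in for x <- colSums(Umi.adt), using three values from head(x) above.
x <- c(CGATGGCTCGCACTCT = 1, CTCATGCGTCACCACG = 14, NTAGGTTCACAGTCGC = 1)

is.null(dim(x))          # TRUE: x is a plain one-dimensional vector
names(x)                 # the barcodes are stored as a "names" attribute
x[["CTCATGCGTCACCACG"]]  # look up a single value by its barcode

y <- as.numeric(x)       # same values, names dropped
z <- unname(x)           # another way to drop the names
identical(y, z)          # both give the same nameless numeric vector
```

Functions such as hist() and summary() work on the values only, so the names should not affect the plot.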

ADD REPLY • link 5.6 years ago by jomo018 ▴ 730

Thanks. Right now I only have access to the data set I ran on 6000 cells, so the matrix is 24 x 6000, but it's essentially the same thing. The histogram shows one giant bar at frequency 6000 and then some smaller ones. summary(x) shows:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   12.0    55.0    84.0   170.6   190.0  5625.0

The output is the same whether I run it as is or use y=as.numeric(x) as you suggested.

Here is the histogram:

ADD REPLY • link 5.6 years ago by cook.675 ▴ 240

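The data point 5625 is dominating the histogram bin width: the default breaks span the whole range up to the maximum, so almost all cells fall into the first bin. You can zoom in, for example with hist(x,breaks=1000,xlim=c(0,400)), or just exclude the outlier(s). The issue here is related to R and plotting, not to the data structure. You should explore the reason for the outlier, though.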
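A rough sketch of these options on simulated data follows. The Poisson counts are a made-up stand-in for colSums(Umi.adt), with one extreme value of 5625 added to mimic the maximum reported in the thread. The cutoff of 400, the bin counts, and the log-scale variant are arbitrary illustrative choices, not recommendations for the real data.

```r
# Simulated per-cell UMI totals; a made-up stand-in for colSums(Umi.adt).
set.seed(1)
x <- c(rpois(5999, lambda = 90), 5625)  # one extreme cell, as in the thread

# Default breaks cover the whole range, so nearly every cell lands in one bin.
h_default <- hist(x, plot = FALSE)
max(h_default$counts)

# Option 1: request many narrow bins, then zoom the x-axis.
hist(x, breaks = 1000, xlim = c(0, 400),
     xlab = "Total UMIs per cell", main = "")

# Option 2: drop the extreme values before plotting (cutoff is arbitrary).
hist(x[x <= 400], breaks = 50,
     xlab = "Total UMIs per cell", main = "")

# Option 3: plot on a log scale so the long tail stays visible.
hist(log10(x + 1), breaks = 50,
     xlab = "log10(total UMIs + 1)", main = "")
```

Note that a single number passed to breaks is only a suggestion to hist(), so the actual number of bins may differ from the value requested.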

ADD REPLY • link 5.6 years ago by jomo018 ▴ 730

Ahhh yes, I have it now!

Thank you so much.

ADD REPLY • link 5.6 years ago by cook.675 ▴ 240