subsetting matrix elements and creating a histogram
5.1 years ago
cook.675 ▴ 230

I have some data from a CITE-seq experiment (Umi.adt) that I want to look at. The matrix is 24 x 200000, where the rows are antibody names and the columns are barcodes (cells). Many of the cells have few or no UMI counts.

I want to make a histogram with frequency on the y axis, and UMI count on the x-axis.

I can sum the UMIs for each cell with x <- colSums(Umi.adt), but how do I take this data and plot the frequency of each total UMI count across the data set?

Plotting hist(x) gives one large column of frequency 250000.

RNA-Seq • 1.8k views

It seems I'm having some trouble understanding this data structure.

When I make x <- colSums(Umi.adt), x is a vector of type double, but it seems to have 2 dimensions: one is the tags, and the other is the sums that we calculated previously. When I run attributes(x) I get:

$names
[1] "CGATGGCTCGCACTCT" "CTCATGCGTCACCACG" "NTAGGTTCACAGTCGC" "CCTCAACAGCGGGTAT" "TGCATTGTAGGATATA" ...

When I run head(x) I get:

CGATGGCTCGCACTCT CTCATGCGTCACCACG NTAGGTTCACAGTCGC CCTCAACAGCGGGTAT TGCATTGTAGGATATA TGTTACTGTATCGAAA
               1               14                1                7                2               96

When I run hist(x), I think the program is using the tags? I'm not really sure what's happening.


If we look at just the first data point in the vector, it has:

x[1]
CGATGGCTCGCACTCT
               1

How would you separate out the number from the string?


Each data point here has a name and a value. You don't need to separate the value from the name. You can create an unnamed vector, for example with y=as.numeric(x), but it's not required. What does summary(x) show?
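To illustrate the point about names, here is a small sketch on a made-up matrix. The antibody names, barcodes, and counts are invented for illustration and are not the poster's data. It suggests that colSums() returns a one-dimensional named numeric vector, and that dropping the names leaves the values unchanged.

```r
# Toy stand-in for Umi.adt: 3 antibodies (rows) x 4 barcodes (columns).
# All names and counts here are invented for illustration.
toy <- matrix(
  c(0, 1, 0,
    5, 4, 5,
    1, 0, 0,
    2, 3, 2),
  nrow = 3,
  dimnames = list(c("ab1", "ab2", "ab3"),
                  c("AAAC", "AAAG", "AAAT", "AACA"))
)

x <- colSums(toy)      # named numeric vector, one total per barcode
is.null(dim(x))        # TRUE: a plain vector has no dim attribute
names(x)               # the barcodes are labels, not a second dimension

y <- as.numeric(x)     # same values with the names dropped
z <- unname(x)         # another way to drop the names
all(x == y)            # TRUE: the values are unchanged

summary(x)             # summary() and hist() work on the values only
```

Since the names are only labels, hist(x) and hist(y) should draw the same plot.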


Thanks. I only have access to the data set I ran on 6000 cells right now, so the matrix is 24 x 6000, but it's essentially the same thing. The histogram shows one giant bar at frequency 6000 and then some smaller ones. summary(x) shows:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   12.0    55.0    84.0   170.6   190.0  5625.0

The output is the same whether I run it as is or use y=as.numeric(x) as you suggested.

Here is the histogram.


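The data point at 5625 is dominating the histogram: it stretches the x-axis so far that, with the default bin width, nearly all cells land in the first bin. You can zoom in, for example with hist(x,breaks=1000,xlim=c(0,400)), or just exclude the outlier(s). The issue here is with R and plotting, not with your data structure. You should explore the reason for the outlier, though.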
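A sketch of the zooming idea on simulated counts. The log-normal draw, the single planted value of 5625, and the 99th-percentile cutoff are all assumptions chosen only to mimic a long right tail. They are not the poster's data or a recommended threshold.

```r
set.seed(1)
# Simulated per-cell UMI totals: 6000 cells with a long right tail,
# plus one planted extreme value. These numbers are made up.
x <- c(round(rlnorm(5999, meanlog = 4.5, sdlog = 0.8)), 5625)

# Default breaks span the full range, so most cells share the first bin.
h_default <- hist(x, main = "Default breaks")

# Finer breaks plus a restricted x-axis show the bulk of the distribution.
h_zoom <- hist(x, breaks = 1000, xlim = c(0, 400), main = "Zoomed in")

# Alternative: drop the extreme cells before plotting
# (the 99th-percentile cutoff is an arbitrary choice for illustration).
cutoff <- quantile(x, 0.99)
h_trim <- hist(x[x <= cutoff], breaks = 50, main = "Below 99th percentile")

# Inspect the cells above the cutoff rather than silently discarding them.
head(sort(x[x > cutoff], decreasing = TRUE))
```

Note that xlim only changes the visible range; the bins are still computed from all of x, which is why breaks is raised at the same time.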


Ahhh yes I have it now!

Thank you so much
