5.3 years ago
cook.675
I have some data from a CITE-seq experiment (Umi.adt) that I want to look at. The matrix is 24 x 200000, where the rows are antibody names and the columns are barcodes (cells). Many of the cells have few or no UMI counts.
I want to make a histogram with frequency on the y axis, and UMI count on the x-axis.
I can sum the UMIs for each cell with
x <- colSums(Umi.adt)
but how do I take this data and plot the frequency of each total UMI count across the data set?
Plotting hist(x) gives one large bar with a frequency of 250000.
It seems I'm having some trouble understanding this data structure:
When I run
x <- colSums(Umi.adt)
x is a vector of type double, but it seems to have two dimensions: one is the tags, and the other is the sums calculated above. When I run
attributes(x)
I get
$names
[1] "CGATGGCTCGCACTCT" "CTCATGCGTCACCACG" "NTAGGTTCACAGTCGC" "CCTCAACAGCGGGTAT" "TGCATTGTAGGATATA" etc.
When I run
head(x)
I get
CGATGGCTCGCACTCT CTCATGCGTCACCACG NTAGGTTCACAGTCGC CCTCAACAGCGGGTAT TGCATTGTAGGATATA TGTTACTGTATCGAAA
               1               14                1                7                2               96
When I run hist(x), I think the program is using the tags? I'm not really sure what's happening.
Here, if we just look at the first data point in the vector, it has
x[1]
CGATGGCTCGCACTCT
               1
How would you separate out the number from the string?
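For reference, the structure being asked about can be reproduced on a toy matrix. This is only a sketch: the matrix `toy`, its antibody names, and its barcodes are made up for illustration and stand in for Umi.adt. It shows that the result of colSums() is a one-dimensional numeric vector whose barcodes are labels attached to the values, not a second dimension.

```r
# Toy stand-in for Umi.adt: 3 antibodies (rows) x 4 barcodes (columns).
# The antibody and barcode names are hypothetical.
toy <- matrix(
  c(0, 1, 0,
    5, 2, 7,
    0, 0, 1,
    30, 40, 26),
  nrow = 3,
  dimnames = list(c("ab1", "ab2", "ab3"),
                  c("AAAC", "AAAG", "AAAT", "AACA"))
)

x <- colSums(toy)   # named numeric vector: one total per barcode

is.null(dim(x))     # TRUE: x has no dim attribute, so it is one-dimensional
names(x)            # the barcodes (labels only)
unname(x)           # the totals with the labels dropped
x[["AAAG"]]         # look up a single total by its barcode
```

Functions such as hist() and summary() work on the values, so dropping the names should not change what they report.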
The data point in this case is characterized by a name and by a value. You don't need to separate value from name. You can create a name-less vector for example with
y=as.numeric(x)
but it's not required. What does
summary(x)
show?
Thanks; I only have access to the data set I ran on 6000 cells right now, so the matrix is 24 x 6000, but it's essentially the same thing. The histogram shows one giant bar at frequency 6000 and then some smaller ones. summary(x) shows:
and is the same whether I run as is or use
y=as.numeric(x)
as you suggested. Here is the histogram:
Data point 5625 is dominating the histogram cell size. You can zoom in for example with
hist(x,breaks=1000,xlim=c(0,400))
or just exclude the outlier(s). The issue here is related to R and plotting. You should explore the reason for the outlier, though.
Ahhh yes, I have it now!
Thank you so much
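To summarize the fix, here is a minimal sketch on simulated data. The vector `x` below is not the real data set: it is 5999 Poisson draws plus one value of 5625 standing in for the outlier. The 99.9th percentile cutoff is an arbitrary choice, and the log-scale plot is a common alternative that was not suggested in the thread.

```r
# Simulated per-cell UMI totals (NOT the real data): mostly small counts
# plus one large value, 5625, standing in for the outlier.
set.seed(1)
x <- c(rpois(5999, lambda = 20), 5625)

# Option 1: keep all data, ask for many narrow bins, and zoom the x-axis.
hist(x, breaks = 1000, xlim = c(0, 400),
     xlab = "Total UMI count per cell", main = "Zoomed in")

# Option 2: drop values above a cutoff before plotting.
# The 99.9th percentile is an arbitrary choice for this sketch.
cutoff <- quantile(x, 0.999)
hist(x[x <= cutoff], breaks = 50,
     xlab = "Total UMI count per cell", main = "Outlier excluded")

# Option 3 (assumption, not from the thread): plot on a log scale.
# The +1 keeps cells with zero UMIs from becoming -Inf.
hist(log10(x + 1), breaks = 50,
     xlab = "log10(total UMI count + 1)", main = "Log scale")

# Inspect the extreme cells rather than silently discarding them.
x[x > cutoff]
```

Whichever option is used, the cells above the cutoff are worth checking, since an unusually high total can point to a technical problem rather than biology.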