4.2 years ago
jack.henry
I am trying to read in some expression data from the ICGC, but I am having trouble with duplicates.
First, I read in the data:
PACACASeq <- read.table("./CountMatrices/PACA_CA/exp_seq.tsv", sep = '\t', header = TRUE, stringsAsFactors = FALSE)
This gives a table in long format with counts, sample IDs and gene IDs.
I then use reshape2 to try to convert this into a count matrix like so:
PACACASeqCounts <- dcast(PACACASeq, gene_id ~ icgc_sample_id, value.var = "raw_read_count")
But this generates the following message:
Aggregation function missing: defaulting to length
This happens because some gene ID/sample ID combinations appear more than once, so dcast has to aggregate them and falls back to counting rows. I end up with a matrix of 1's instead of read counts.
Has anyone run into the same problem, and if so, how did you solve it?
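For illustration, here is a minimal sketch of how the duplicated gene/sample pairs could be found and then collapsed. It uses a made-up toy table (`toy`, with invented values) in place of the real ICGC file, and the variable names `pair_counts`, `dup_pairs`, `toy_unique`, `counts_unique` and `counts_sum` are hypothetical. Whether duplicates should be dropped or summed depends on why they are in the data, which this thread does not establish, so both options are shown as assumptions rather than as the correct treatment.

```r
library(reshape2)

# Toy stand-in for the ICGC table; the values are invented for illustration.
toy <- data.frame(
  gene_id        = c("G1", "G1", "G2", "G2", "G2"),
  icgc_sample_id = c("S1", "S2", "S1", "S2", "S2"),
  raw_read_count = c(10, 5, 3, 7, 7),
  stringsAsFactors = FALSE
)

# Count the rows behind each gene/sample pair; more than 1 means a duplicate.
pair_counts <- aggregate(raw_read_count ~ gene_id + icgc_sample_id,
                         data = toy, FUN = length)
dup_pairs <- pair_counts[pair_counts$raw_read_count > 1, ]

# Option A (assumes the duplicates are exact repeats of the same row):
# drop the repeated rows, then cast to a gene x sample matrix.
toy_unique <- unique(toy)
counts_unique <- dcast(toy_unique, gene_id ~ icgc_sample_id,
                       value.var = "raw_read_count")

# Option B (assumes duplicate rows are separate measurements to be added up):
# give dcast an explicit aggregation function instead of the default length.
counts_sum <- dcast(toy, gene_id ~ icgc_sample_id,
                    value.var = "raw_read_count", fun.aggregate = sum)
```

Inspecting `dup_pairs` first is probably the safer step, since it shows whether the repeated rows carry identical counts or different ones. On the real table, `unique()` would only remove rows that match in every column, so it may be necessary to subset to the three columns of interest beforehand.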
Thanks in advance.
Hi Jack,
We are working with the same data and have run into exactly the same problem. Did you solve it? If so, could you tell us how?
Thank you very much in advance.
Best regards,
Sergio.