Hello,
I have a data matrix where rows are samples (40 in one group and 40 in the other), columns are genes (20k), and each value is the coverage. Like this:
Sample | Group | Gene1 | Gene2 | Gene20000 |
---|---|---|---|---|
S1 | A | 100 | 200 | ... |
S2 | A | 200 | 70 | ... |
S3 | B | 89 | 34 | ... |
This is genomic data, not RNAseq data, so I can't use edgeR or similar tools. Also, some values could have 0 coverage; in fact, those are very interesting. I know I could use some CNV software, but I am especially interested in this analysis.
So my question is whether I could use any statistical test or modeling approach to find the most significant genes.
Many thanks
That's an interesting point. My understanding of differential expression (DE) analysis is that you often make some kind of assumption about how RNAseq read counts are distributed (e.g. negative binomial). In fact, I think tools like edgeR and DESeq make such an assumption (Why Does Rna-Seq Read Count Fit Poisson Distribution?). Given that these are DNAseq read counts, I would think that the distribution would be vastly different from that of RNAseq reads. So I'm a little bit surprised to hear that tools for RNAseq DE can handle cases like the one the OP is asking about.
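As a rough way to check the distributional assumption on any count matrix, one could compare the per-gene variance to the mean. Under a Poisson model the two are about equal, while under a negative binomial model the variance is `mu + phi * mu**2`. The sketch below is only an illustration: the function name `dispersion_summary` and the example values are made up, and the method-of-moments estimate of `phi` is a crude stand-in for what dedicated tools estimate.

```python
import statistics

def dispersion_summary(counts):
    """Return (mean, variance, variance/mean, method-of-moments dispersion).

    A variance/mean ratio near 1 is what a Poisson model would predict;
    a ratio well above 1 suggests overdispersion (e.g. negative binomial).
    """
    mu = statistics.mean(counts)
    var = statistics.variance(counts)  # sample variance, needs >= 2 values
    if mu == 0:
        # All-zero gene: the ratio is undefined.
        return mu, var, float("nan"), float("nan")
    index = var / mu
    # Negative binomial: var = mu + phi * mu^2, so phi = (var - mu) / mu^2.
    phi = max((var - mu) / mu**2, 0.0)
    return mu, var, index, phi

# Made-up coverage values for one gene across a handful of samples.
example_gene = [100, 200, 89, 150, 0, 310]
mu, var, index, phi = dispersion_summary(example_gene)
print(f"mean={mu:.1f} variance={var:.1f} variance/mean={index:.1f} phi={phi:.3f}")
```

Running this per gene on the real matrix would show whether the DNAseq counts look closer to Poisson or to something overdispersed, which is the question raised here.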
I'll just ask - why would you think the distributions would be vastly different, simply based on the source molecule type? Especially after the link you posted. We have no idea (at least I don't) of what kind of experiment produced the data in the table presented by @roalva1, and that's really the key - we don't know the nature of the process generating the reads. Maybe they try to offer a clue by saying they know they can use CNV software. For me a basic assumption would be that they measured some kind of process that generates reads across the genome that can be mapped to their 20k "genes", and they'd like to compare these between groups. That sounds similar to RNA Seq or ChIP Seq, in that one expects some number of things to be unchanged between groups and some number of things to be significantly different. I suppose it would help if people would actually describe what they're doing :)
Dear,
the experiment is exome DNA sequencing from tumor and control. I think it is not a good solution to use edgeR, since it disregards genes with count 0, and I am very interested in them.
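One way to look at the zero-coverage genes directly, independent of how the non-zero counts are modeled, might be to count for each gene how many tumor and control samples have coverage 0 and run Fisher's exact test on that 2x2 table. This is only a sketch under assumptions: the helper names `zero_table` and `fisher_exact_two_sided` and the example values are made up for illustration, and the two-sided p-value sums all tables that are no more probable than the observed one.

```python
from math import comb

def zero_table(tumor, control):
    """Return (zero_tumor, nonzero_tumor, zero_control, nonzero_control)."""
    a = sum(1 for v in tumor if v == 0)
    c = sum(1 for v in control if v == 0)
    return a, len(tumor) - a, c, len(control) - c

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    if n == 0:
        return 1.0

    def prob(x):
        # Hypergeometric probability of x zeros falling in the first row.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as unlikely as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(p, 1.0)

# Made-up gene: 10 of 40 tumor samples have zero coverage, no control does.
tumor = [0] * 10 + [120] * 30
control = [95] * 40
print(fisher_exact_two_sided(*zero_table(tumor, control)))
```

With 20k genes, the resulting p-values would still need a multiple-testing correction before picking the most significant ones.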