Compare gene coverage across groups of samples
3.4 years ago
roalva1 ▴ 90

Hello,

I have a data matrix where the rows are samples (40 in one group and 40 in the other), the columns are genes (20k), and the values are coverage. Like this:

Sample Group Gene1 Gene2 ... Gene20000
S1 A 100 200 ...
S2 A 200 70 ...
S3 B 89 34 ...

This is genomic data, not RNA-seq data, so I can't use edgeR or similar tools. Some values may also have 0 coverage; in fact, those are very interesting. I know I could use some CNV software, but I am specifically interested in this analysis.

So my question is whether I could use a statistical test or model to find the most significant genes.

Many thanks

gene coverage • 1.3k views
3.4 years ago
seidel 11k

This is genomic data, not RNAseq data, so I can't use edgeR or similars.

I'm not sure why you think edgeR can't be used for genomic regions. There are many examples of quantifying genomic loci counts using these sorts of tools. (some discussion from 2012: https://support.bioconductor.org/p/49110/ and a lot has happened since then).


That's an interesting point. My understanding of differential expression (DE) analysis is that you often make some kind of assumption about how RNA-seq read counts are distributed (e.g. negative binomial). In fact, I think tools like edgeR and DESeq make such an assumption (Why Does Rna-Seq Read Count Fit Poisson Distribution?). Given that these are DNA-seq read counts, I would think that the distribution would be vastly different from that of RNA-seq reads. So I'm a little bit surprised to hear that tools for RNA-seq DE can handle cases like the one the OP is asking about.
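One rough way to look at the distributional question is to compare each gene's variance with its mean across samples. For Poisson-like counts the ratio is close to 1, and a ratio well above 1 suggests overdispersion of the kind a negative binomial model is meant to absorb. The sketch below is only illustrative: the `coverage` values and the helper name `dispersion_index` are made up, and the raw ratio also picks up differences in sequencing depth between samples, so it assumes the values were normalized first.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Variance-to-mean ratio of one gene across samples.

    Roughly 1 for Poisson-like counts, above 1 for overdispersed counts.
    Returns NaN when the mean is 0 (e.g. zero coverage in every sample).
    """
    m = mean(counts)
    if m == 0:
        return float("nan")
    return variance(counts) / m  # sample variance (n - 1 denominator)


# Hypothetical coverage values, one list per gene, one entry per sample.
# Assumed to be already normalized for sequencing depth.
coverage = {
    "Gene1": [100, 200, 89, 150],
    "Gene2": [200, 70, 34, 120],
}

ratios = {gene: dispersion_index(vals) for gene, vals in coverage.items()}
for gene, ratio in ratios.items():
    print(gene, round(ratio, 2))
```

If most genes show ratios far above 1, a plain Poisson model is probably too narrow, whatever the source molecule was.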


I'll just ask: why would you think the distributions would be vastly different, simply based on the source molecule type? Especially after the link you posted. We have no idea (at least I don't) what kind of experiment produced the data in the table presented by @roalva1, and that's really the key: we don't know the nature of the process generating the reads. Maybe they are offering a clue by saying they know they could use CNV software. For me, a basic assumption would be that they measured some kind of process that generates reads across the genome that can be mapped to their 20k "genes", and they'd like to compare these between groups. That sounds similar to RNA-seq or ChIP-seq, where one expects some number of things to be unchanged between groups and some number of things to be significantly different. I suppose it would help if people would actually describe what they're doing :)


Dear,

The experiment is exome DNA sequencing from tumor and control samples. I don't think edgeR is a good solution, since it disregards genes with a count of 0, and I am very interested in them.
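For what it's worth, a distribution-free alternative that keeps zero-coverage values in play is a per-gene rank test between the two groups, followed by a multiple-testing correction over all 20k genes. Nobody in this thread proposed this; it is a minimal sketch using a Mann-Whitney U test (normal approximation with tie correction) and Benjamini-Hochberg adjustment, written with the standard library only. The gene names and values are hypothetical, and the values are assumed to be already normalized for sequencing depth.

```python
import math


def rank_with_ties(values):
    """Average ranks (1-based) plus the tie term sum(t^3 - t) over tie groups."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term = 0
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average rank of the tie group
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    return ranks, tie_term


def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney U p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    ranks, tie_term = rank_with_ties(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:  # every value identical, e.g. zero coverage in all samples
        return 1.0
    z = max(abs(u1 - mu) - 0.5, 0.0) / math.sqrt(var)  # continuity correction
    return math.erfc(z / math.sqrt(2))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical normalized coverage: gene -> (group A values, group B values).
data = {
    "GeneX": ([100, 200, 150, 180, 120, 160], [0, 0, 5, 0, 3, 0]),
    "GeneY": ([50, 50, 50, 50, 50, 50], [50, 50, 50, 50, 50, 50]),
}
genes = list(data)
pvals = [mann_whitney_p(*data[g]) for g in genes]
qvals = benjamini_hochberg(pvals)
for g, p, q in zip(genes, pvals, qvals):
    print(g, p, q)
```

The normal approximation is rough for the six-sample toy groups here but should be reasonable at 40 versus 40. A rank test ignores the count nature of the data and has less power than a well-specified count model, so treat it as a baseline rather than a replacement.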
