Question

Compare gene coverage across groups of samples

3.4 years ago

roalva1 ▴ 90

Hello,

I have a data matrix where rows are samples (40 in one group and 40 in the other), columns are genes (20k), and each value is the coverage. Like this:

Sample	Group	Gene1	Gene2	Gene20000
S1	A	100	200	...
S2	A	200	70	...
S3	B	89	34	...

This is genomic data, not RNAseq data, so I can't use edgeR or similar tools. Also, some values may have 0 coverage; in fact, those are very interesting to me. I know I could use some CNV software, but I am specifically interested in this analysis.

So my question is whether there is a statistical test or model I could use to find the genes that differ most significantly between the groups.

Many thanks

gene coverage • 1.3k views

score 0 · Answer 1 · 2021-06-18

3.4 years ago

seidel 11k

"This is genomic data, not RNAseq data, so I can't use edgeR or similar tools."

I'm not sure why you think edgeR can't be used for genomic regions. There are many examples of quantifying genomic loci counts using these sorts of tools. (some discussion from 2012: https://support.bioconductor.org/p/49110/ and a lot has happened since then).

That's an interesting point. My understanding of differential expression (DE) analysis is that you usually make some assumption about how RNAseq read counts are distributed (e.g. negative binomial). In fact, I think tools like edgeR and DESeq make such an assumption (see "Why Does Rna-Seq Read Count Fit Poisson Distribution?"). Given that these are DNAseq read counts, I would expect the distribution to be vastly different from that of RNAseq reads. So I'm a little surprised to hear that tools for RNAseq DE can handle cases like the one the OP is asking about.
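
The distributional question can be checked empirically on a count matrix. The following is only a minimal sketch under stated assumptions, not a method proposed in this thread: the counts are simulated (the OP's data is not available), and the helper name `dispersion_by_gene` is hypothetical. Under a Poisson model the per-gene variance equals the mean, while under a negative binomial the variance is mu + phi * mu^2, so the method-of-moments estimate phi = (var - mean) / mean^2 should sit near 0 for Poisson-like counts and be clearly positive for overdispersed ones.

```python
import numpy as np

def dispersion_by_gene(counts):
    """Method-of-moments NB dispersion per gene (rows = samples, columns = genes).

    Returns (var - mean) / mean**2, with NaN where the gene mean is 0.
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=0)
    var = counts.var(axis=0, ddof=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        phi = (var - mean) / mean**2
    phi[mean == 0] = np.nan
    return phi

rng = np.random.default_rng(0)
n_samples, n_genes = 40, 500  # assumed sizes, for illustration only

# Simulated Poisson counts: variance should be close to the mean.
poisson_counts = rng.poisson(lam=100, size=(n_samples, n_genes))

# Simulated negative binomial counts with mean 100 and dispersion 0.2.
true_phi = 0.2
r = 1 / true_phi       # NB "size" parameter
p = r / (r + 100)      # success probability that gives mean 100
nb_counts = rng.negative_binomial(r, p, size=(n_samples, n_genes))

phi_poisson = np.nanmedian(dispersion_by_gene(poisson_counts))
phi_nb = np.nanmedian(dispersion_by_gene(nb_counts))
print(f"median dispersion, Poisson data: {phi_poisson:.3f}")
print(f"median dispersion, NB data:      {phi_nb:.3f}")
```

Running the same function on a real samples-by-genes coverage matrix would give a rough idea of whether the counts look overdispersed, whatever molecule they came from.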

ADD REPLY • link 3.4 years ago by sbstevenlee ▴ 480

I'll just ask - why would you think the distributions would be vastly different, simply based on the source molecule type? Especially after the link you posted. We have no idea (at least I don't) what kind of experiment produced the data in the table presented by @roalva1, and that's really the key - we don't know the nature of the process generating the reads. Maybe they are trying to offer a clue by saying they know they can use CNV software. For me, a basic assumption would be that they measured some kind of process that generates reads across the genome that can be mapped to their 20k "genes", and they'd like to compare these between groups. That sounds similar to RNA-seq or ChIP-seq, where one expects some number of things to be unchanged between groups and some number of things to be significantly different. I suppose it would help if people would actually describe what they're doing :)
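
To make the "compare these between groups" framing concrete, here is a minimal sketch assuming a samples-by-genes matrix like the one in the question. It is an illustration only, not what edgeR does internally: the function name `log2_fold_change`, the counts-per-million scaling, and the pseudocount of 1 are all assumptions chosen for the example, and the toy numbers are made up. It also assumes every sample has a non-zero total.

```python
import numpy as np

def log2_fold_change(counts, groups, pseudocount=1.0):
    """Per-gene log2 ratio of mean normalized coverage, group B over group A.

    counts: samples x genes array of raw coverage values.
    groups: sequence of "A"/"B" labels, one per sample.
    """
    counts = np.asarray(counts, dtype=float)
    groups = np.asarray(groups)
    # Scale each sample to a common total of one million so that
    # differences in sequencing depth do not look like gene effects.
    totals = counts.sum(axis=1, keepdims=True)
    cpm = counts / totals * 1e6
    mean_a = cpm[groups == "A"].mean(axis=0)
    mean_b = cpm[groups == "B"].mean(axis=0)
    # The pseudocount keeps genes with zero coverage in the comparison.
    return np.log2((mean_b + pseudocount) / (mean_a + pseudocount))

# Toy data: 4 samples x 3 genes; the third gene has zero coverage in group B.
counts = np.array([
    [100, 200, 50],
    [200, 400, 100],
    [100, 200, 0],
    [300, 600, 0],
])
groups = ["A", "A", "B", "B"]
lfc = log2_fold_change(counts, groups)
print(np.round(lfc, 2))
```

In this toy example the first two genes come out slightly positive only because the third gene lost all its coverage in group B, which shifts the per-sample totals. That composition effect is why the assumption that most features are unchanged between groups matters when normalizing.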

ADD REPLY • link 3.4 years ago by seidel 11k

Dear,

The experiment is exome DNA sequencing of tumor and control samples. I don't think edgeR is a good solution, since it disregards genes with a count of 0, and I am very interested in those.

ADD REPLY • link 3.4 years ago by roalva1 ▴ 90