I'm running edgeR for differencial accessibility of ATAC-seq peaks, and with my samples I noticed something with the output that I was hoping to get some explanation for. So my results from edgeR with 18 total samples looks like this:
logFC logCPM LR PValue FDR
1 4.393527353 5.905829078 572.9930277 1.25E-126 9.64E-122
2 4.380895656 5.608644923 549.0927083 1.98E-121 7.62E-117
3 5.594629126 7.869247982 544.8831178 1.63E-120 4.18E-116
4 4.397254263 5.692977435 539.7046714 2.19E-119 4.20E-115
5 4.571165539 5.632301129 530.9367562 1.77E-117 2.72E-113
I would have expected p-values/FDR values for each of my 18 samples at this particular chromosome region (1-5 on the left side, I changed the chromosome number to be less ugly for the question), yet I only get one. And so my question is: Is this an issue with my code, as one of my advisors suggests, or is it that it gives a sort of mean value across all samples, thus giving only 1 value for P-value/FDR?
Thanks in advance for any insights you might be able to provide.
I guess more details on your experimental setup, and the code you used, would help. But here it seems you are just showing the DE test for 1 sample. Depending on how you ran the test, this could represent your 18 samples (but likely incorrectly), unless the 18 samples are supposed to be replicates, then this analysis may be fine.
If you share your code, and what the setup is supposed to be, then we might be able to help more specifically.
Sure, I am happy to share the code. I'll add a few comments here and there within the code:
So first I work with DESEq2. I'm working with 18 samples like I said, with 3 tissues (eye/liver/spleen). Maybe at the "DESEq ALL" step there is something I need to modify here, but it works just fine. So now we get to edgeR (see comment #2 due to character length being too long):
(Part 2)
I'm sure there's something wrong here to explain why I don't get 18 values and instead get one. I am wondering if maybe:
Might be part of it? When I look at 'lrt' I do see my 18 samples are still inside of it, under samples (a factor with 3 levels -- eye, liver, spleen). Perhaps if you want to try running this code, you can use some snippets from my original files. Here's the count matrix:
And here's the sample-info.txt:
So maybe you could copy+paste the code and make some input files based on what I pasted, assuming you want to give it a go and assuming a count matrix file this small is allowed. Thanks for the help.
You want people to help you, make it as easy as possible. Don't just dump the whole script. Pick out just the parts relevant to the issue. Your PCA and heat maps and ggplot calls are not necessary here.
That's true, my apologies for that. I'll edit this down a bit.