I am trying to find an example where an RNA-sequencing dataset was analyzed in one manner that lead to a poor false-positive rate estimation, and then analyzed in another manner that lead to a good false-positive rate estimation. Ideally, both analysis methods led to DEGs: Only in the first case, these DEGs are not as trust-worthy as in the second case because of the lower false-positive rate estimation.
I am not concerned about what package was used for the analysis or what type of RNA-sequencing dataset.
In case this is a strange question to ask... I am hoping to examine visually what DEGs "look like" when they come from an analysis that has a poor false-positive rate estimation versus a good false-positive rate estimation.
If anyone is aware of any such example code that could produce such items, I would be very grateful to be pointed in that direction! Thank you.
Thank you very much for these links and words of advice!
As I said in my original post: "I am hoping to examine visually what DEGs 'look like' when they come from an analysis that has a poor false-positive rate estimation versus a good false-positive rate estimation.". I should have added a sentence afterward to be more clear of what I am hoping to obtain in the end:
1) The read counts for the DEGs that have evidence for high false-positive rate. 2) The read counts for the DEGs that have evidence for low false-positive rate.
I enjoyed reading those two links you sent! The paper seemed promising because there are indeed big differences between the false-positives between, say, lr and limma.voom (as seen in their Table 1). However, I have been trying to obtain the DEGs for any set of parameters (say n=3 and p=10^-4). I used their code to follow syntax similarly, but am getting strange results. For instance, when I run limma.voom using their parameters, I get all of them being DEGs!
The RNAontheBENCH package was really neat to read! I started by running their deNanoString() function on the data exactly from their vignette. I did so with all option combination types (edgeR/DESeq etc. with upperquartile/TMM etc). However, almost all of them gave a pretty high AUC (all were about 0.85). I am trying to get an example of really terrible DEG calls.
So, I then tried running their deSpikein() function on the data exactly from their vignette. I again did so on all option combination types (edgeR/DESeq etc. with upperquartile/TMM etc). This time the range of AUC values in the ROC curves from the generated plots were between 0.371 and 0.893, so I was pretty excited. I am now stuck trying to obtain the information I need (listed above as (1) and (2) above).
Essentially, I simply ran these commands:
resHigh and resLow both have a "log2FC" and "p" listed for 92 ERCC spike-ins. resHigh (which had AUC=0.893) has lower "p" values than resLow (which has AUC=0.371). I am now trying to now simulate a usual experiment, only now I know that one will lead to low quality DEGs and the other will lead to high quality DEGs. So now, I just hope to obtain the rows of the DEGs from the exampleGeneLevel (15583 rows) for both the "voom"/"TMM" method and the "t"/"linear" method. This causes me two problems:
1) I do not know what the treatment groups are (if there are any) for exampleGeneLevel? To obtain what I am looking for, I am assuming maybe there are 2 treatment groups of 6 replicates in the 12-sampled exampleGeneLevel? I tried reading the original paper and looking at NCBI, but I cannot figure out which samples belong to which group.
2) If indeed there are two groups in exampleGeneLevel, it seems I may have to write this code on my own. I could use the limmaVoom vignette on the exampleGeneLevel data and make sure I use the "TMM" normalization. I am not too sure how to run the "t" method on the exampleGeneLevel data, to be honest. Either way, the problem is I could code something that is different than what the package did to calculate the AUC values; ideally, I should keep it as similar as possible. As a result, I am wondering if I might be missing something in the package that provides this code?
This became a long post, and I apologize for that. If you have any further advice, I would greatly appreciate hearing it. Thank you again for these references.