I have a matrix of read counts from an RNA-seq analysis. Let's say I'm only interested in 10 specific genes (I want to know if they are differentially expressed), but the data covers all genes. Should I:
1) Filter the counts matrix and only proceed with the 10 genes of interest, or
2) Perform the analysis for all genes, and only filter at the very end?
The advantage of option 1 is that I perform fewer tests, and thus need to correct for fewer multiple comparisons. The advantage of option 2 is that I get a broader view of the data. How would you go about this?
Thanks!
Short answer: if you have already decided to pick your final targets only from those 10 genes, you only have a multiple testing problem for 10 genes. You can still look at the broad picture, but as long as you don't pick genes from that broader picture, you don't need to adjust your multiple testing strategy.
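To make that concrete, here is a minimal sketch in Python. It assumes you already have one raw p-value per gene from the full analysis and that the correction you want is Benjamini-Hochberg; the thread does not say which correction is used, so that choice is an assumption. The function names, gene names and p-values are all made up for illustration.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def adjust_preselected(raw_pvalues, genes_of_interest):
    """Adjust only the genes chosen before looking at the results.

    raw_pvalues: dict mapping gene name -> raw p-value from the full analysis.
    genes_of_interest: gene names fixed in advance (hypothetical here).
    """
    selected = [g for g in genes_of_interest if g in raw_pvalues]
    adjusted = bh_adjust([raw_pvalues[g] for g in selected])
    return dict(zip(selected, adjusted))


# Made-up raw p-values standing in for a genome-wide result table.
raw = {"geneA": 0.01, "geneB": 0.04, "geneC": 0.03, "geneD": 0.5}
# The list of interest, decided before the analysis was run.
interest = ["geneA", "geneB"]

within_list = adjust_preselected(raw, interest)          # corrects for 2 tests
genome_wide = dict(zip(raw, bh_adjust(list(raw.values()))))  # corrects for 4 tests
```

The adjusted values in `within_list` are never larger than the corresponding ones in `genome_wide`, because fewer tests are being corrected for. That is only legitimate if the gene list was fixed beforehand and not chosen after browsing the genome-wide results.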