Hi Guys
I have an RNA-seq dataset. So far I have prepared the data and computed the raw read count for each gene in each sample, giving a matrix with samples as columns and genes as rows. Now I want to filter out some genes to reduce the false positive rate. Could you please let me know how to do the filtering?
I have tried "read count per million" (CPM), which I computed for every gene in every sample, but I don't know how to choose the best cutoff value. For example, can I say that a gene must be removed if its read count is 2 or less in at least 10 samples?
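A CPM-based filter like the one described above can be sketched as follows. This is a minimal illustration, not a specific package's implementation; the function names and the defaults (`min_cpm`, `min_samples`, `min_count=10`) are assumptions. A common rule of thumb, and roughly the idea behind edgeR's `filterByExpr`, is to set `min_samples` to the size of the smallest experimental group and to choose a CPM cutoff that corresponds to about 10 raw reads in the smallest library. Note that the rule in the question ("2 or less in at least 10 samples, remove") is the same as "keep genes with more than 2 in at least n − 9 samples", where n is the number of samples.

```python
import numpy as np

def cpm(counts: np.ndarray) -> np.ndarray:
    """Counts per million: scale each sample (column) by its library size."""
    lib_sizes = counts.sum(axis=0)          # total reads per sample
    return counts * (1e6 / lib_sizes)       # broadcasts the per-sample factor over genes (rows)

def cpm_cutoff_for_count(counts: np.ndarray, min_count: float = 10) -> float:
    """Hypothetical helper: CPM equivalent of `min_count` reads in the smallest library."""
    return min_count / (counts.sum(axis=0).min() / 1e6)

def filter_low_expression(counts: np.ndarray, min_cpm: float = 1.0, min_samples: int = 2):
    """Keep genes (rows) whose CPM exceeds min_cpm in at least min_samples samples."""
    expressed = cpm(counts) > min_cpm       # boolean gene x sample matrix
    keep = expressed.sum(axis=1) >= min_samples
    return counts[keep], keep

# Toy matrix: 4 genes x 4 samples, each library has exactly 1,000,000 reads.
counts = np.array([
    [5000, 10000, 7500, 6000],
    [0, 1, 0, 2],
    [40, 0, 55, 0],
    [994960, 989999, 992445, 993998],
])
filtered, keep = filter_low_expression(counts, min_cpm=1.0, min_samples=2)
print(keep.tolist())  # [True, False, True, True]
```

Gene 2 is dropped because it exceeds 1 CPM in no sample. Gene 3 is kept because it is clearly expressed in two samples, which could be a real group-specific signal.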
Thanks,
Behzad
Filtering is generally performed on the adjusted p-values and fold-changes. Have you used edgeR/DESeq2/etc. to calculate that yet?
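The post-DE filtering mentioned here amounts to a boolean mask over the results table. Here is a minimal sketch, assuming the adjusted p-values and log2 fold-changes are already available as arrays (for example, the `padj` and `log2FoldChange` columns of a DESeq2 results table exported to Python). The thresholds `alpha=0.05` and `min_abs_lfc=1.0` are illustrative assumptions, not recommendations from the thread.

```python
import numpy as np

def select_de_genes(padj, log2fc, alpha=0.05, min_abs_lfc=1.0):
    """Return a mask of genes with padj < alpha and |log2 fold-change| >= min_abs_lfc."""
    padj = np.asarray(padj, dtype=float)
    log2fc = np.asarray(log2fc, dtype=float)
    # NaN padj values (e.g. genes removed by independent filtering) compare False,
    # so those genes are never selected.
    return (padj < alpha) & (np.abs(log2fc) >= min_abs_lfc)

mask = select_de_genes([0.01, 0.2, 0.001, np.nan], [2.0, 3.0, -0.5, 4.0])
print(mask.tolist())  # [True, False, False, False]
```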
@Devon: I have not done the DE analysis yet. Before that, I want to remove genes that are not expressed, because, as you know, even genes that are not expressed still get a few reads.
So I want to filter out these genes.
Just do independent filtering after the fact (if you use DESeq2, this is automatic).
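Independent filtering runs after the DE test. It uses a filter statistic that does not depend on the test result, such as the mean normalized count. The statistic is thresholded at several quantiles, the surviving p-values are re-adjusted with Benjamini-Hochberg, and the threshold that gives the most discoveries is kept. Below is a simplified sketch of that idea. DESeq2 itself uses the mean of normalized counts as the filter statistic, but it smooths the rejection curve before picking the threshold rather than taking the raw maximum as done here. The function names and the quantile grid are assumptions.

```python
import numpy as np

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: cumulative minimum taken from the largest p-value down.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

def independent_filter(base_mean, pvalues, alpha=0.1, quantiles=np.linspace(0, 0.95, 20)):
    """Pick the base-mean quantile cutoff that maximizes BH rejections at level alpha."""
    base_mean = np.asarray(base_mean, dtype=float)
    pvalues = np.asarray(pvalues, dtype=float)
    best_q, best_n, best_keep = 0.0, -1, None
    for q in quantiles:
        cutoff = np.quantile(base_mean, q)
        keep = base_mean >= cutoff           # genes that pass the filter
        n_rej = int((bh_adjust(pvalues[keep]) < alpha).sum())
        if n_rej > best_n:                   # ties keep the milder filter
            best_q, best_n, best_keep = q, n_rej, keep
    return best_q, best_n, best_keep

# Four low-count genes with uninformative p-values, four well-expressed genes with p = 0.06.
base_mean = [1, 2, 3, 4, 100, 200, 300, 400]
pvalues = [0.9, 0.9, 0.9, 0.9, 0.06, 0.06, 0.06, 0.06]
q, n_rej, keep = independent_filter(base_mean, pvalues, alpha=0.1, quantiles=[0.0, 0.5])
print(q, n_rej)  # 0.5 4
```

Without filtering, the four expressed genes reach padj = 0.12 and none passes alpha = 0.1. After the low-count genes are removed, all four pass. This is why the low-count genes do not need to be removed by hand before the test.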