Hi,
I'm analysing RNA-seq data, and the total gene counts differ considerably between my samples. I've added a barplot below showing the total gene count of each sample; the y-axis displays total gene counts divided by 10^6. The two samples with the lowest gene counts have a count of 0.2x10^6, while the sample with the highest gene counts has >20x10^6 counts. Does anyone know whether a difference of this size is normal, and whether it is advisable to remove certain outlier samples?
Thanks in advance!
Rather than relying on total gene counts alone, you will want to check PCA plots to identify outliers. There is bound to be some variability in total gene counts, since sequencing depth can be uneven across samples. Analysis programs like DESeq2/edgeR take this into account by normalizing for library size.
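
To make the PCA suggestion concrete, here is a minimal Python sketch, assuming you have a genes x samples matrix of raw counts. It uses a simple log2 counts-per-million transform to remove the depth differences before PCA; this is a stand-in for illustration, not the normalization that DESeq2 or edgeR perform internally. The function names (library_sizes, log_cpm, pca_scores) and the simulated counts are hypothetical and only there so the example runs on its own.

```python
import numpy as np

def library_sizes(counts):
    """Total counts per sample (rows are genes, columns are samples)."""
    return counts.sum(axis=0)

def log_cpm(counts):
    """Scale each sample to counts per million, then log2(x + 1)."""
    cpm = counts / library_sizes(counts) * 1e6
    return np.log2(cpm + 1.0)

def pca_scores(log_expr, n_components=2):
    """PCA via SVD on the centered samples x genes matrix."""
    centered = log_expr.T - log_expr.T.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    total = np.sum(s ** 2)
    # Guard against the degenerate case where all samples are identical.
    explained = s ** 2 / total if total > 0 else np.zeros_like(s)
    return u[:, :n_components] * s[:n_components], explained[:n_components]

# Simulated data (an assumption, not real measurements): six samples that
# share the same expression profile but differ 100-fold in sequencing depth.
rng = np.random.default_rng(0)
profile = rng.gamma(1.0, size=500)
profile /= profile.sum()
depths = np.array([2e5, 2e5, 5e6, 8e6, 1.2e7, 2e7])
counts = rng.poisson(np.outer(profile, depths))  # shape: genes x samples

scores, explained = pca_scores(log_cpm(counts))
for depth, (pc1, pc2) in zip(library_sizes(counts), scores):
    print(f"library size {depth:>10d}  PC1 {pc1:8.2f}  PC2 {pc2:8.2f}")
```

On real data you would replace the simulated matrix with your own counts and plot the first two columns of scores. A sample that sits far from the rest of its group after normalization is a better candidate for removal than one that merely has a low total count, although very shallow samples can still look noisy because many low-expressed genes end up with zero counts.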