I'm new to RNA-seq and I'm playing around with the STAR aligner. I have been changing some of STAR's command-line options to see how they affect my gene counts (counted with the featureCounts command-line tool). My question is: what is the best way to compare the resulting data sets?
For example, in one trial I ran STAR with default settings, and then I reran it with --alignIntronMin set to 100 (the default is 21). So now I have two count text files:
counts_default.txt
counts_alignIntMin100.txt
I came up with a metric: I calculate the fold difference in counts for each gene between the two sets, then take the median fold difference of the 500 most-changed genes. It could be called something like the Median-500-Most-Fold-Different-Genes metric. In this case the value is really low (0.0000001), probably because my command-line change had little effect.
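For reference, here is a minimal sketch of how a metric like this could be computed in Python. It is not necessarily the exact calculation used above. It assumes each featureCounts file came from a single BAM, so the last column holds the counts. The function names, the pseudocount of 1 (to avoid dividing by zero), and the use of absolute log2 fold differences are all illustrative choices.

```python
import csv
import math
from statistics import median


def read_featurecounts(path):
    """Read a featureCounts table into {gene_id: count}.

    Assumption: one BAM per run, so the last column is the count column.
    """
    counts = {}
    with open(path, newline="") as handle:
        # featureCounts writes a leading '#' comment line before the header.
        data_lines = (line for line in handle if not line.startswith("#"))
        rows = csv.reader(data_lines, delimiter="\t")
        next(rows)  # skip the header row (Geneid, Chr, Start, End, Strand, Length, ...)
        for row in rows:
            counts[row[0]] = float(row[-1])
    return counts


def median_top_fold_difference(counts_a, counts_b, top_n=500, pseudocount=1.0):
    """Median absolute log2 fold difference over the top_n most-changed genes.

    The pseudocount is an illustrative choice to keep zero counts finite.
    """
    shared = counts_a.keys() & counts_b.keys()
    abs_log2 = sorted(
        (
            abs(math.log2((counts_b[g] + pseudocount) / (counts_a[g] + pseudocount)))
            for g in shared
        ),
        reverse=True,
    )
    top = abs_log2[:top_n]
    return median(top) if top else 0.0


if __name__ == "__main__":
    default = read_featurecounts("counts_default.txt")
    changed = read_featurecounts("counts_alignIntMin100.txt")
    print(median_top_fold_difference(default, changed))
```

On this log2 scale a value of 0 means the top genes did not change at all, and a value of 1 means a two-fold difference.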
However, is there a better way to assess these changes? Am I on the right track? Keep in mind I only have one RNA-seq dataset. I can't do comparisons beyond how my command line changes alter the gene counts. Any suggestions would be welcome. Thanks!
As I mentioned in my post, I only have one file, so I cannot look at differential gene expression.
You have only one file, but from it you generated multiple expression tables. Have you tried comparing those expression tables using software for differential gene expression?