Hello,
I'm using edgeR to analyze RNA-seq data and I have a question about library size adjustment. There are essentially two ways to account for differences in library size between RNA-seq samples. One is to use only the counts assigned to genes, i.e. the counts in the matrix I'm analyzing in R. The other is to set the library size to the total number of uniquely mapped reads, which would include reads that are part of the experiment but are not in the matrix being analyzed (e.g. reads that map to intergenic regions).
The two library sizes differ by less than 5%, and the results (logFC among significant genes) are almost identical in my case (R^2 > 0.99). I can make a case for either version: on the one hand, I'm only analyzing reads that are in the matrix, so I should use the column sum for each sample; on the other hand, it seems intuitive to use the total uniquely mapped reads per library, since those reads were observed in the experiment even though they are not in the matrix.
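To make the two options concrete, here is a minimal sketch of the arithmetic in plain Python. It is not edgeR and not my real data: the counts, the `non_genic` numbers, and the helper `log2_fold_change` are all made up for illustration, and it only does simple counts-per-million scaling between two samples.

```python
import math

# Hypothetical toy counts (gene -> [sample 1, sample 2]); not real data.
counts = {
    "geneA": [100, 220],
    "geneB": [500, 480],
    "geneC": [50, 40],
}
# Hypothetical uniquely mapped reads that fall outside the genes in the matrix.
non_genic = [30, 20]
n_samples = 2

# Option 1: library size = column sums of the count matrix.
lib_matrix = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
# Option 2: library size = all uniquely mapped reads (matrix + non-genic).
lib_total = [m + x for m, x in zip(lib_matrix, non_genic)]


def log2_fold_change(row, lib_sizes):
    """log2 ratio of counts-per-million, sample 2 versus sample 1."""
    cpm = [1e6 * c / n for c, n in zip(row, lib_sizes)]
    return math.log2(cpm[1] / cpm[0])


# Per-gene difference in logFC between the two library-size choices.
offsets = []
for gene, row in counts.items():
    lfc_matrix = log2_fold_change(row, lib_matrix)
    lfc_total = log2_fold_change(row, lib_total)
    offsets.append(lfc_total - lfc_matrix)
    print(gene, round(lfc_matrix, 3), round(lfc_total, 3), round(offsets[-1], 3))
```

In this toy case the last printed column is the same for every gene, because switching library sizes only rescales each sample by a single factor, so the two choices differ by a constant offset in logFC. I assume something similar is behind the near-identical results I see, though edgeR's actual model is more involved than this simple scaling.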
I was wondering which approach the community thinks is most appropriate. Or does it not really matter in my case, since the outcomes are so similar? Would it be best to choose one and briefly mention the other if this were in a paper?
Thanks