I am trying to understand the DEXSeq output file. For example, this is the first line in my file:
groupID featureID exonBaseMean dispersion stat pvalue padj input IP log2fold_IP_input genomicData.seqnames genomicData.start genomicData.end genomicData.width genomicData.strand countData.1 countData.2 countData.3 countData.4 countData.5 countData.6 transcripts
WBGene00006062:E003 WBGene00006062 E003 20.97392794 0.010368532 166.1550712 5.12E-38 3.90E-34 8.567627289 0.025351334 -17.52140343 I 9503439 9504354 916 + 81 61 96 0 0 0 c("F30A10.8.2", "F30A10.8.1")
I was wondering why there is such a big difference between the input/IP values and the count values of the samples. If I understood correctly, these are the normalized values after dispersion estimation. But how are they calculated? Are they log-transformed? I have tried to read the original paper, but didn't really get it.
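To make the mismatch concrete, here is a small Python sketch that lines the pasted row up with the header. Note that the data row has one more field than the header, a leading row name (groupID:featureID). The sketch then compares a few values. It only confirms the discrepancy being asked about and does not reproduce DEXSeq's own calculation. The raw mean of the six countData columns is about 39.7, against an exonBaseMean of 20.97. A plain log2(IP / input) is about -8.4, against a log2fold_IP_input of -17.5. My understanding, which is worth checking against the DEXSeq vignette and the `DEXSeqResults` documentation, is that countData holds raw counts while exonBaseMean is based on size-factor-normalized counts. The variable names below are illustrative.

```python
import math

# Header and first data row exactly as pasted above. The data row starts with an
# extra row name (groupID:featureID) that has no header entry.
HEADER = (
    "groupID featureID exonBaseMean dispersion stat pvalue padj input IP "
    "log2fold_IP_input genomicData.seqnames genomicData.start genomicData.end "
    "genomicData.width genomicData.strand countData.1 countData.2 countData.3 "
    "countData.4 countData.5 countData.6 transcripts"
)
ROW = (
    'WBGene00006062:E003 WBGene00006062 E003 20.97392794 0.010368532 166.1550712 '
    '5.12E-38 3.90E-34 8.567627289 0.025351334 -17.52140343 I 9503439 9504354 916 + '
    '81 61 96 0 0 0 c("F30A10.8.2", "F30A10.8.1")'
)


def parse_row(header: str, row: str) -> dict:
    """Map each header column to its value in the whitespace-separated row."""
    cols = header.split()
    # Row name + every column except 'transcripts' = len(cols) fields, so
    # maxsplit=len(cols) keeps the space-containing transcripts value whole.
    parts = row.split(maxsplit=len(cols))
    record = {"rowname": parts[0]}
    record.update(zip(cols, parts[1:]))
    return record


record = parse_row(HEADER, ROW)

# Per-sample counts as reported in the countData.* columns.
counts = [int(record[f"countData.{i}"]) for i in range(1, 7)]
raw_count_mean = sum(counts) / len(counts)

# A plain log2 ratio of the two per-condition columns, for comparison only.
naive_log2_ratio = math.log2(float(record["IP"]) / float(record["input"]))

print(f"mean of countData columns: {raw_count_mean:.2f}")
print(f"exonBaseMean:              {record['exonBaseMean']}")
print(f"log2(IP / input):          {naive_log2_ratio:.2f}")
print(f"log2fold_IP_input:         {record['log2fold_IP_input']}")
```

Neither of the output columns matches these naive recomputations. That is consistent with input, IP and log2fold_IP_input coming from DEXSeq's fitted model rather than directly from the raw counts.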
Also, judging from the results, the overlapping elements are mostly non-coding RNAs. Would it make sense to remove them before the analysis, i.e., to remove the non-coding genes from the GTF file before mapping? Or does that make no difference to the normalization and differential expression results?
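If you want to try the filtering and compare results, here is a minimal Python sketch. It keeps only records whose `gene_biotype` attribute is in an allowed set. It assumes your GTF carries a `gene_biotype "..."` attribute in column 9, as Ensembl-style GTFs do. Check your file, because the attribute name may differ. Whether filtering is advisable is exactly the open question here, and this sketch does not answer it. The file and script names in the usage line are hypothetical.

```python
import re
import sys

# Assumes the biotype is stored as: gene_biotype "protein_coding"; (check your GTF)
BIOTYPE_RE = re.compile(r'gene_biotype "([^"]+)"')


def keep_line(line: str, keep_biotypes=frozenset({"protein_coding"})) -> bool:
    """Return True if a GTF line should be kept."""
    if line.startswith("#"):
        return True  # keep header/comment lines
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return False  # not a valid 9-column GTF record
    match = BIOTYPE_RE.search(fields[8])
    # Records without a gene_biotype attribute are dropped; adjust if needed.
    return match is not None and match.group(1) in keep_biotypes


def filter_gtf(lines, keep_biotypes=frozenset({"protein_coding"})):
    """Return the lines whose gene biotype is in keep_biotypes."""
    return [line for line in lines if keep_line(line, keep_biotypes)]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical names): python filter_gtf.py genes.gtf genes.coding.gtf
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(filter_gtf(src))
```

The filtered GTF would then replace the original everywhere it is used. That includes read mapping and building the flattened DEXSeq annotation, so that the counts and the annotation stay consistent.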
Thanks