Hello community, sorry for the dumb question, but I'm just a novice. I downloaded a file with the read counts from a previously published paper and ran a DEG analysis with edgeR after following some tutorials and reading the user guide. But I have a problem: all the tutorials I followed started from a table of raw counts made up of integers, while the table from the paper has decimal numbers. I read somewhere that edgeR performs its own normalisation, so raw counts should be used. Is that true? Neither the file headers nor the supplementary material mention any kind of normalisation (the file is just called "processed"), so what am I handling? I thought "processed" referred to the fact that the reads were aligned and so on. Is it possible to have raw counts with decimal numbers? Is the analysis still reliable, or am I working on some sort of normalised data that messes up my analysis?
Thank you very much in advance!
(This is how my read counts appear: https://imgur.com/1wwexTR )
Thank you very much for the clear explanation. Let me ask one more question, please: does the same problem apply to DESeq too?
What you describe are transcript abundance estimates and not counts. These would need to be aggregated to the gene level and this in turn would produce integers again, right?
I don't know, I'm a novice so my knowledge is still severely lacking! Do you mean there's a way to convert them back into raw counts?
No. I was referring to swbarnes' comment. It does not apply to your situation.
RSEM can output gene-level "expected_count". RSEM can split a read's gene assignment probabilistically among multiple genes if it can't be uniquely assigned to a single gene. So some genes will have every count belonging 100% to that gene, but some will not, and those genes will have non-integer expected counts.
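To make the point about fractional assignment concrete, here is a toy sketch in Python. It is not RSEM's actual algorithm (RSEM estimates the assignment probabilities itself); the reads, gene names and probabilities below are made up purely for illustration, and `expected_counts` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical reads: each read is assigned to one or more genes with
# probabilities summing to 1. These values are invented for illustration
# and are NOT real RSEM output.
reads = [
    {"geneA": 1.0},                  # uniquely assigned
    {"geneA": 1.0},                  # uniquely assigned
    {"geneB": 0.5, "geneC": 0.5},    # ambiguous, split evenly
    {"geneB": 0.25, "geneC": 0.75},  # ambiguous, split unevenly
]

def expected_counts(reads):
    """Sum each read's fractional assignment per gene."""
    totals = defaultdict(float)
    for assignment in reads:
        for gene, prob in assignment.items():
            totals[gene] += prob
    return dict(totals)

counts = expected_counts(reads)
print(counts)
```

Here geneA only receives uniquely assigned reads, so its total is a whole number, while geneB and geneC share ambiguous reads and end up with non-integer totals.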
DESeq2 can import RSEM output:
https://support.bioconductor.org/p/94003/#94028
But again, if the OP has no integer counts at all, then that's not what s/he has.
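A quick way to check this on your own table is to see what fraction of the values are whole numbers. This is only a rough sketch: `integer_fraction` is a hypothetical helper, the example column is made up, and the result is a hint rather than proof of how the file was produced.

```python
def integer_fraction(values):
    """Return the fraction of values that are whole numbers."""
    values = list(values)
    if not values:
        raise ValueError("no values given")
    whole = sum(1 for v in values if float(v).is_integer())
    return whole / len(values)

# Made-up example column, NOT taken from the paper's file.
column = [12.0, 0.0, 3.47, 158.92, 7.0]
frac = integer_fraction(column)
print(f"{frac:.0%} of values are whole numbers")
```

If a good share of the non-zero values are whole numbers, that is consistent with expected counts of the RSEM kind. If essentially none are (keep in mind that zeros always count as whole), the table is more likely some normalised measure.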