Hi everyone,
I have a question about how DESeq2 handles the HTSeq counts assigned to special classes when performing normalization and differential gene expression analysis.
Apart from actual genes, HTSeq assigns reads to the five classes below. I have an approximation of the mean of the counts assigned to each class for my dataset:
no_feature (~40112886)
ambiguous (~9732)
too_low_aQual (0)
not_aligned (0)
alignment_not_unique (~4294028)
Now, I am going to use DESeq2 to normalize my data and find differentially expressed genes. So how does DESeq2 handle these classes? Does it remove them during normalization, does it use them in the normalization process, or do we have to remove these rows manually before normalization?
Thanks!
Oh! So basically it doesn't matter if you remove them manually before passing your data to DESeq2. I am going to combine data from multiple htseq-count runs (protein-coding genes and long non-coding RNAs) and normalize them together. When the tables are combined, R renames 'no_feature' to 'no_feature1', 'no_feature2' and so on. I will be using DESeqDataSetFromMatrix() to load my counts, so I will just remove these rows manually. Thanks for the help!
P.S.: I am using htseq-count version 0.5.4p5 and there is no '__' before these names in the count file, not that it matters because you are removing them anyway.
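
For anyone doing the same manual clean-up, here is a minimal sketch of the row filter, written in plain Python only to show the logic (the same pattern should carry over to a grepl() call on the row names in R). It assumes the merged table is held as a mapping from feature ID to per-sample counts; the function name drop_special_rows, the gene IDs and the numbers are all made up for illustration. The pattern allows an optional '__' prefix and optional trailing digits, so it covers both older and newer htseq-count output as well as renamed rows such as 'no_feature1'.

```python
import re

# The five special counters reported by htseq-count.
SPECIAL = (
    "no_feature",
    "ambiguous",
    "too_low_aQual",
    "not_aligned",
    "alignment_not_unique",
)

# Optional "__" prefix (newer htseq-count versions) and optional trailing
# digits (suffix added when duplicated row names are made unique on merging).
SPECIAL_RE = re.compile(r"^(__)?(" + "|".join(SPECIAL) + r")\d*$")


def drop_special_rows(counts):
    """Return a copy of `counts` without the htseq-count special rows.

    `counts` maps a feature ID to a list of per-sample counts
    (a hypothetical in-memory layout, used here only for illustration).
    """
    return {fid: row for fid, row in counts.items() if not SPECIAL_RE.match(fid)}


# Made-up merged table: two real features plus special rows in several spellings.
merged = {
    "GENE_A": [10, 12],
    "LNC_B": [3, 0],
    "no_feature": [100, 90],
    "no_feature1": [80, 70],
    "__ambiguous": [5, 4],
    "alignment_not_unique2": [40, 38],
}

genes_only = drop_special_rows(merged)
print(sorted(genes_only))
```

Because the pattern is anchored at both ends, a real feature whose ID merely contains one of these words (for example a hypothetical 'ambiguous_gene') would be kept.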