Question

How does DESeq2 handle the read-counts assigned to special counters in HTSeq-count output?

1

Entering edit mode

11.1 years ago

komal.rathi ★ 4.1k

Hi everyone,

I have a doubt regarding how DESeq2 handles the HTSeq counts assigned to special classes when performing normalization & differential gene expression.

Apart from actual genes, HTSeq assigns reads to the five classes below. I have an approximation of the mean of the counts assigned to each class for my dataset:

no_feature (~40112886)
ambiguous (~9732)
too_low_aQual (0)
not_aligned (0)
alignment_not_unique (~4294028)

Now, I am going to use DESeq2 to normalize and get differentially expressed genes for my data. So how does DESeq2 handle these classes? Does it remove them during normalization, uses them in the normalization process or do we have to remove these rows manually before the normalization?

Thanks!

htseq-count RNA-Seq DESeq2 • 7.7k views

ADD COMMENT • link updated 3.7 years ago by Ram 45k • written 11.1 years ago by komal.rathi ★ 4.1k

Ram · Accepted Answer · 2014-04-15

3

Entering edit mode

11.1 years ago

Devon Ryan 105k

It removes them when making the DeseqDataSet object that's used in the subsequent steps.

Edit: If you're curious, here are the relevant lines of DESeqDataSetFromHTSeqCount:

oldSpecialNames <- c( "no_feature", "ambiguous",
                   "too_low_aQual", "not_aligned",
                   "alignment_not_unique" )
# either starts with two underscores
# or is one of the old special names (htseq-count backward compatability)
specialRows <- (substr(rownames(tbl),1,2) == "__") |
    rownames(tbl) %in% oldSpecialNames
tbl <- tbl[ !specialRows, ]

Edit2: I should mention that if you use one of the other functions, such as DESeqDataSetFromMatrix or DESeqDataSet, you need remove these fields yourself.

ADD COMMENT • link updated 5.4 years ago by Ram 45k • written 11.1 years ago by Devon Ryan 105k

0

Entering edit mode

Oh! So basically it doesn't matter if you remove them manually before parsing your data to DESeq2. I am going to combine data from multiple htseq-count runs (Protein-coding genes and long-noncoding RNAs) and normalize them together. So R converts 'no_feature' to 'no_feature1', 'no_feature2' and so on. And I will be using DESeqDataSetFromMatrix() to get my counts so I will just remove these rows manually. Thanks for the help!

P.S.: I am using htseq-count version 0.5.4p5 and there is no '__' before these names in the count file, not that it matters because you are removing it anyway.

ADD REPLY • link updated 5.4 years ago by Ram 45k • written 11.1 years ago by komal.rathi ★ 4.1k