How does DESeq2 handle the read-counts assigned to special counters in HTSeq-count output?
1
1
Entering edit mode
10.7 years ago
komal.rathi ★ 4.1k

Hi everyone,

I have a doubt regarding how DESeq2 handles the HTSeq counts assigned to special classes when performing normalization & differential gene expression.

Apart from actual genes, HTSeq assigns reads to the five classes below. I have an approximation of the mean of the counts assigned to each class for my dataset:

  1. no_feature (~40112886)
  2. ambiguous (~9732)
  3. too_low_aQual (0)
  4. not_aligned (0)
  5. alignment_not_unique (~4294028)

Now, I am going to use DESeq2 to normalize and get differentially expressed genes for my data. So how does DESeq2 handle these classes? Does it remove them during normalization, uses them in the normalization process or do we have to remove these rows manually before the normalization?

Thanks!

htseq-count RNA-Seq DESeq2 • 7.5k views
ADD COMMENT
3
Entering edit mode
10.7 years ago

It removes them when making the DeseqDataSet object that's used in the subsequent steps.

Edit: If you're curious, here are the relevant lines of DESeqDataSetFromHTSeqCount:

oldSpecialNames <- c( "no_feature", "ambiguous",
                   "too_low_aQual", "not_aligned",
                   "alignment_not_unique" )
# either starts with two underscores
# or is one of the old special names (htseq-count backward compatability)
specialRows <- (substr(rownames(tbl),1,2) == "__") |
    rownames(tbl) %in% oldSpecialNames
tbl <- tbl[ !specialRows, ]

Edit2: I should mention that if you use one of the other functions, such as DESeqDataSetFromMatrix or DESeqDataSet, you need remove these fields yourself.

ADD COMMENT
0
Entering edit mode

Oh! So basically it doesn't matter if you remove them manually before parsing your data to DESeq2. I am going to combine data from multiple htseq-count runs (Protein-coding genes and long-noncoding RNAs) and normalize them together. So R converts 'no_feature' to 'no_feature1', 'no_feature2' and so on. And I will be using DESeqDataSetFromMatrix() to get my counts so I will just remove these rows manually. Thanks for the help!

P.S.: I am using htseq-count version 0.5.4p5 and there is no '__' before these names in the count file, not that it matters because you are removing it anyway.

ADD REPLY

Login before adding your answer.

Traffic: 1737 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6