Question

RNA-Seq Reads Count

2.5 years ago

PBC ▴ 10

Hi Everyone

I have a question about the steps that come before differential gene expression (DEG) analysis with DESeq2. I have counts for 2 conditions, each with 3 replicates. After running the DEG analysis, I realized that there are duplicated genes in the final result, because I did the HTSeq count by gene ID. So it seems the count treats different transcripts of the same gene as separate entries.

For instance

    Gene ID        Count        Gene Name

    A               10            KDR
    B               12            KDR

I think that I should sum these counts, since they come from the same gene, but I am not certain of this. I would then have a read count of 22 for KDR in the file, and I would run the analysis with those 22 reads instead of doing the DEG for A and for B separately.
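To make the proposed merge concrete, here is a minimal Python sketch of what summing by gene name would do to the two example rows. The function name `sum_counts_by_name` and the in-memory row layout are assumptions made for illustration only; the sketch shows the arithmetic, not whether merging is the right choice.

```python
from collections import defaultdict

# Toy rows copied from the example table: (gene_id, count, gene_name).
rows = [
    ("A", 10, "KDR"),
    ("B", 12, "KDR"),
]


def sum_counts_by_name(rows):
    """Sum counts over all gene IDs that share the same gene name."""
    totals = defaultdict(int)
    for _gene_id, count, gene_name in rows:
        totals[gene_name] += count
    return dict(totals)


merged = sum_counts_by_name(rows)
print(merged)  # {'KDR': 22}
```

Note that after this step the rows for A and B can no longer be told apart, so any difference between them is lost.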

My question is: should I sum the counts from different transcripts that fall in the same gene before running the DEG analysis? If not, why?

I tried to find an answer on the forums, but I have not found one so far. Sorry if this is a basic question; I have only just started doing this type of analysis.

Thanks in advance for your support.

HTSeq RNA-SEQ DEG reads DESeq2 • 1.4k views

ADD COMMENT • link 2.5 years ago by PBC ▴ 10

Cross-posted: https://support.bioconductor.org/p/9147437/

ADD REPLY • link 2.5 years ago by Kevin Blighe 89k

Is this a problem for a significant number of genes? Do you have gene counts or transcript counts? I don't think HTSeq can generate transcript counts. If two different gene IDs have the same gene name (which I know happens occasionally in Ensembl), I'd stick with the unique IDs all the way through, until the very end of the analysis.
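One way to see how widespread the issue is: list the gene names that map to more than one gene ID, while leaving the per-ID rows untouched. The following is a minimal Python sketch under stated assumptions. The `id_to_name` mapping is made up (IDs `A` and `B` come from the question, `C` and `GENE2` are placeholders), and the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical annotation table: gene ID -> gene name.
# "A" and "B" are from the question; "C"/"GENE2" are placeholders.
id_to_name = {"A": "KDR", "B": "KDR", "C": "GENE2"}


def names_with_multiple_ids(id_to_name):
    """Return {gene_name: [gene_ids]} for names shared by several IDs."""
    ids_by_name = defaultdict(list)
    for gene_id, name in id_to_name.items():
        ids_by_name[name].append(gene_id)
    return {n: sorted(ids) for n, ids in ids_by_name.items() if len(ids) > 1}


def result_label(gene_id, id_to_name):
    """Build a display label that stays unique: name plus the ID."""
    return f"{id_to_name.get(gene_id, 'NA')} ({gene_id})"


dupes = names_with_multiple_ids(id_to_name)
print(dupes)                          # {'KDR': ['A', 'B']}
print(result_label("A", id_to_name))  # KDR (A)
```

The analysis itself would still run on the unique IDs; the name is only attached as a label when reporting results.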

ADD REPLY • link 2.5 years ago by swbarnes2 14k

Hi!

Thank you for your reply.

I think I could get a different result, since the DEG analysis is done per gene ID row. So far, I get different DEG results for A and for B. I think it is a problem for the DEG analysis of this gene, since I do not have many duplicates in my count table.

As for HTSeq, I think this tool generates counts for transcripts, since I have different gene IDs that fall in the same gene. To do the counts I used these parameters:

    htseq-count -t exon --mode union --stranded no -i gene_id $file $gtf > ${name}_count.txt
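With `-t exon -i gene_id`, htseq-count groups exon features by their `gene_id` attribute, so each output row should correspond to one gene ID rather than one transcript. A quick way to check what A and B are is to look at the GTF attributes directly. The sketch below is a minimal Python example; the three GTF lines and the transcript IDs (`A.1`, `A.2`, `B.1`) are invented for illustration, and it assumes the usual `key "value";` attribute layout.

```python
import re

# Made-up GTF exon lines (tab-separated); coordinates and transcript IDs
# are placeholders, not real annotation.
gtf_lines = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "A"; transcript_id "A.1"; gene_name "KDR";',
    'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "A"; transcript_id "A.2"; gene_name "KDR";',
    'chr1\tsrc\texon\t900\t950\t.\t+\t.\tgene_id "B"; transcript_id "B.1"; gene_name "KDR";',
]

ATTRIBUTE = re.compile(r'(\w+) "([^"]*)"')


def transcripts_per_gene_id(lines, feature="exon"):
    """Map each gene_id to the set of transcript_ids seen on its exon lines."""
    result = {}
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature:
            continue
        attrs = dict(ATTRIBUTE.findall(fields[8]))
        result.setdefault(attrs["gene_id"], set()).add(attrs.get("transcript_id"))
    return result


grouping = transcripts_per_gene_id(gtf_lines)
for gene_id, transcripts in sorted(grouping.items()):
    print(gene_id, sorted(transcripts))
```

In this toy annotation, gene ID A covers two transcripts but would still give a single count row, while B is a separate gene ID that happens to share the gene name.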

Thanks

ADD REPLY • link 2.5 years ago by PBC ▴ 10