Hi everyone,
I have a question about how to prepare counts for differential gene expression (DEG) analysis with DESeq2. I have counts for 2 conditions, each with 3 replicates. After running the DEG analysis, I realized that there are duplicated gene names in the final result. When I counted reads with HTSeq I used the gene ID, so the counts treat different transcripts of the same gene as separate entries.
For instance:

Gene ID   Count   Gene Name
A         10      KDR
B         12      KDR
I think I should merge these counts, since they come from the same gene, but I am not sure about this. Merging would give a single row with 22 reads for KDR (10 + 12), and I would run the analysis on that row instead of testing A and B separately.
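The merge described above can be sketched as follows. This is a minimal illustration, not a DESeq2 feature: it assumes a hypothetical in-memory layout where each row is (gene ID, gene name, list of per-sample counts), and it sums the raw counts sample by sample so the result stays an integer matrix, which is what DESeq2 expects as input.

```python
def collapse_by_name(rows):
    """Sum per-sample counts of all gene IDs that share a gene name.

    rows: iterable of (gene_id, gene_name, counts), where counts is a list
    of integers with one entry per sample (hypothetical layout).
    Returns a dict mapping gene_name -> summed per-sample counts.
    """
    merged = {}
    for gene_id, gene_name, counts in rows:
        if gene_name not in merged:
            merged[gene_name] = [0] * len(counts)
        acc = merged[gene_name]
        # Every row must cover the same samples, otherwise the sum is meaningless.
        if len(acc) != len(counts):
            raise ValueError(f"{gene_id}: expected {len(acc)} samples, got {len(counts)}")
        for i, c in enumerate(counts):
            acc[i] += c
    return merged


rows = [
    ("A", "KDR", [10]),
    ("B", "KDR", [12]),
]
print(collapse_by_name(rows))  # {'KDR': [22]}
```

With 2 conditions x 3 replicates, each counts list would hold 6 integers, and the summing happens independently in each sample column.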
My question is: should I merge the counts from different IDs that fall in the same gene before running the DEG analysis? If not, why?
I tried to find an answer on the forums, but I have not found one so far. Sorry if this is a basic question; I just started doing this type of analysis.
Thanks in advance for your support.
Cross-posted: https://support.bioconductor.org/p/9147437/
Is this a problem for a significant number of genes? Do you have gene counts or transcript counts? I don't think HTSeq can generate transcript counts. If two different gene IDs have the same gene name (which I know happens occasionally in Ensembl), I'd keep the unique IDs all the way through the analysis and only attach gene names at the very end.
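To answer the first question and follow the "keep IDs until the end" advice, a sketch like the one below could help. It assumes a hypothetical dictionary mapping each gene ID to its gene name (for example, built from the GTF used for counting); the function names and the (gene ID, statistic) result layout are illustrative, not part of DESeq2 or HTSeq.

```python
from collections import defaultdict


def duplicated_names(id_to_name):
    """Return {gene_name: sorted list of gene IDs} for names shared by 2+ IDs."""
    by_name = defaultdict(set)
    for gene_id, name in id_to_name.items():
        by_name[name].add(gene_id)
    return {name: sorted(ids) for name, ids in by_name.items() if len(ids) > 1}


def annotate_results(results, id_to_name):
    """Attach gene names only at the end; rows stay keyed by their unique ID.

    results: list of (gene_id, statistic) pairs, e.g. adjusted p-values.
    """
    return [(gene_id, id_to_name.get(gene_id, ""), stat) for gene_id, stat in results]


id_to_name = {"A": "KDR", "B": "KDR", "C": "VEGFA"}
dups = duplicated_names(id_to_name)
print(dups)  # {'KDR': ['A', 'B']}
print(f"{len(dups)} of {len(set(id_to_name.values()))} gene names are shared by several IDs")
```

If only a handful of names are shared, each pair can be inspected by hand (e.g. their genomic locations) instead of being merged blindly.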
Hi!
Thank you for your reply.
I think I would get a different result, since DESeq2 tests each gene ID row separately. Right now, I get different DEG results for A and for B. I think this is a problem for the DEG analysis of this gene, although there are not many duplicates in my count table.
As for HTSeq, I think this tool generated counts for transcripts, since I have different gene IDs that fall in the same gene. To do the counts I used these parameters.
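One way to check what kind of IDs an htseq-count table actually contains is to look at the ID prefixes. htseq-count writes one tab-separated line per feature (ID, count), followed by summary lines starting with `__` (such as `__no_feature` and `__ambiguous`), and by default it groups reads by the GTF `gene_id` attribute (`--idattr`). The sketch below assumes Ensembl-style IDs, where the letter after the species prefix marks the type (`G` for gene, `T` for transcript, e.g. `ENSG...`/`ENST...`, `ENSMUSG...`/`ENSMUST...`). The IDs in the example are made-up placeholders.

```python
import re

# Ensembl-style IDs: "ENS" + optional species code + type letter + digits.
GENE_RE = re.compile(r"^ENS[A-Z]*G\d")        # e.g. ENSG..., ENSMUSG...
TRANSCRIPT_RE = re.compile(r"^ENS[A-Z]*T\d")  # e.g. ENST..., ENSMUST...


def classify_htseq_ids(lines):
    """Tally feature IDs in htseq-count output by (assumed) Ensembl ID type.

    Each data line is "<feature_id>\t<count>"; summary lines start with "__".
    """
    tally = {"gene": 0, "transcript": 0, "other": 0}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("__"):
            continue  # skip blank lines and __no_feature, __ambiguous, ...
        feature_id = line.split("\t", 1)[0]
        if GENE_RE.match(feature_id):
            tally["gene"] += 1
        elif TRANSCRIPT_RE.match(feature_id):
            tally["transcript"] += 1
        else:
            tally["other"] += 1
    return tally


example = [
    "ENSG00000000001\t10\n",
    "ENSG00000000002\t12\n",
    "__no_feature\t345\n",
    "__ambiguous\t67\n",
]
print(classify_htseq_ids(example))  # {'gene': 2, 'transcript': 0, 'other': 0}
```

On a real file, `with open(path) as f: print(classify_htseq_ids(f))` would work the same way, since a file object yields lines. If every row is counted as "gene", the table is gene-level and A and B would be distinct gene IDs that happen to share a name, which is the situation the reply above describes.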
Thanks