How to convert transcript counts to gene counts
2.1 years ago
LDT ▴ 340

Dear all,

I have been given a dataset like this to analyse, with counts for each transcript. Since I want to use DESeq, I need a way to convert the transcript counts to gene counts. The output comes from HOMER, and I am not sure whether tximport in R supports it.

# A tidytable: 54,013 × 9
   transcript             chr   start   end strand Length Copies `Annotation/Divergence` counts
   <chr>                  <chr> <int> <int> <chr>   <dbl>  <int>                   <dbl>  <dbl>
 1 transcript:AT1G01010.1 1      3631  5899 +        2268      1                       0      4
 2 transcript:AT1G01020.1 1      6788  9130 -        2342      1                       0     51
 3 transcript:AT1G01020.2 1      6788  8737 -        1949      1                       0     50
 4 transcript:AT1G01020.3 1      6788  9130 -        2342      1                       0     51
 5 transcript:AT1G01020.4 1      6788  9130 -        2342      1                       0     51
 6 transcript:AT1G01020.5 1      6788  9130 -        2342      1                       0     51
 7 transcript:AT1G01020.6 1      6788  8737 -        1949      1                       0     50
 8 transcript:AT1G01030.1 1     11649 13714 -        2065      1                       0     39

Any idea or suggestion is appreciated!
Would it be enough to simply sum the counts of each gene's transcripts into a new column?
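
As a minimal sketch of that summing idea, using only the eight rows shown above: the gene ID is assumed to be the transcript ID without the `transcript:` prefix and without the trailing isoform number, and the names `tx` and `gene_counts` are made up for illustration.

```r
# Rows copied from the table above (transcript ID and HOMER count only).
tx <- data.frame(
  transcript = c("transcript:AT1G01010.1", "transcript:AT1G01020.1",
                 "transcript:AT1G01020.2", "transcript:AT1G01020.3",
                 "transcript:AT1G01020.4", "transcript:AT1G01020.5",
                 "transcript:AT1G01020.6", "transcript:AT1G01030.1"),
  counts = c(4, 51, 50, 51, 51, 51, 50, 39)
)

# Assumption: gene ID = transcript ID minus the "transcript:" prefix
# and minus the trailing ".<isoform number>".
tx$gene <- sub("\\.[0-9]+$", "", sub("^transcript:", "", tx$transcript))

# Sum the transcript counts within each gene.
gene_counts <- aggregate(counts ~ gene, data = tx, FUN = sum)
print(gene_counts)
```

With these rows, the six AT1G01020 isoforms sum to 304, although each isoform on its own has about 51 counts, so it may be worth checking whether the same reads were counted once per isoform before relying on the sum.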

tximport r gene_counts RNAseq • 1.4k views

Is this for an RNA-Seq analysis? Did you align to the gene transcripts or exons?


Thank you, ATpoint, for noticing. I was not receiving help with this, so I had to post in multiple communities. I appreciate that you posted, though. You are always helpful and you care for the community.


I have never used the tools you mentioned, but are you trying to find a "representative" transcript per gene? If so, then for A. thaliana (judging by your transcript IDs) I think choosing the canonical transcript would be one of the better options. As for "summing" over all transcripts: transcripts of the same gene generally share some overlapping coding regions, but I am not sure how that would affect the expression analysis. For my own analyses, I would rather choose a representative transcript and use that transcript's count as the gene count.
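
A minimal sketch of the representative-transcript approach, using the rows from the question. The `canonical` vector is a hypothetical lookup: here it simply picks the `.1` isoform of every gene as a placeholder, which is an assumption for illustration only and would need to be replaced by a real list of canonical transcripts.

```r
# Rows copied from the question (transcript ID and count only).
tx <- data.frame(
  transcript = c("transcript:AT1G01010.1", "transcript:AT1G01020.1",
                 "transcript:AT1G01020.2", "transcript:AT1G01020.3",
                 "transcript:AT1G01020.4", "transcript:AT1G01020.5",
                 "transcript:AT1G01020.6", "transcript:AT1G01030.1"),
  counts = c(4, 51, 50, 51, 51, 51, 50, 39)
)

# Assumption: gene ID = transcript ID minus prefix and isoform suffix.
tx$gene <- sub("\\.[0-9]+$", "", sub("^transcript:", "", tx$transcript))

# Hypothetical lookup of one representative transcript per gene.
# Placeholder only: the ".1" isoform is not guaranteed to be canonical.
canonical <- paste0("transcript:", unique(tx$gene), ".1")

# Keep the representative transcript's count as the gene count.
rep_counts <- tx[tx$transcript %in% canonical, c("gene", "counts")]
print(rep_counts)
```

For the rows shown, this keeps one row per gene (counts 4, 51 and 39) instead of adding up the near-identical counts of the AT1G01020 isoforms.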
