Question

How to convert transcript counts to gene counts

2.6 years ago

LDT ▴ 340

Dear all,

I have been given a dataset like the one below to analyse, with counts for each transcript. Since I want to use DESeq, I need to find a way to convert the transcript counts to gene counts. The output comes from HOMER, and I am not sure whether tximport in R supports it.

# A tidytable: 54,013 × 9
   transcript             chr   start   end strand Length Copies `Annotation/Divergence` counts
   <chr>                  <chr> <int> <int> <chr>   <dbl>  <int>                   <dbl>  <dbl>
 1 transcript:AT1G01010.1 1      3631  5899 +        2268      1                       0      4
 2 transcript:AT1G01020.1 1      6788  9130 -        2342      1                       0     51
 3 transcript:AT1G01020.2 1      6788  8737 -        1949      1                       0     50
 4 transcript:AT1G01020.3 1      6788  9130 -        2342      1                       0     51
 5 transcript:AT1G01020.4 1      6788  9130 -        2342      1                       0     51
 6 transcript:AT1G01020.5 1      6788  9130 -        2342      1                       0     51
 7 transcript:AT1G01020.6 1      6788  8737 -        1949      1                       0     50
 8 transcript:AT1G01030.1 1     11649 13714 -        2065      1                       0     39

Any idea or suggestion is appreciated!
Would it be enough to simply sum the transcript counts of each gene into a new column?
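
For illustration, here is a minimal base R sketch of that summing idea, using only the eight rows shown above. The gene ID is assumed to be the transcript ID without the `transcript:` prefix and the trailing `.N` isoform suffix. That rule and the names `tx` and `gene_counts` are illustrative assumptions, not something HOMER provides.

```r
# Toy data copied from the first eight rows of the table above.
tx <- data.frame(
  transcript = c("transcript:AT1G01010.1", "transcript:AT1G01020.1",
                 "transcript:AT1G01020.2", "transcript:AT1G01020.3",
                 "transcript:AT1G01020.4", "transcript:AT1G01020.5",
                 "transcript:AT1G01020.6", "transcript:AT1G01030.1"),
  counts = c(4, 51, 50, 51, 51, 51, 50, 39),
  stringsAsFactors = FALSE
)

# Assumption: gene ID = transcript ID minus the "transcript:" prefix
# and minus the ".N" isoform suffix.
tx$gene <- sub("\\.[0-9]+$", "", sub("^transcript:", "", tx$transcript))

# Sum the transcript counts within each gene.
gene_counts <- aggregate(counts ~ gene, data = tx, FUN = sum)
print(gene_counts)
```

On these rows AT1G01020 ends up with 304, although each of its six isoforms has about 50 counts. That may mean the same reads were counted once per overlapping isoform (an assumption worth checking against how the HOMER table was produced), in which case a plain sum would overcount.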

tximport r gene_counts RNAseq • 1.6k views

ADD COMMENT • link updated 2.6 years ago by manaswwm ▴ 570 • written 2.6 years ago by LDT ▴ 340

Is this for an RNA-Seq analysis? Did you align to the gene transcripts or exons?

ADD REPLY • link 2.6 years ago by snowpin • 0

Cross-posted: https://support.bioconductor.org/p/9147256/

ADD REPLY • link 2.6 years ago by ATpoint 88k

Thank you, ATpoint, for noticing. I was not receiving help with this, so I had to post in multiple communities. I appreciate that you posted, though. You are always helpful and you care for the community.

ADD REPLY • link 2.6 years ago by LDT ▴ 340

I have never used the tools you mentioned, but are you trying to find a "representative" transcript per gene? If so, then for A. thaliana (judging by your transcript IDs) I think choosing the canonical transcript would be one of the better options. As for summing over all transcripts: the transcripts of a single gene generally share some overlapping regions, and I am not sure how that would affect the expression analysis. For my own analyses, I would rather choose a representative transcript and use that transcript's count as the gene count.
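
A minimal base R sketch of the representative-transcript idea, using the rows from the question. The table has no "canonical" flag, so the longest transcript per gene (first one on ties) is used here purely as a stand-in. That selection rule, the gene ID parsing, and the names `tx` and `rep_tx` are assumptions for illustration, not the official canonical annotation.

```r
# Toy data copied from the first eight rows of the table in the question.
tx <- data.frame(
  transcript = c("transcript:AT1G01010.1", "transcript:AT1G01020.1",
                 "transcript:AT1G01020.2", "transcript:AT1G01020.3",
                 "transcript:AT1G01020.4", "transcript:AT1G01020.5",
                 "transcript:AT1G01020.6", "transcript:AT1G01030.1"),
  Length = c(2268, 2342, 1949, 2342, 2342, 2342, 1949, 2065),
  counts = c(4, 51, 50, 51, 51, 51, 50, 39),
  stringsAsFactors = FALSE
)

# Assumption: gene ID = transcript ID minus the "transcript:" prefix
# and minus the ".N" isoform suffix.
tx$gene <- sub("\\.[0-9]+$", "", sub("^transcript:", "", tx$transcript))

# Sort by gene, then by decreasing length; ties keep their original order.
tx_sorted <- tx[order(tx$gene, -tx$Length), ]

# Keep the first (longest) transcript of each gene as its representative.
rep_tx <- tx_sorted[!duplicated(tx_sorted$gene),
                    c("gene", "transcript", "Length", "counts")]
print(rep_tx)
```

With a real list of canonical transcripts, the `order()` / `duplicated()` step would be replaced by a lookup against that list.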

ADD REPLY • link 2.6 years ago by manaswwm ▴ 570