2.1 years ago
LDT
Dear all,
I have been given a dataset like the one below to analyse, with counts for each transcript. Since I want to use DESeq,
I need a way to convert the transcript counts to gene counts. The output comes from HOMER, and I am not sure whether tximport
in R supports it.
# A tidytable: 54,013 × 9
transcript chr start end strand Length Copies `Annotation/Divergence` counts
<chr> <chr> <int> <int> <chr> <dbl> <int> <dbl> <dbl>
1 transcript:AT1G01010.1 1 3631 5899 + 2268 1 0 4
2 transcript:AT1G01020.1 1 6788 9130 - 2342 1 0 51
3 transcript:AT1G01020.2 1 6788 8737 - 1949 1 0 50
4 transcript:AT1G01020.3 1 6788 9130 - 2342 1 0 51
5 transcript:AT1G01020.4 1 6788 9130 - 2342 1 0 51
6 transcript:AT1G01020.5 1 6788 9130 - 2342 1 0 51
7 transcript:AT1G01020.6 1 6788 8737 - 1949 1 0 50
8 transcript:AT1G01030.1 1 11649 13714 - 2065 1 0 39
Any idea or suggestion is appreciated!
Would it be enough to simply sum the counts of each gene's transcripts in a new column?
Is this for an RNA-Seq analysis? Did you align to the gene transcripts or exons?
crossposted https://support.bioconductor.org/p/9147256/
Thank you, ATpoint, for noticing. I was not receiving help with this, so I had to post to multiple communities. I appreciate that you posted, though. You are always helpful and you care for the community.
I have never used the tools you mentioned, but are you trying to find a "representative" transcript per gene? If so, then for A. thaliana (based on your transcript IDs) I think choosing the canonical transcript would be one of the better options. As for summing over all transcripts: the transcripts of a single gene generally share overlapping regions, and I am not sure how that would affect the expression analysis. For my analyses, I would rather choose a representative transcript and use that transcript's count as the gene count.
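To make the two options concrete, here is a minimal sketch in plain Python that applies both to the eight rows shown in the question. It is only an illustration, not a recommended pipeline. The helper names (`gene_id`, `sum_per_gene`, `first_isoform_per_gene`) are made up for this example, the ID layout `transcript:<gene>.<isoform>` is assumed from the rows above, and using the `.1` isoform as the "representative" transcript is a stand-in assumption, not a definition of the canonical transcript.

```python
import re
from collections import defaultdict

# Rows copied from the table in the question: (transcript, Length, counts).
ROWS = [
    ("transcript:AT1G01010.1", 2268, 4),
    ("transcript:AT1G01020.1", 2342, 51),
    ("transcript:AT1G01020.2", 1949, 50),
    ("transcript:AT1G01020.3", 2342, 51),
    ("transcript:AT1G01020.4", 2342, 51),
    ("transcript:AT1G01020.5", 2342, 51),
    ("transcript:AT1G01020.6", 1949, 50),
    ("transcript:AT1G01030.1", 2065, 39),
]


def gene_id(transcript_id):
    """Strip the 'transcript:' prefix and the '.<isoform>' suffix.

    Assumes IDs look like 'transcript:AT1G01020.3' (layout taken from the
    question's table, not from any HOMER specification).
    """
    return re.sub(r"^transcript:|\.\d+$", "", transcript_id)


def sum_per_gene(rows):
    """Option 1: add up the counts of all transcripts of each gene."""
    totals = defaultdict(int)
    for transcript_id, _length, count in rows:
        totals[gene_id(transcript_id)] += count
    return dict(totals)


def first_isoform_per_gene(rows):
    """Option 2: keep one transcript per gene.

    Picking the '.1' isoform is an assumption made for this sketch; a real
    analysis should use whatever the annotation marks as representative.
    """
    return {
        gene_id(transcript_id): count
        for transcript_id, _length, count in rows
        if transcript_id.endswith(".1")
    }


if __name__ == "__main__":
    print("summed:        ", sum_per_gene(ROWS))
    print("first isoform: ", first_isoform_per_gene(ROWS))
```

On these rows the summed value for AT1G01020 is 304, while each of its six isoforms individually has 50 or 51 counts. Near-identical counts across overlapping isoforms may mean the same reads were counted once per transcript, in which case summing would inflate the gene count roughly sixfold. That is worth checking against how the HOMER counts were generated before choosing either option.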