How to choose one TSS for every gene using a GENCODE annotation file?
2.3 years ago
Apex92 ▴ 320

Dear all,

I have data showing the interaction positions of a specific RNA within gene bodies. I aim to calculate the distance of each interaction from the transcription start site (TSS) of the target gene. To get the TSS information, I use the GENCODE annotation file (GTF) and focus on the lines whose feature type is "transcript".
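A minimal sketch of extracting TSSs from the "transcript" lines, assuming a standard GENCODE GTF (tab-separated, 9 columns, `gene_name` and `transcript_id` present in the attribute column). The main pitfall is strand: GTF coordinates are 1-based, so the TSS is the start column on the + strand but the end column on the - strand. The helper names (`parse_attributes`, `transcript_tss`, `read_gtf`, `distance_from_tss`) are hypothetical, and the sign convention for the distance (positive = downstream, into the gene body) is an assumption you may want to flip.

```python
import gzip
from collections import defaultdict


def parse_attributes(field):
    """Parse a GTF attribute column into a dict of lists (keys such as `tag` can repeat)."""
    attrs = defaultdict(list)
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key].append(value.strip().strip('"'))
    return attrs


def transcript_tss(lines):
    """Yield (gene_name, transcript_id, chrom, tss, strand) for each 'transcript' line.

    GTF coordinates are 1-based and inclusive: on the + strand the TSS is the
    start column (4th), on the - strand it is the end column (5th).
    """
    for line in lines:
        if line.startswith("#"):
            continue  # header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue  # keep only transcript records; skip gene/exon/CDS/UTR lines
        chrom, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        attrs = parse_attributes(cols[8])
        tss = start if strand == "+" else end
        yield attrs["gene_name"][0], attrs["transcript_id"][0], chrom, tss, strand


def read_gtf(path):
    """Yield lines from a plain or gzip-compressed GTF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        yield from fh


def distance_from_tss(pos, tss, strand):
    """Signed distance of `pos` from the TSS: positive = downstream (into the gene body)."""
    return pos - tss if strand == "+" else tss - pos
```

Usage would be along the lines of `for gene, tid, chrom, tss, strand in transcript_tss(read_gtf(path))`, where `path` points to your downloaded GENCODE GTF; collecting the results per gene reproduces the multiple-TSS lists shown below.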

Using the GTF file, I get multiple TSSs for some genes. How can I choose only one TSS per gene? In some cases, the TSSs of a single gene differ by ~200 nt.

Here is an example of a gene with multiple TSSs (gene, TSS position, strand):

Zmym6,127077383,+
Zmym6,127077397,+
Zmym6,127077442,+
Zmym6,127078291,+
Zmym6,127086589,+
Zmym6,127104034,+
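A quick check on this example shows the spread can be much larger than ~200 nt: the first three TSSs lie within 59 nt of each other, but the full set spans about 26.7 kb. The small helper below (`tss_spread` is a hypothetical name) computes the total range and the gaps between neighbouring TSSs, which is a simple way to see how much the choice of TSS can move a distance measurement for a given gene.

```python
def tss_spread(positions):
    """Return (span, gaps): total range of the TSSs and the gaps between neighbours."""
    pos = sorted(positions)
    span = pos[-1] - pos[0]
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    return span, gaps


# Zmym6 TSS positions copied from the example above (all + strand).
zmym6 = [127077383, 127077397, 127077442, 127078291, 127086589, 127104034]
span, gaps = tss_spread(zmym6)
print(span)  # 26651
print(gaps)  # [14, 45, 849, 8298, 17445]
```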

Thank you in advance.

Tags: TSS, transcription, sequencing, RNA-seq

200 nt is quite a short distance; some genes have transcripts whose start sites are several kb apart.

2.3 years ago
ATpoint 86k

There is no general answer to this. The multiple TSSs come from different transcripts of the same gene. You could choose the longest transcript, the shortest, the one listed as principal in the APPRIS database (for protein-coding genes), or the one with the highest expression. It depends, and yes, it is annoying, but that is biology. It should be the one that is relevant to your story.
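A sketch of how these choices could be encoded as interchangeable rules, assuming the transcript records have already been parsed from the GTF. Two assumptions to flag: "longest"/"shortest" here use the genomic span of the transcript line, not the spliced exonic length; and the APPRIS rule relies on the GTF `tag` attribute carrying values such as `appris_principal_1`, which recent GENCODE releases include (check your release). "Highest expression" needs your own quantification and is not shown. `Transcript`, `STRATEGIES` and `one_tss_per_gene` are hypothetical names.

```python
from collections import defaultdict
from typing import NamedTuple


class Transcript(NamedTuple):
    gene: str
    transcript_id: str
    start: int          # 1-based GTF start
    end: int            # 1-based GTF end (inclusive)
    strand: str
    tags: tuple = ()    # values of the GTF `tag` attribute

    @property
    def tss(self):
        return self.start if self.strand == "+" else self.end

    @property
    def span(self):
        # Genomic span of the transcript line, not the spliced (exonic) length.
        return self.end - self.start + 1


def appris_rank(t):
    """appris_principal_1 -> 1 ... appris_principal_5 -> 5; anything else ranks last."""
    for tag in t.tags:
        if tag.startswith("appris_principal_"):
            return int(tag.rsplit("_", 1)[1])
    return 99


def most_upstream(ts):
    # 5'-most TSS: smallest coordinate on the + strand, largest on the - strand.
    if ts[0].strand == "+":
        return min(ts, key=lambda t: t.tss)
    return max(ts, key=lambda t: t.tss)


STRATEGIES = {
    "longest": lambda ts: max(ts, key=lambda t: t.span),
    "shortest": lambda ts: min(ts, key=lambda t: t.span),
    "most_upstream": most_upstream,
    # Best APPRIS principal isoform; ties and untagged genes fall back to the longest.
    "appris": lambda ts: min(ts, key=lambda t: (appris_rank(t), -t.span)),
}


def one_tss_per_gene(transcripts, strategy="appris"):
    """Group transcripts by gene and keep the TSS of the one picked by `strategy`."""
    by_gene = defaultdict(list)
    for t in transcripts:
        by_gene[t.gene].append(t)
    pick = STRATEGIES[strategy]
    return {gene: pick(ts).tss for gene, ts in by_gene.items()}
```

Keeping the rules in one dictionary makes it easy to rerun the downstream distance analysis with each strategy and see whether the conclusion depends on the choice.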

ATpoint, thanks for your input. I decided to pick a random TSS for every gene because my analysis does not need very high resolution. For someone with a different purpose, however, choosing a specific TSS will probably be necessary.
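If a random TSS is used, seeding the generator keeps the choice reproducible between runs, and the per-gene TSS spread bounds how much any interaction distance could change had a different annotated TSS been picked. A minimal sketch, assuming the candidate TSSs are already collected per gene; `random_tss_per_gene`, `max_shift` and the default seed are hypothetical choices.

```python
import random


def random_tss_per_gene(tss_by_gene, seed=42):
    """Pick one TSS per gene at random, reproducibly.

    tss_by_gene maps gene -> list of candidate TSS positions. Genes and
    candidates are sorted first so the result depends only on the seed,
    not on the order in which the GTF was read.
    """
    rng = random.Random(seed)
    return {gene: rng.choice(sorted(set(tss_by_gene[gene]))) for gene in sorted(tss_by_gene)}


def max_shift(tss_by_gene):
    """Worst-case change in distance per gene: the gap between its extreme TSSs."""
    return {gene: max(p) - min(p) for gene, p in tss_by_gene.items()}
```

Reporting `max_shift` alongside the results makes it explicit which genes (such as Zmym6, with a ~26.7 kb spread) are sensitive to the random choice.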