Dear all,
I have data showing the interaction positions of a specific RNA with gene bodies. I want to calculate the distance of each interaction from the transcription start site (TSS) of the target gene. To get the TSS information, I use the GENCODE annotation file (GTF) and focus on the lines labeled "transcript".
Using the GTF file, I get multiple TSSs for some genes. My question is: how can I choose only one TSS for each gene? In some cases, the TSSs for one gene differ by ~200 nt.
Here is an example of a gene with multiple TSSs (gene, TSS position, strand):
Zmym6,127077383,+
Zmym6,127077397,+
Zmym6,127077442,+
Zmym6,127078291,+
Zmym6,127086589,+
Zmym6,127104034,+
Thank you in advance.
200 nt is quite a short distance. Some genes have transcripts whose start sites are kilobases apart.
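If you do want a single TSS per gene, one simple convention (not the only one, and not necessarily the right one for your biology) is to keep the most upstream TSS: the smallest coordinate on the + strand, the largest on the - strand. Below is a minimal Python sketch of that idea, assuming your data is already in the gene,TSS,strand form shown in the question. The function names `pick_upstream_tss` and `signed_distance` are made up for illustration, and the interaction position in the usage line is a hypothetical value.

```python
def pick_upstream_tss(records):
    """Keep the most upstream (5'-most) TSS for each gene.

    records: iterable of (gene, tss, strand) tuples.
    Returns a dict mapping gene -> (tss, strand).
    """
    chosen = {}
    for gene, tss, strand in records:
        if gene not in chosen:
            chosen[gene] = (tss, strand)
            continue
        best, _ = chosen[gene]
        # Upstream means a smaller coordinate on '+', a larger one on '-'.
        if (strand == "+" and tss < best) or (strand == "-" and tss > best):
            chosen[gene] = (tss, strand)
    return chosen


def signed_distance(position, tss, strand):
    """Distance from the TSS in transcript orientation (positive = downstream)."""
    return position - tss if strand == "+" else tss - position


# The Zmym6 entries from the question.
records = [
    ("Zmym6", 127077383, "+"),
    ("Zmym6", 127077397, "+"),
    ("Zmym6", 127077442, "+"),
    ("Zmym6", 127078291, "+"),
    ("Zmym6", 127086589, "+"),
    ("Zmym6", 127104034, "+"),
]

tss_by_gene = pick_upstream_tss(records)
tss, strand = tss_by_gene["Zmym6"]

# Hypothetical interaction position, 1 kb downstream of the chosen TSS.
distance = signed_distance(127078383, tss, strand)
```

Another option is to skip the collapsing step and, for each interaction, measure the distance to the nearest annotated TSS of the target gene. Which choice is appropriate depends on what the distance is supposed to mean in your analysis.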