Hi All,
Suppose I have a .gtf file with the exons of all genes in a given genome. I would like to extract the coordinates of the first exon of each gene from the .gtf file. How should I get started?
Regards,
Ziyi
Since your question is about getting started, I would suggest looking at code snippets or libraries that parse GTF files to get an idea of how to handle one.
For example:
http://www-huber.embl.de/users/anders/HTSeq/doc/tour.html#tour
https://github.com/ctokheim/PrimerSeq/blob/master/gtf.py
But a quick and dirty way would be:
curl https://raw.githubusercontent.com/roryk/DEXSeq/master/inst/python_scripts/dexseq_prepare_annotation.py | python - genes.gtf out.tmp
grep "exonic_part_number \"001\"" out.tmp | less -S
This lists the first exonic part of each gene, assuming a standard GTF file format.
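If you would rather see the parsing spelled out, here is a minimal sketch in plain Python. It is not taken from HTSeq or DEXSeq. It assumes a tab-separated GTF with "exon" rows and a gene_id attribute, and it assumes that "first exon" means the most 5' exon across all transcripts of a gene (lowest start on the + strand, highest end on the - strand). The function name first_exons is made up for this example.

```python
import re

# Matches the gene_id attribute in the 9th GTF column, e.g. gene_id "ENSG0001";
GENE_ID = re.compile(r'gene_id "([^"]+)"')

def first_exons(gtf_lines):
    """Return {gene_id: (chrom, start, end, strand)} for the 5'-most exon of each gene.

    Assumption: 'first exon' = lowest start on '+' genes, highest end on '-' genes,
    taken over all exon rows that share a gene_id.
    """
    best = {}
    for line in gtf_lines:
        # Skip comments and blank lines
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Only exon rows with all nine GTF columns are of interest
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = GENE_ID.search(fields[8])
        if match is None:
            continue
        gene = match.group(1)
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])  # GTF coordinates are 1-based, inclusive
        current = best.get(gene)
        if current is None:
            best[gene] = (chrom, start, end, strand)
        elif strand == "-" and end > current[2]:
            # On the minus strand the 5' end is the highest coordinate
            best[gene] = (chrom, start, end, strand)
        elif strand != "-" and start < current[1]:
            # On the plus strand the 5' end is the lowest coordinate
            best[gene] = (chrom, start, end, strand)
    return best

# Usage (genes.gtf is the same placeholder file name as in the commands above):
# with open("genes.gtf") as handle:
#     for gene, (chrom, start, end, strand) in first_exons(handle).items():
#         print(gene, chrom, start, end, strand, sep="\t")
```

If you need the first exon per transcript rather than per gene, the same idea should work by keying on transcript_id instead of gene_id.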
Thank you! HTSeq seems promising, and I think I found some leads in the TSS plot example.
I updated my answer. Please accept it if it works for you.