Hi I want to count the number of transcripts per gene in GTF file, using python 2 on Ubuntu 18.04. Could you help me how to count how many transcripts per gene in GTF file? Your help is much appreciated!
I am using a public gtf.gz file from ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_33/gencode.v33.annotation.gtf.gz and the output gtf file briefly looks like below, but it's truncated when I copied it from Ubuntu. Each row has more data including gene_type, gene_name, transcript_type, etc.
##description: evidence-based annotation of the human genome (GRCh38), version 33 (Ensembl 99)
##provider: GENCODE
##contact: gencode-help@ebi.ac.uk
##format: gtf
##date: 2019-12-13
chr1 HAVANA gene 11869 14409 . + . gene_id "ENSG00000223972.5"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; lev
chr1 HAVANA transcript 11869 14409 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unproc
chr1 HAVANA exon 11869 12227 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_ps
chr1 HAVANA exon 12613 12721 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_ps
chr1 HAVANA exon 13221 14409 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_ps
chr1 HAVANA transcript 12010 13670 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000450305.2"; gene_type "transcribed_unproc
chr1 HAVANA exon 12010 12057 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000450305.2"; gene_type "transcribed_unprocessed_ps
chr1 HAVANA exon 12179 12227 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000450305.2"; gene_type "transcribed_unprocessed_ps
chr1 HAVANA exon 12613 12697 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000450305.2"; gene_type "transcribed_unprocessed_ps
chr1 HAVANA exon 12975 13052 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000450305.2"; gene_type "transcribed_unprocessed_ps
chr1 HAVANA exon 13221 13374 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000450305.2"; gene_type "transcribed_unprocessed_ps
chr1 HAVANA exon 13453 13670 . + . gene_id "ENSG00000223972.5"; transcript_id "ENST00000450305.2"; gene_type "transcribed_unprocessed_ps
chr1 HAVANA gene 14404 29570 . - . gene_id "ENSG00000227232.5"; gene_type "unprocessed_pseudogene"; gene_name "WASH7P"; level 2; hgnc_id
chr1 HAVANA transcript 14404 29570 . - . gene_id "ENSG00000227232.5"; transcript_id "ENST00000488147.1"; gene_type "unprocessed_pseudo
chr1 HAVANA exon 29534 29570 . - . gene_id "ENSG00000227232.5"; transcript_id "ENST00000488147.1"; gene_type "unprocessed_pseudogene"; g
Please do not delete posts once they have received a comment or an answer. If an answer has solved your problem please accept it by using the green check mark to provide closure to this thread.