18 months ago
Diego
Hi everyone,
Any idea how to count only the protein-coding genes per contig/scaffold/chromosome?
Thanks, Diego
If you have an annotation file, for example the following human GTF:
1 havana gene 11869 14409 . + . gene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene";
1 havana transcript 11869 14409 . + . gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-202"; transcript_source "havana"; transcript_biotype "lncRNA"; tag "basic"; transcript_support_level "1";
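You could get the number of protein-coding genes per chromosome with the command below. It keeps only the gene records (column 3), filters for the protein_coding biotype, and counts how often each sequence name (column 1) occurs: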
awk '$3=="gene"' Homo_sapiens.GRCh38.98.chr.gtf | grep protein_coding | cut -f1 | sort | uniq -c
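If you would rather do the filtering and counting in a single awk pass, here is a minimal sketch of the same idea. It assumes an Ensembl-style GTF (tab-separated, feature type in column 3, attributes in column 9 with a gene_biotype key). The function name count_protein_coding_genes is just an illustrative name, not part of any tool.

```shell
# Count protein-coding genes per sequence (chromosome/scaffold/contig).
# Assumption: Ensembl-style GTF with gene_biotype "protein_coding" on gene records.
count_protein_coding_genes() {
    awk -F'\t' '
        /^#/ { next }    # skip header/comment lines
        # keep gene records whose attribute column carries the protein_coding biotype
        $3 == "gene" && $9 ~ /gene_biotype "protein_coding"/ { n[$1]++ }
        # print "sequence<TAB>count" for every sequence seen
        END { for (s in n) print s "\t" n[s] }
    ' "$1" | LC_ALL=C sort -k1,1
}
```

You would call it as count_protein_coding_genes Homo_sapiens.GRCh38.98.chr.gtf. Matching on the full gene_biotype "protein_coding" attribute, rather than the bare word, avoids accidental hits elsewhere on the line. If your GTF comes from GENCODE, the key is gene_type instead of gene_biotype, so the pattern would need adjusting.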
Thanks! It worked perfectly!
Glad it helped :). Please consider marking the answer as "accepted" :).