Dear all, I am having trouble with an exercise. I need to write a bash script that calculates the percentages of exons, introns and intergenic regions in the mouse and human genomes. However, I am stuck. Maybe you can help me.
wget ftp://ftp.ensembl.org/pub/release-108/gtf/homo_sapiens/Homo_sapiens.GRCh38.108.gtf.gz # Download the compressed GTF file for HUMAN
gunzip Homo_sapiens.GRCh38.108.gtf.gz
grep exon Homo_sapiens.GRCh38.108.gtf | cut -f1,4,5,7 | wc -l
grep intron Homo_sapiens.GRCh38.108.gtf | cut -f1,4,5,7 | wc -l
grep inter Homo_sapiens.GRCh38.108.gtf | cut -f1,4,5,7 | wc -l
Problem: I do not get any output from the last command (inter, for intergenic regions), and I do not know how to obtain the chromosome sizes needed to calculate the percentages.
Does anyone have a suggestion? I also tried bedtools without success, perhaps because I am using it incorrectly.
Thanks
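
One possible direction, as a rough sketch rather than a definitive solution: an Ensembl GTF has no "intron" or "intergenic" feature lines, so grep has nothing to match there, and a plain `grep exon` also matches other feature lines that carry an exon_number attribute. Counting lines also gives feature counts, not covered bases. The sketch below filters on column 3, merges overlapping intervals, and derives introns as (gene bases minus exon bases) and intergenic space as (genome size minus gene bases). The function names and the two-column chromosome size file (name, length, tab-separated) are assumptions for illustration and not part of the exercise; strand is ignored.

```shell
#!/bin/sh
# Sketch: percentage of the genome covered by exons, introns and intergenic space.
# Assumptions (not from the original post): the GTF is uncompressed, and a
# tab-separated file with "chromosome<TAB>length" lines is available.

# Print the number of bases covered by the union of all features of one type.
# $1 = GTF file, $2 = feature type in column 3 (e.g. gene, exon)
covered_bases() {
    awk -F'\t' -v f="$2" '$0 !~ /^#/ && $3 == f {print $1 "\t" $4 "\t" $5}' "$1" |
        sort -k1,1 -k2,2n |
        awk -F'\t' '
            # New chromosome or a gap: close the current merged interval.
            $1 != chr || $2 + 0 > end {
                if (chr != "") total += end - start + 1
                chr = $1; start = $2 + 0; end = $3 + 0; next
            }
            # Overlap: extend the current merged interval if needed.
            $3 + 0 > end { end = $3 + 0 }
            END {
                if (chr != "") total += end - start + 1
                printf "%.0f\n", total
            }'
}

# Print one "region<TAB>percent" line each for exon, intron and intergenic.
# $1 = GTF file, $2 = chromosome size file (hypothetical, see above)
region_percentages() {
    genome=$(awk -F'\t' '{s += $2} END {printf "%.0f\n", s}' "$2")
    genic=$(covered_bases "$1" gene)
    exonic=$(covered_bases "$1" exon)
    awk -v g="$genome" -v ge="$genic" -v ex="$exonic" 'BEGIN {
        printf "exon\t%.2f\n", 100 * ex / g
        printf "intron\t%.2f\n", 100 * (ge - ex) / g
        printf "intergenic\t%.2f\n", 100 * (g - ge) / g
    }'
}
```

It could be called as `region_percentages Homo_sapiens.GRCh38.108.gtf chrom_sizes.tsv`, where chrom_sizes.tsv is a placeholder name for the size file. The size file should list the same sequences that appear in the GTF, otherwise the intergenic percentage will be off.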