6.1 years ago
kiomix106
I am currently working with GFF, GFF3 and GTF files. I work on the command line in a terminal, mainly with awk and tools like bedtools.
Use the UCSC Kent Utilities toolkit. For example:
$ fetchChromSizes hg38 > hg38.chromsizes
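Or to build a sorted BED file of whole chromosomes, dropping the unplaced, random and alternate scaffolds (any name containing an underscore; note that chrM is kept):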
$ fetchChromSizes hg38 \
| awk -vOFS="\t" '{ print $1, "0", $2; }' \
| grep -v '_' \
| sort-bed - \
> hg38.bed
Whether you use a chromsizes or BED or other formatted file depends on what you're doing with it, but a little taco-bell programming can get it into the form you need.
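If the chromsizes file is already on disk, the same conversion can be sketched with only awk and sort, without fetchChromSizes or sort-bed. This is a minimal sketch: the function name chromsizes_to_bed is made up for illustration, and it assumes a two-column, tab-separated input (name, length).

```shell
# Hypothetical helper: convert a local chromsizes file to a sorted BED file.
# Assumes column 1 is the chromosome name and column 2 is its length.
chromsizes_to_bed() {
    # Drop names containing an underscore, then emit 0-based BED intervals.
    awk -v OFS='\t' '$1 !~ /_/ { print $1, 0, $2 }' "$1" \
        | LC_ALL=C sort -k1,1 -k2,2n   # bytewise sort on name, numeric on start
}
```

Usage would be something like chromsizes_to_bed hg38.chromsizes > hg38.bed.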
One of the solutions here should suffice: Easiest Way To Obtain Chromosome Length?
Are you referring to your own assembled genome or one of the pre-existing genomes out there?
My question is whether I can get the size of a chromosome from a notation file, or from a GFF, GFF3 or GTF file, or only from a page that has this information for a given genome?
You can't get the size of a chromosome from a GFF v.1 or 2 (amended based on @jrj.healey's point below) or GTF file. Neither file format has any provision for encoding chromosome size. You may be able to get an approximation (if you consider the chromosome to start at base 1 and use the end coordinate of the last feature that is encoded for that chromosome).
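The approximation described above can be sketched in awk, since that is what you are already using. This is only a rough sketch: the function name approx_chrom_sizes is hypothetical, it assumes a tab-separated file with the feature end coordinate in column 5, and the result is a lower bound, because any sequence past the last annotated feature is not counted.

```shell
# Hypothetical helper: approximate each chromosome's length as the largest
# feature end coordinate (column 5) seen for it in a GFF/GFF3/GTF file.
approx_chrom_sizes() {
    awk -F'\t' '
        /^##FASTA/ { exit }          # stop at an embedded FASTA section, if any
        /^#/ || NF < 5 { next }      # skip comments, directives, short lines
        $5 + 0 > max[$1] { max[$1] = $5 + 0 }
        END { for (chrom in max) print chrom "\t" max[chrom] }
    ' "$1" | sort -k1,1
}
```

Calling approx_chrom_sizes annotation.gff3 prints one line per chromosome name with the approximate length, which is worth checking against a real chromsizes file whenever one exists.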
I don't know what you mean by a notation file. Can you clarify?
A build.chrom.sizes file, available from the UCSC genome data download folders, will have chromosome sizes. Example file for GRCh38 human build.
Not necessarily the case. GFF(3) can contain a (multi)fasta attached to the end of the file after the ##FASTA line. If these were complete chromosomes, you could theoretically get that information from a GFF, but it wouldn't be especially easy.
Thanks for clarifying that.
Even if the file is in GFF3 format containing full chromosome sequences, the information about chromosome sizes would not be readily available for direct parsing without additional processing.
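As a rough idea of what that additional processing could look like, here is a small awk sketch that sums the sequence lengths found after the ##FASTA line. The function name gff3_fasta_lengths is made up for illustration, and the output only equals chromosome sizes if the embedded sequences really are complete chromosomes.

```shell
# Hypothetical helper: report the length of each sequence embedded after the
# ##FASTA directive of a GFF3 file, as "name<TAB>length".
gff3_fasta_lengths() {
    awk '
        in_fasta && /^>/ {                       # new FASTA record
            if (name != "") print name "\t" len  # flush the previous record
            name = substr($1, 2); len = 0; next  # name = header up to first space
        }
        in_fasta { len += length($0) }           # accumulate sequence bases
        /^##FASTA/ { in_fasta = 1 }              # FASTA section starts after this line
        END { if (name != "") print name "\t" len }
    ' "$1"
}
```

If the file has no ##FASTA section, this prints nothing, and you are back to using a chromsizes file or the genome FASTA itself.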