Does anyone know how to calculate the distance between genes in a genome, and is there a tool available for this?
If you have to deal with multiple overlapping elements, it isn't as straightforward as subtracting coordinates.
You can use a tool like closest-features on a BED-formatted file of gene annotations.
Get genes, e.g. for human (hg38):
$ wget -qO- ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_35/gencode.v35.basic.annotation.gff3.gz \
| gunzip -c \
| convert2bed --input=gff - \
> gencode.v35.bed
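If only gene-level records are wanted (the converted file also carries transcripts and exons), one option is to filter on the feature type before going further. This is a rough sketch, assuming the converted file keeps the GFF3 feature type in column 8; check that with `head` on your own file first. The function name `keep_genes`, the file `example.10col.bed`, and the sample rows are made up for illustration.

```shell
# Keep only rows whose 8th tab-delimited column is "gene".
# Assumption: column 8 holds the GFF3 feature type in the converted file.
keep_genes() {
  awk -F'\t' '$8 == "gene"' "$@"
}

# Two illustrative rows (source, strand and attribute values are made up).
printf 'chr1\t11868\t14409\tENSG00000223972.5\t.\t+\tHAVANA\tgene\t.\tID=ENSG00000223972.5\n' > example.10col.bed
printf 'chr1\t11868\t12227\texon:ENST00000456328.2:1\t.\t+\tHAVANA\texon\t.\tID=exon:ENST00000456328.2:1\n' >> example.10col.bed

# Only the gene row survives the filter.
keep_genes example.10col.bed | cut -f1-4
```

On the real file this would be something like `keep_genes gencode.v35.bed > gencode.v35.genes.bed`, where the output name is just a placeholder.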
For purposes of demonstration, I'll cut out details like strand and GFF3-specific keys:
$ cut -f1-5 gencode.v35.bed > gencode.v35.c1t5.bed
To calculate the distances of closest features, use the gene annotations file as both reference and map files:
$ closest-features --closest --dist --no-overlaps gencode.v35.c1t5.bed gencode.v35.c1t5.bed > answer.bed
The result will look something like this, depending on what you use as your gene annotations file:
$ head answer.bed
chr1 11868 12227 exon:ENST00000456328.2:1 .|chr1 12612 12697 exon:ENST00000450305.2:3 .|386
chr1 11868 14409 ENSG00000223972.5 .|chr1 15004 15038 exon:ENST00000488147.1:10 .|596
chr1 11868 14409 ENST00000456328.2 .|chr1 15004 15038 exon:ENST00000488147.1:10 .|596
chr1 12009 12057 exon:ENST00000450305.2:1 .|chr1 12178 12227 exon:ENST00000450305.2:2 .|122
chr1 12009 13670 ENST00000450305.2 .|chr1 14403 14501 exon:ENST00000488147.1:11 .|734
chr1 12178 12227 exon:ENST00000450305.2:2 .|chr1 12009 12057 exon:ENST00000450305.2:1 .|-122
chr1 12612 12697 exon:ENST00000450305.2:3 .|chr1 12974 13052 exon:ENST00000450305.2:4 .|278
chr1 12612 12721 exon:ENST00000456328.2:2 .|chr1 12974 13052 exon:ENST00000450305.2:4 .|254
chr1 12974 13052 exon:ENST00000450305.2:4 .|chr1 13220 13374 exon:ENST00000450305.2:5 .|169
chr1 13220 13374 exon:ENST00000450305.2:5 .|chr1 13452 13670 exon:ENST00000450305.2:6 .|79
This result is delimited by a pipe character (|). The first field is the reference element (gene annotation). The second field is the nearest non-overlapping annotation. The third field is the signed distance between the reference and nearest elements: a positive value means the nearest element lies downstream of the reference's stop coordinate, and a negative value means it lies upstream of the reference's start coordinate.
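To pull the three pipe-delimited fields apart, something like the following awk sketch could be used. It prints the reference name, the nearest element's name, the absolute distance, and the side. The function name `summarize_closest` is made up, and skipping rows whose second field is `NA` is an assumption about how a missing neighbor is reported.

```shell
# Split closest-features output on "|" and report name, neighbor, distance, side.
summarize_closest() {
  awk -F'|' '
    $2 != "NA" {                      # assumption: NA marks a missing neighbor
      d = $3 + 0                      # signed distance as a number
      side = (d < 0) ? "upstream" : "downstream"
      if (d < 0) d = -d               # report the absolute distance
      split($1, ref, /[ \t]+/)        # BED columns of the reference element
      split($2, near, /[ \t]+/)       # BED columns of the nearest element
      print ref[4], near[4], d, side
    }' "$@"
}

# Two lines copied from the sample output.
summarize_closest <<'EOF'
chr1 11868 12227 exon:ENST00000456328.2:1 .|chr1 12612 12697 exon:ENST00000450305.2:3 .|386
chr1 12178 12227 exon:ENST00000450305.2:2 .|chr1 12009 12057 exon:ENST00000450305.2:1 .|-122
EOF
```

On the full result this would be run as `summarize_closest answer.bed`.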
For more detail:
https://bedops.readthedocs.io/en/latest/content/reference/set-operations/closest-features.html
By distance, what do you mean more specifically?
Please provide a representative input example and an example of intended output.
I mean: in a genome with many genes, how do I calculate the physical distance between two neighboring genes?
By subtracting the end coordinate of the first gene from the start coordinate of the second.
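A minimal awk sketch of that neighbor-to-neighbor subtraction (start of the next gene minus end of the previous one), assuming a tab-delimited BED file sorted by chromosome and start. The function name `neighbor_gaps`, the file `genes.example.bed`, and the coordinates are hypothetical. Overlapping neighbors are reported as 0 here, which is a design choice and not the only reasonable one; nested genes would need more care.

```shell
# Gap between each gene and the next one on the same chromosome.
neighbor_gaps() {
  awk 'BEGIN { FS = OFS = "\t" }
       $1 == prev_chrom {
         gap = $2 - prev_end                     # next start minus previous end
         print prev_name, $4, (gap > 0 ? gap : 0)  # overlaps reported as 0
       }
       { prev_chrom = $1; prev_end = $3; prev_name = $4 }' "$@"
}

# Hypothetical sorted BED records: chrom, start, end, name.
printf 'chr1\t100\t200\tgeneA\nchr1\t350\t500\tgeneB\nchr1\t450\t600\tgeneC\nchr2\t50\t80\tgeneD\n' > genes.example.bed

neighbor_gaps genes.example.bed
```

This prints a gap of 150 for geneA/geneB and 0 for the overlapping geneB/geneC pair; geneD is on another chromosome, so no pair is reported for it. Note that the closest-features distances shown earlier in the thread are consistently one larger than a plain start-minus-end difference (12612 - 12227 = 385, reported as 386), so the two approaches can differ by one depending on the convention used.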