counting gene distance
4.2 years ago
MEITUO

Does anyone know how to calculate the distance between genes in a genome? Is there a tool available for this?


What exactly do you mean by distance?


Please provide a representative input example and an example of intended output.


I mean: in a genome with many genes, how do I calculate the physical distance between two neighboring genes?


Subtract the end coordinate of the first gene from the start coordinate of the second.
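For two neighboring genes that don't overlap, that subtraction is easy to script. A minimal sketch in bash/awk, assuming a BED file (0-based, half-open coordinates) with the gene name in column 4 and no overlapping genes; neighbor_gaps is a hypothetical helper name, not an existing tool:

```bash
# Hypothetical helper: gap between consecutive genes on the same chromosome.
# Assumes BED input (chrom, start, end, name, ...) with NO overlapping genes.
neighbor_gaps() {
  # Sort by chromosome, then numerically by start coordinate.
  sort -k1,1 -k2,2n \
  | awk 'BEGIN { FS = OFS = "\t" }
         {
           # Only compare genes that sit on the same chromosome.
           if ($1 == prev_chrom) {
             # Half-open coordinates: start - end = bases strictly between them.
             print prev_name, $4, $2 - prev_end
           }
           prev_chrom = $1; prev_end = $3; prev_name = $4
         }'
}

# Hypothetical toy input: three genes on chr1, one on chr2.
printf 'chr1\t100\t200\tgeneA\nchr1\t250\t400\tgeneB\nchr1\t1000\t1200\tgeneC\nchr2\t50\t80\tgeneD\n' \
  | neighbor_gaps
# prints (tab-separated):
# geneA  geneB  50
# geneB  geneC  600
```

With half-open BED coordinates, start minus end counts the bases strictly between the two genes. The closest-features output shown further down reports values one larger than this plain subtraction (e.g. 12612 - 12227 = 385 for its first row, reported as 386), so expect an off-by-one difference if you compare the two.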

4.2 years ago

If you have to deal with multiple overlapping elements, it isn't as straightforward as subtracting coordinates.
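To see why overlaps complicate things: if a short gene is nested inside a long one, comparing each gene only with the one before it in sorted order gives a misleading gap. One way around it is to remember the furthest end coordinate seen so far on each chromosome. This is a simplified sketch under the same BED assumptions, not a description of how closest-features works internally, and gaps_with_overlaps is a hypothetical name:

```bash
# Hypothetical helper: consecutive gaps that tolerate overlapping/nested
# elements by tracking the furthest end coordinate seen on each chromosome.
gaps_with_overlaps() {
  sort -k1,1 -k2,2n \
  | awk 'BEGIN { FS = OFS = "\t" }
         {
           if ($1 != chrom) {
             # New chromosome: reset the running state.
             chrom = $1; max_end = $3; max_name = $4
             next
           }
           if ($2 < max_end) {
             # Starts before the furthest end so far: overlaps that element.
             print max_name, $4, "overlap"
           } else {
             print max_name, $4, $2 - max_end
           }
           # Keep whichever element reaches furthest to the right.
           if ($3 > max_end) { max_end = $3; max_name = $4 }
         }'
}

# Hypothetical toy input: geneN is nested inside geneL.
printf 'chr1\t100\t1000\tgeneL\nchr1\t200\t300\tgeneN\nchr1\t1100\t1200\tgeneM\n' \
  | gaps_with_overlaps
# prints (tab-separated):
# geneL  geneN  overlap
# geneL  geneM  100
```

Naively comparing geneN with geneM would give 800, even though geneL ends only 100 bases before geneM starts.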

You can use a tool like closest-features (part of BEDOPS) on a BED-formatted file of gene annotations.

Get gene annotations, e.g. GENCODE for human (hg38), and convert them to BED:

$ wget -qO- ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_35/gencode.v35.basic.annotation.gff3.gz \
    | gunzip -c \
    | convert2bed --input=gff - \
    > gencode.v35.bed
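The GENCODE GFF3 contains exon and transcript records as well as genes, which is why exon and transcript IDs appear in the sample output further down. If you only want gene-to-gene distances, one option is to keep only the gene records before the cut step. A sketch, assuming (check this on a few lines of your own file) that convert2bed puts the GFF3 feature type in column 8; keep_genes is a hypothetical helper name:

```bash
# Hypothetical helper: keep only GFF3 "gene" records from convert2bed output.
# ASSUMPTION: the GFF3 feature type is in column 8 of the BED output.
keep_genes() {
  awk 'BEGIN { FS = "\t" } $8 == "gene"'
}
```

You could then run keep_genes < gencode.v35.bed > gencode.v35.genes.bed and use that file in place of gencode.v35.bed in the cut step.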

For purposes of demonstration, I'll cut out details like strand and GFF3-specific keys:

$ cut -f1-5 gencode.v35.bed > gencode.v35.c1t5.bed

To calculate the distances of closest features, use the gene annotations file as both reference and map files:

$ closest-features --closest --dist --no-overlaps  gencode.v35.c1t5.bed gencode.v35.c1t5.bed > answer.bed

The result will look something like this, depending on what you use as your gene annotations file:

$ head answer.bed
chr1    11868   12227   exon:ENST00000456328.2:1    .|chr1  12612   12697   exon:ENST00000450305.2:3    .|386
chr1    11868   14409   ENSG00000223972.5   .|chr1  15004   15038   exon:ENST00000488147.1:10   .|596
chr1    11868   14409   ENST00000456328.2   .|chr1  15004   15038   exon:ENST00000488147.1:10   .|596
chr1    12009   12057   exon:ENST00000450305.2:1    .|chr1  12178   12227   exon:ENST00000450305.2:2    .|122
chr1    12009   13670   ENST00000450305.2   .|chr1  14403   14501   exon:ENST00000488147.1:11   .|734
chr1    12178   12227   exon:ENST00000450305.2:2    .|chr1  12009   12057   exon:ENST00000450305.2:1    .|-122
chr1    12612   12697   exon:ENST00000450305.2:3    .|chr1  12974   13052   exon:ENST00000450305.2:4    .|278
chr1    12612   12721   exon:ENST00000456328.2:2    .|chr1  12974   13052   exon:ENST00000450305.2:4    .|254
chr1    12974   13052   exon:ENST00000450305.2:4    .|chr1  13220   13374   exon:ENST00000450305.2:5    .|169
chr1    13220   13374   exon:ENST00000450305.2:5    .|chr1  13452   13670   exon:ENST00000450305.2:6    .|79

This result is delimited by a pipe character (|). The first field is the reference element (an annotation record). The second field is the nearest non-overlapping annotation. The third field is the signed distance between the reference and nearest elements: a positive value means the nearest element lies downstream of the reference element's stop coordinate, and a negative value means it lies upstream of the reference element's start coordinate. With the strand column cut out, these directions refer to genomic coordinate order, not to the direction of transcription.
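If you want the IDs and distances in plain columns (for sorting or plotting), the three pipe-delimited fields can be split with awk. A minimal sketch, assuming the ID sits in the 4th tab-separated column of each element, as in the five-column file used here; summarize_closest is a hypothetical helper name:

```bash
# Hypothetical helper: flatten closest-features --closest --dist output
# ("reference|nearest|distance") into: ref_id, nearest_id, signed, absolute.
summarize_closest() {
  awk -F'|' 'BEGIN { OFS = "\t" }
    {
      # Each element is itself tab-separated; column 4 holds the ID.
      split($1, ref, "\t")
      split($2, hit, "\t")
      d = $3 + 0
      # Column 3 keeps the sign from closest-features; column 4 is |d|.
      print ref[4], hit[4], d, (d < 0 ? -d : d)
    }'
}
```

For example, summarize_closest < answer.bed | sort -k4,4n would list pairs from nearest to furthest by absolute distance.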

For more detail:

https://bedops.readthedocs.io/en/latest/content/reference/set-operations/closest-features.html
