Hi, I have many coordinates (approximately 1000). How can I find the gene symbols located at these coordinates? Thanks, Siavash
You can use BEDOPS bedmap --echo-map-id-uniq to map IDs from a BED file of gene annotations to a list of intervals-of-interest:
$ bedmap --echo --echo-map-id-uniq coordinates.bed genes.bed > answer.bed
You provide the file coordinates.bed, which must be sorted (bedmap expects sorted input).
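If your intervals are not sorted yet, here is a minimal sketch of one way to do it. The file name unsorted.bed and its contents are made up for illustration. BEDOPS ships its own sort-bed tool for this step; the plain sort call below is a stand-in that should give the same chromosome/start/end ordering, assuming a standard coreutils sort.

```shell
# Hypothetical unsorted input; replace with your own intervals.
printf 'chr2\t500\t600\nchr1\t2000\t2100\nchr1\t100\t200\n' > unsorted.bed

# Sort by chromosome name (byte order), then numerically by start and end.
# With BEDOPS installed, `sort-bed unsorted.bed > coordinates.bed` does this job.
LC_ALL=C sort -k1,1 -k2,2n -k3,3n unsorted.bed > coordinates.bed

cat coordinates.bed
```

Setting LC_ALL=C keeps the chromosome ordering independent of your locale.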
How you generate genes.bed depends on your organism and reference genome. Here's an example of how to get this file for human hg38:
$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_27/gencode.v27.annotation.gff3.gz \
| gunzip -c - \
| awk '$3 == "gene"' - \
| convert2bed -i gff - \
| awk -vOFS="\t" '{ match($0, /gene_name=(.*);level/, a); $4=a[1]; print $0; }' - \
> genes.bed
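One caveat: the three-argument form of match() in the last awk step is a GNU awk (gawk) extension. If your awk is mawk or BSD awk, a portable sketch of that step could look like the following. The sample line and the file names genes.sample.bed and genes.portable.bed are invented for illustration, and the sketch assumes the attributes carry a gene_name=...; entry as in the Gencode GFF3 file.

```shell
# Hypothetical single record, shaped like convert2bed output for a GFF3 gene line.
printf 'chr1\t11868\t14409\tENSG00000223972.5\t.\t+\tHAVANA\tgene\t.\tID=ENSG00000223972.5;gene_id=ENSG00000223972.5;gene_name=DDX11L1;level=2\n' > genes.sample.bed

# POSIX awk: match() sets RSTART/RLENGTH; skip the 10 characters of "gene_name=".
awk 'BEGIN { FS = OFS = "\t" }
{
    if (match($0, /gene_name=[^;]*/)) {
        $4 = substr($0, RSTART + 10, RLENGTH - 10)   # overwrite the ID column with the symbol
    }
    print
}' genes.sample.bed > genes.portable.bed

cat genes.portable.bed
```

Records without a gene_name attribute keep their original ID in column 4.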
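Going back to the bedmap --echo-map-id-uniq command, the file answer.bed will have each coordinate from coordinates.bed, and the HGNC symbol names of Gencode v27 gene annotations that overlap those coordinates. With bedmap's default delimiters, each line holds the interval, then a | character, then the overlapping symbols separated by semicolons.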
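If you would rather have one row per interval/gene pair, here is a rough sketch of a post-processing step. It assumes bedmap's default output delimiters (| between the interval and the mapped IDs, ; between IDs). The sample lines, the output name answer.long.tsv, and the NA placeholder for intervals with no overlapping gene are my own choices, not part of BEDOPS.

```shell
# Hypothetical sample of bedmap output; the second interval overlaps no gene.
printf 'chr1\t11800\t14500|DDX11L1;WASH7P\nchr1\t50000\t50100|\n' > answer.bed

# Split each line at "|", then emit one row per gene symbol.
awk -F'[|]' -v OFS='\t' '{
    n = split($2, genes, ";")
    if (n == 0) { print $1, "NA"; next }   # no overlapping gene for this interval
    for (i = 1; i <= n; i++) print $1, genes[i]
}' answer.bed > answer.long.tsv

cat answer.long.tsv
```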
See this post, it may help: Is There An Easy Way Of Getting Gene Symbols From Genomic Coordinates?