Gene symbol list
6.5 years ago

Hi, I have many coordinates (approximately 1000). How can I find the gene symbols located at these coordinates? Thanks, Siavash

SNP ChIP-Seq sequence next-gen • 2.1k views
6.5 years ago

You can use BEDOPS bedmap --echo-map-id-uniq to map IDs from a BED file of gene annotations to a list of intervals-of-interest:

$ bedmap --echo --echo-map-id-uniq coordinates.bed genes.bed > answer.bed

You provide coordinates.bed yourself. It must be a sorted BED file (for example, sorted with BEDOPS sort-bed).
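If BEDOPS sort-bed is not at hand, the sorting step can be approximated with plain sort. This is only a sketch: the function name sort_bed_fallback and the file raw.bed are hypothetical, and it assumes a tab-delimited BED file with no header lines. sort-bed remains the safer choice when it is available.

```shell
# Hypothetical helper (not part of BEDOPS): sort a BED file by
# chromosome name (bytewise), then numerically by start and end.
# LC_ALL=C forces bytewise chromosome ordering.
sort_bed_fallback() {
    LC_ALL=C sort -k1,1 -k2,2n -k3,3n "$1"
}
```

Usage would look like `sort_bed_fallback raw.bed > coordinates.bed`.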

How you generate genes.bed depends on your organism and reference genome. Here's an example of how to get this file for human hg38:

$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_27/gencode.v27.annotation.gff3.gz \
    | gunzip -c - \
    | awk '$3 == "gene"' - \
    | convert2bed -i gff - \
    | gawk -v OFS="\t" '{ match($0, /gene_name=(.*);level/, a); $4=a[1]; print $0; }' - \
    > genes.bed

Going back to the bedmap --echo-map-id-uniq command: the file answer.bed will contain each interval from coordinates.bed, followed by the HGNC symbols of the Gencode v27 gene annotations that overlap that interval.
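If what you want in the end is a flat list of symbols rather than a per-interval table, answer.bed can be reduced further. This is a sketch, assuming bedmap's default output delimiters (a `|` between the echoed interval and the mapped IDs, and `;` between multiple IDs) and assuming the coordinates themselves contain no `|` character. The function name unique_symbols is hypothetical.

```shell
# Hypothetical helper: print the unique gene symbols found in a
# bedmap --echo --echo-map-id-uniq output file, one per line.
unique_symbols() {
    cut -d'|' -f2 "$1" |   # keep only the mapped-ID part of each line
        tr ';' '\n' |      # one symbol per line
        grep -v '^$' |     # drop intervals that overlapped no gene
        LC_ALL=C sort -u   # de-duplicate
}
```

Usage would look like `unique_symbols answer.bed > symbols.txt`, where symbols.txt is just an illustrative output name.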

6.5 years ago

Intersect your coordinates with a gene annotation file using bedtools intersect.
