how to annotate a genomic internval with its gene ID, sequence, seq-type
3
0
Entering edit mode
10.0 years ago
Pei ▴ 220

Hi All:

It is very excited for my very first post here. I have a simple question: Given a genomic interval, like:

chrom start end value strand
chr1    564495    564594    248    +

How to get the related info for this interval in R ?

I wish to know the 1)gene, 2)sequence and 3)seq-type (CDS or 3'UTR)

Thanks in advance!

Best wishes!

genomic-range annotation • 3.3k views
ADD COMMENT
2
Entering edit mode
10.0 years ago

You could use BEDOPS bedmap to map your sorted intervals against a BED data stream or file containing gene annotations (e.g., GENCODE):

$ wget -qO- ftp://ftp.sanger.ac.uk//pub/gencode/Gencode_human/release_18/gencode.v18.annotation.gtf.gz \
    | gunzip --stdout - \
    | gtf2bed - \
    | bedmap --echo --echo-map intervals.bed -​ \
    > answer.bed

You could filter this answer for subcategories of GENCODE annotations using grep, e.g.:

$ grep -i "utr" answer.bed > utr.bed

To run command-line tools in R, you can use system().

ADD COMMENT
0
Entering edit mode

Thank you Alex!

I am trying. Is there any R package could do this job, without downloading annotation file?

ADD REPLY
0
Entering edit mode

Yes, system() will work with the commands I showed, without downloading the annotation file. By using standard UNIX streams and piping data from one command to the next, you do not store the GTF file locally, only the mapping result between it and your intervals-of-interest, which should not be very much larger than your intervals file.

ADD REPLY
1
Entering edit mode
10.0 years ago
Emily 24k

Have you looked at biomaRt?

ADD COMMENT
0
Entering edit mode

Not yet.

I think it must be worth a try

Thank you Emily!

ADD REPLY
0
Entering edit mode

Dear Emily:

Do you know how to set the version of genome assembly in biomaRt? For example, my genomic coordinates were based on hg19. Is it possible for me to query hg19 by biomaRt?

Thanks a lot!

ADD REPLY
0
Entering edit mode

You can query our GRCh37 biomaRt with:

grch37 <- useMart(biomart="ENSEMBL_MART_ENSEMBL", host="http://grch37.ensembl.org", path="/biomart/martservice")
ADD REPLY
0
Entering edit mode

Thanks a lot!

ADD REPLY
1
Entering edit mode
10.0 years ago
EagleEye 7.6k

Option 1: GenGen (You can use exact overlap OR plus/minus x kb)

http://www.openbioinformatics.org/gengen/tutorial_scan_region.html

Option 2: Homer (This helps in identifying the nearby genes to the given genomic intervals)

http://homer.salk.edu/homer/ngs/annotation.html

Command: annotatePeaks.pl input.bed PATH_TO_GENOME/hg19.fa -gtf PATH_TO_GTF/ensembl.gtf > outputfile_annotated.txt

This should be the easiest way using just an GTF (Example: Ensembl GRCh37) annotation file and genome file (Example: hg19).

GTF: ftp://ftp.ensembl.org/pub/release-74/gtf/homo_sapiens/

Genome: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.2bit

ADD COMMENT

Login before adding your answer.

Traffic: 2446 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6