Question

Mapping between genomic coordinates to transcriptomic coordinates

0

Entering edit mode

7.7 years ago

John Ma ▴ 310

Suppose I have a list of genomic positions and a reference GTF/ GFF3. Are there any software tools (any language is fine) that maps the position to the cDNA/ CDS position in relate to any transcript(s) that it may overlap?

For example, based on hg38 and GENCODE version 25, I would expect an input of chr1:151406413 would give the following as output (using CDS coordinates): ENST00000271715: 2622, ENST00000368863: 2337, ENST00000392723: 2463, ENST00000409503: 2595, ENST00000491586: 2490, ENST00000529669: 822 and ENST00000531094: 2436.

I know I can just throw it into ANNOVAR or VEP; but I wonder if there's a simpler way without actually predicting the consequences.

Thanks anyone for the answer in advance!

vcf gff gtf SNP • 5.2k views

ADD COMMENT • link updated 7.7 years ago by cmdcolin ★ 4.1k • written 7.7 years ago by John Ma ▴ 310

1

Entering edit mode

7.7 years ago

John Ma ▴ 310

I got myself an answer today: for Python, we may use transVar.

ADD COMMENT • link 7.7 years ago by John Ma ▴ 310

0

Entering edit mode

7.7 years ago

Alex Reynolds 36k

On the command line, one option is to download annotations and convert them to BED via convert2bed:

$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_25/gencode.v25.annotation.gff3.gz \
    | gunzip -c- \
    | convert2bed -i gff - \
    > annotations.bed

If you just want transcripts, you could filter the result with awk:

$ awk '$8=="transcript"' annotations.bed > transcripts.bed

Convert your input to a sorted BED file with awk and sort-bed:

$ awk FS=":" '{ print $1"\t"$2"\t"($2 + 1); }' input.txt | sort-bed - > input.bed

Then query the input.bed file against annotations.bed or transcripts.bed with bedmap:

$ bedmap --echo --echo-map-id --delim '\t' input.bed transcripts.bed > answer.bed

Once you have made annotations.bed or transcripts.bed, you can repeat queries with any other input.bed file you generate down the road.

ADD COMMENT • link 7.7 years ago by Alex Reynolds 36k

score 2 · Accepted Answer · 2017-05-07

2

Entering edit mode

7.7 years ago

wuchangcheng66 ▴ 30

In R, the GenomicFeatures package could be one choice.

ADD COMMENT • link 7.7 years ago by wuchangcheng66 ▴ 30

1

Entering edit mode

Eventually I got to try out this one. The important GenomicFeatures function here is mapToTranscripts:

library(GenomicFeatures)
library(rtracklayer)
gencodeTxDb <- makeTxDbFromGFF (file="gencode.annotation.gtf")
gencodeTx <- transcripts (gencodeTxDb)
names (gencodeTx) <- id2name (gencodeTxDb, "tx")
genomicLocus <- import.bed (con="genomicLocus.BED")
mappedLocus <- mapToTranscripts (x=genomicLocus, transcripts=gencodeTx)

The output is an GRanges object that is easy enough to work with.

ADD REPLY • link 7.2 years ago by John Ma ▴ 310