Question

Convert transcript coordinates to genomic coordinates

3.8 years ago

vkkodali_ncbi ★ 3.8k

Starting with a bunch of features in transcript coordinates, along with an alignment of the transcript to the genome, is there a way to get those features in genomic coordinates? This is very similar to a liftover task: people routinely use liftOver to map features from one genome to another, but in this case I want to map features from a transcript to a genome.

For example, I have a bed file with features of interest as follows:

tx1    10    25    feat1    100    +
tx1    45    95    feat2    100    +

And I have an alignment file, say, in BAM format, with tx1 aligned to chr1. Note that tx1 is a multi-exon transcript and aligns to chr1 with intronic gaps. What I am trying to get is an output BED file with my features in chromosome coordinates that looks something like:

chr1    1500    1515    feat1    100    +
chr1    1945    1995    feat2    100    +
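If you want to do this without R, a first step is to turn the BAM record's CIGAR string into gap-free aligned blocks that pair transcript offsets with genomic offsets. Below is a minimal Python sketch, assuming a single primary alignment of tx1 on the forward strand; cigar_to_blocks is a hypothetical helper, and the alignment (tx1 starting at 0-based chr1:1490 with CIGAR 35M410N65M) is invented for illustration, chosen so that feat2 lands at chr1:1945-1995 as in the example.

```python
import re

# Per the SAM spec: M, =, X consume both sequences; I and S consume only the
# query (transcript); D and N consume only the reference (genome).
# H is also counted on the transcript side here, because hard-clipped bases are
# still part of the full transcript that the BED coordinates refer to.
ADVANCES_TX = set("MIS=XH")
ADVANCES_GENOME = set("MDN=X")


def cigar_to_blocks(ref_start, cigar):
    """Split one alignment into gap-free blocks.

    ref_start: 0-based leftmost genomic position (SAM POS minus 1).
    Returns (tx_start, tx_end, ref_start, ref_end) tuples, 0-based half-open,
    assuming the transcript aligned on the forward strand.
    """
    blocks = []
    tx, ref = 0, ref_start
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op in "M=X":  # aligned bases: record a block
            blocks.append((tx, tx + n, ref, ref + n))
        if op in ADVANCES_TX:
            tx += n
        if op in ADVANCES_GENOME:
            ref += n
    return blocks


# Hypothetical alignment: tx1 starts at 0-based chr1:1490, one 410-bp intron.
print(cigar_to_blocks(1490, "35M410N65M"))
# [(0, 35, 1490, 1525), (35, 100, 1935, 2000)]
```

For a reverse-strand alignment (SAM flag 16) the CIGAR is reported in genome orientation, so the transcript offsets would have to be flipped (transcript length minus offset) before use.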

Notes:

I am flexible with input, output and alignment formats.
I would prefer a solution that does not rely on any existing annotation as both tx1 and chr1 may be arbitrary sequences that are outside the scope of the standard databases.
tx1 is multi-exonic and the features can span two or more adjacent exons, so the output should have multiple rows for such split features.
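Given aligned blocks like those, splitting a feature that spans an intron amounts to intersecting it with each block and emitting one genomic row per overlap. A minimal sketch, assuming forward-strand, 0-based half-open blocks in transcript order; map_interval and lift_bed_line are hypothetical helpers, the blocks are the invented ones from a 35M410N65M alignment at chr1:1490, and feat3 is a made-up feature that crosses the exon boundary.

```python
def map_interval(tx_start, tx_end, blocks):
    """Project a transcript interval onto the genome through aligned blocks.

    blocks: (tx_start, tx_end, ref_start, ref_end) tuples, 0-based half-open.
    Returns one (ref_start, ref_end) piece per block the interval overlaps.
    Parts of the feature in insertions or unaligned ends are silently dropped.
    """
    pieces = []
    for qs, qe, rs, _ref_end in blocks:
        lo, hi = max(tx_start, qs), min(tx_end, qe)
        if lo < hi:  # the feature overlaps this block
            pieces.append((rs + (lo - qs), rs + (hi - qs)))
    return pieces


def lift_bed_line(line, chrom, blocks):
    """Turn one BED6 line in transcript coordinates into genomic BED6 lines."""
    _tx, start, end, name, score, strand = line.split()[:6]
    return [
        "\t".join([chrom, str(gs), str(ge), name, score, strand])
        for gs, ge in map_interval(int(start), int(end), blocks)
    ]


BLOCKS = [(0, 35, 1490, 1525), (35, 100, 1935, 2000)]
BED_LINES = [
    "tx1\t10\t25\tfeat1\t100\t+",
    "tx1\t45\t95\tfeat2\t100\t+",
    "tx1\t30\t40\tfeat3\t100\t+",  # spans the intron, so it yields two rows
]
for bed_line in BED_LINES:
    for out in lift_bed_line(bed_line, "chr1", BLOCKS):
        print(out)
```

Pieces separated only by a deletion (D) also come out as separate rows here; whether to merge those is a design choice left open in this sketch.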

Do you have a reference genome annotation in GFF? If so, maybe converting the BED file to GFF (bed2gff) and then merging this new GFF with the genomic GFF will get you the right coordinates. Just a guess.

3.8 years ago by Arsenal ▴ 160

Thanks, but merging GFF files will not change the locations of the features. Even if I have two GFF3 files, one for tx1 and another for chr1, merging them will essentially just concatenate the two files, because the GFF3 merging logic has no way of knowing that tx1 and chr1 are related. This task is somewhat similar to what liftover does: instead of moving features from one set of genomic coordinates to another, I need to move them from transcript coordinates to genomic coordinates. I will update the question so that it is clearer.

3.8 years ago by vkkodali_ncbi ★ 3.8k

score 2 · Answer 1 · 2021-02-14

3.8 years ago

rpolicastro 13k

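If you don't mind using R, mapFromTranscripts from the GenomicFeatures package, or mapToAlignments/mapFromAlignments from the GenomicAlignments package, can probably accomplish this (mapFromAlignments is the direction that takes alignment-local coordinates to genomic ones).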
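mapFromTranscripts works from a transcript's exon structure rather than from a CIGAR string. The rough Python analog below (a hypothetical helper, not the Bioconductor implementation) shows the one extra wrinkle such tools generally take care of: on a minus-strand transcript, transcript position 0 sits at the highest genomic coordinate, so offsets are counted from each exon's end. The exons are made-up 0-based half-open intervals; in this setting they could be the aligned blocks from the transcript-to-genome alignment rather than an annotation.

```python
def tx_to_genome(tx_start, tx_end, exons, strand):
    """Map a 0-based half-open transcript interval to genomic pieces.

    exons: non-overlapping (start, end) genomic intervals, 0-based half-open.
    strand: "+" or "-". Returns pieces sorted by genomic start.
    """
    # Walk exons in transcript order: ascending on "+", descending on "-".
    ordered = sorted(exons, reverse=(strand == "-"))
    pieces, offset = [], 0
    for gs, ge in ordered:
        length = ge - gs
        lo, hi = max(tx_start, offset), min(tx_end, offset + length)
        if lo < hi:
            a, b = lo - offset, hi - offset  # offsets within this exon
            if strand == "+":
                pieces.append((gs + a, gs + b))
            else:  # count back from the exon's 3'-most (highest) coordinate
                pieces.append((ge - b, ge - a))
        offset += length
    return sorted(pieces)


EXONS = [(100, 110), (200, 220)]  # hypothetical two-exon transcript
print(tx_to_genome(18, 25, EXONS, "+"))  # [(208, 215)]
print(tx_to_genome(18, 25, EXONS, "-"))  # [(105, 110), (200, 202)]
```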

Thank you! This looks like something I can use. My R is a bit rusty, but looking over the examples in the GenomicFeatures reference manual gives me some idea of where to start and how to go about it.

3.8 years ago by vkkodali_ncbi ★ 3.8k