Starting with a set of features in transcript coordinates and an alignment of the transcript to the genome, is there a way to get the features in genomic coordinates? This is very similar to a liftover task: people routinely use liftOver to map features from one genome to another, but in this case I want to map features from a transcript to the genome.
For example, I have a bed file with features of interest as follows:
tx1 10 25 feat1 100 +
tx1 45 95 feat2 100 +
And I have an alignment file, say, in BAM format, with tx1 aligned to chr1. Note that tx1 is a multi-exon transcript and aligns to chr1 with intronic regions. What I am trying to get to is an output bed file with my features in chromosome coordinates that look something like:
chr1 1500 1525 feat1 100 +
chr1 1945 1995 feat2 100 +
Notes:
- I am flexible with input, output and alignment formats.
- I would prefer a solution that does not rely on any existing annotation, as both tx1 and chr1 may be arbitrary sequences that are outside the scope of the standard databases.
- tx1 is multi-exonic and the features can span two or more adjacent exons, so the output should have multiple rows for such split features.
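To make the requested behaviour concrete, here is a minimal sketch of the mapping. It assumes the alignment has already been reduced to a list of exon blocks in 0-based, half-open genomic coordinates with no indels inside exons. The exon coordinates are made up for illustration, and `tx_to_genome` and `lift_bed_row` are hypothetical helper names, not part of any existing tool. Each transcript interval is intersected with every exon in turn, so a feature that crosses an exon junction yields one output row per exon.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # 0-based, half-open genomic interval


def tx_to_genome(start: int, end: int, exons: List[Exon], strand: str = "+") -> List[Exon]:
    """Map a 0-based, half-open transcript interval onto genomic pieces."""
    exons = sorted(exons)
    tx_len = sum(e - s for s, e in exons)
    if not 0 <= start < end <= tx_len:
        raise ValueError("interval falls outside the transcript")
    if strand == "-":
        # On the minus strand, transcript position 0 is the right-most genomic base,
        # so mirror the interval before walking the exons left to right.
        start, end = tx_len - end, tx_len - start
    pieces = []
    offset = 0  # transcript coordinate of the current exon's first base
    for g_start, g_end in exons:
        length = g_end - g_start
        lo = max(start, offset)
        hi = min(end, offset + length)
        if lo < hi:  # the feature overlaps this exon
            pieces.append((g_start + lo - offset, g_start + hi - offset))
        offset += length
    return pieces


def lift_bed_row(row: str, chrom: str, exons: List[Exon], strand: str = "+") -> List[str]:
    """Turn one 6-column BED row in transcript space into one row per genomic piece."""
    _, start, end, name, score, feat_strand = row.split()[:6]
    if strand == "-":
        # A transcript aligned to the minus strand flips the feature's strand.
        feat_strand = {"+": "-", "-": "+"}.get(feat_strand, feat_strand)
    return [
        "\t".join([chrom, str(s), str(e), name, score, feat_strand])
        for s, e in tx_to_genome(int(start), int(end), exons, strand)
    ]


# Hypothetical exon structure for tx1 on chr1: a 40 bp exon and a 60 bp exon.
exons = [(1490, 1530), (1940, 2000)]
for line in lift_bed_row("tx1 30 50 featX 100 +", "chr1", exons):
    print(line)
```

With these made-up exons, a feature at tx1 30-50 crosses the junction and is printed as two rows, chr1 1520-1530 and chr1 1940-1950.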
Do you have a reference genome in gff? If so, maybe a bed2gff and then merge this new gff to the genomic gff will get you the right coordinates. Just a guess.
Thanks, but merging gff files will not change the locations of the features. Even if I have two gff3 files, one for tx1 and another for chr1, merging them will essentially concatenate the two files, because the gff3 merging logic does not know that tx1 and chr1 are related. This task is similar to what liftOver does: instead of moving features from one set of genomic coordinates to another, I need to move them from transcript coordinates to genomic coordinates. I will update the question so that it is clearer.