Easy Way to Map CDS Coordinates to Genomic Coordinates
13.7 years ago
Alper Yilmaz ▴ 100

Suppose I have a protein domain in GeneA, and I know the domain's coordinates within the CDS sequence of GeneA. I also know the genomic coordinates of GeneA (e.g., in GFF format), along with its mRNA and exon coordinates.

Is there an easy way to map the protein domain coordinates to genomic (exon) coordinates?

I looked into Bio::Coordinate::GeneMapper but was not able to figure it out.
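The mapping itself amounts to walking along the CDS pieces in transcription order and counting nucleotides. A minimal Python sketch, assuming the CDS segments of one transcript are already available as 1-based inclusive (start, end) pairs (the GFF convention) together with the strand; `cds_to_genomic` is a hypothetical helper, not part of Bio::Coordinate::GeneMapper:

```python
def cds_to_genomic(cds_pos, segments, strand):
    """Map a 1-based CDS position to a 1-based genomic position.

    segments: (start, end) genomic CDS pieces, 1-based inclusive (GFF style),
              in any order.
    strand:   '+' or '-'.
    """
    if cds_pos < 1:
        raise ValueError("CDS positions are 1-based")
    # Visit the pieces in transcription order: ascending coordinates on '+',
    # descending coordinates on '-'.
    ordered = sorted(segments, reverse=(strand == "-"))
    offset = cds_pos
    for start, end in ordered:
        length = end - start + 1
        if offset <= length:
            # '+': count forward from the piece start; '-': count back from its end.
            return start + offset - 1 if strand == "+" else end - offset + 1
        offset -= length
    raise ValueError("position lies beyond the end of the CDS")
```

Mapping both ends of the domain gives its genomic span. On the minus strand the first CDS position maps to the larger genomic coordinate, so swap the two ends before writing them out as an interval.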

13.7 years ago
brentp 24k

If you're not tied to Perl, this is something that pygr (a Python library) does quite nicely.

E.g., this example or this one.

Basically, you attach an annotation to a sequence, and pygr then keeps track of its local (annotation-relative) and global (sequence) positions as well as its strand orientation.

There's an example here where they load data from a GFF file, and there's a class specifically for protein annotations.
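Without pygr, pulling the CDS pieces of one transcript out of a GFF file takes only a few lines. A minimal sketch, assuming GFF3 where each `CDS` feature points to its mRNA through a `Parent=` attribute (GFF2/GTF files use a different attribute syntax); `cds_segments` and the transcript ID are hypothetical:

```python
def cds_segments(gff_lines, transcript_id):
    """Collect sorted (seqid, start, end, strand) tuples for one transcript's CDS."""
    pieces = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        # Column 9 holds key=value pairs separated by ';'.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        # A CDS feature may list several comma-separated parents.
        if transcript_id in attrs.get("Parent", "").split(","):
            pieces.append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return sorted(pieces)
```

For a real file, pass an open file handle, e.g. `cds_segments(open("genes.gff3"), "tx1")` (file name and ID hypothetical). GFF coordinates are 1-based and inclusive, so the start and end values can be used as-is for CDS arithmetic.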

I believe the workflow would be something like:

  1. add the proteins and the exons, each to its own AnnotationDB
  2. query the protein annotation to get the global (genomic) location of the region of interest
  3. use that global location to query the exon AnnotationDB and get the exonic coordinates.
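Outside pygr, the end result of that workflow (the genomic pieces a domain occupies, one per coding exon) can also be computed directly. A minimal plain-Python sketch, not pygr's API; `domain_blocks` is a hypothetical name, coordinates are 1-based inclusive, and each residue is assumed to span exactly 3 CDS nucleotides:

```python
def domain_blocks(aa_start, aa_end, segments, strand):
    """Return genomic (start, end) blocks, one per CDS piece the domain touches.

    aa_start, aa_end: 1-based inclusive residue numbers of the domain.
    segments: (start, end) CDS pieces, 1-based inclusive genomic coordinates.
    strand: '+' or '-'.
    """
    # Residue r occupies CDS nucleotides 3r-2 .. 3r.
    nt_start, nt_end = 3 * aa_start - 2, 3 * aa_end
    blocks = []
    cds_offset = 0  # CDS nucleotides contained in pieces already visited
    for start, end in sorted(segments, reverse=(strand == "-")):
        length = end - start + 1
        seg_first, seg_last = cds_offset + 1, cds_offset + length
        lo, hi = max(nt_start, seg_first), min(nt_end, seg_last)
        if lo <= hi:
            # Convert the overlapping CDS range back to genomic coordinates.
            if strand == "+":
                blocks.append((start + lo - seg_first, start + hi - seg_first))
            else:
                blocks.append((end - (hi - seg_first), end - (lo - seg_first)))
        cds_offset += length
    return sorted(blocks)
```

The blocks come back in ascending genomic order regardless of strand, which is the layout BED12 blocks and most genome browsers expect.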

Could you show an example of how pygr transforms coordinates from genomic space to codon space, given an annotation of coding exons? I looked at the docs you linked to, but they seem very sparse/opaque. There's no explanation of the translation class and no example of the global-to-local transformation, so I'd be very interested to see a simple example.
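This is not pygr's translation class, but a minimal plain-Python sketch of the genomic-to-codon direction may still help. It assumes the coding exons of one transcript are given as 1-based inclusive (start, end) pairs plus a strand, and `genomic_to_codon` is a hypothetical name. It returns the codon (residue) number and the position within that codon, or `None` for positions outside the coding exons:

```python
def genomic_to_codon(gpos, segments, strand):
    """Return (codon_number, position_in_codon), both 1-based, or None."""
    cds_offset = 0  # CDS nucleotides in the pieces upstream of the current one
    # Visit coding exons in transcription order.
    for start, end in sorted(segments, reverse=(strand == "-")):
        if start <= gpos <= end:
            # Distance from the 5' end of this piece, in transcript orientation.
            within = gpos - start if strand == "+" else end - gpos
            cds_pos = cds_offset + within + 1  # 1-based CDS position
            return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1
        cds_offset += end - start + 1
    return None  # intronic, UTR or intergenic position
```

Because only the CDS pieces are passed in, UTR bases never enter the count, so codon 1 always starts at the first CDS base.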

4.8 years ago
Shicheng Guo ★ 9.5k

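You can get the genomic positions of all the domains annotated in UniProt/Swiss-Prot with jvarkit's mapuniprot tool, and then use bedtools to find the domains in your specific gene. The commands below build the tool, download the Swiss-Prot XML and run it against the hg19 reference and the UCSC knownGene table: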

git clone https://github.com/lindenb/jvarkit.git

cd jvarkit

./gradlew mapuniprot

wget ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.xml.gz

java -jar ~/hpc/tools/jvarkit/dist/mapuniprot.jar  -R ~/hpc/db/hg19/hg19.fa  -u ~/hpc/uniprot_sprot.xml.gz -k knownGene.txt.gz -o uniprot_sprot.bed
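For the last step, something like `bedtools intersect -a uniprot_sprot.bed -b gene.bed -u` (with a hypothetical gene.bed holding your gene's region) reports each domain record that overlaps the gene. If you would rather stay in a script, here is a minimal Python sketch of the same overlap test, assuming only that uniprot_sprot.bed follows the BED convention (chromosome, 0-based start, exclusive end in the first three tab-separated columns); `records_in_region` is a hypothetical name:

```python
def records_in_region(bed_lines, chrom, start, end):
    """Yield BED lines overlapping the half-open region [start, end) on chrom."""
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        cols = line.rstrip("\n").split("\t")
        # Two half-open intervals overlap when each starts before the other ends.
        if cols[0] == chrom and int(cols[1]) < end and start < int(cols[2]):
            yield line.rstrip("\n")
```

Usage would be along the lines of `records_in_region(open("uniprot_sprot.bed"), "chr1", 1000, 5000)` for a hypothetical region; remember to convert a 1-based GFF gene start to 0-based (subtract 1) before querying.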