Visualize and annotate protein domains within a genomic context, given complement
6 weeks ago
Madde ▴ 20

I have a gene of interest which is in genomic location 1374721-1384266 in my genome. However, I know this gene is on the reverse strand, as indicated by "-" in the gff file.

AP012332.1      Prodigal:2.6    CDS     1374718 1384266 .       -       0       ID=DPDCJFFM_01065;product=hypothetical protein

I am trying to annotate the protein domains within this gene, giving both the correct genomic nucleotide positions and the true amino acid positions. Position 1 of the amino acid sequence is the start of the protein, which, because this gene is on the reverse strand, corresponds to the end of the gene's genomic nucleotide range. I am unsure what the nucleotide positions of the domain are.

Description        Start (aa)   End (aa)   Start (nt)   End (nt)
Acyl transferase   13           335        1374760(?)   1375726(?)

I took the gene from ncbi here: https://ncbi.nlm.nih.gov/protein/757812890 and downloaded the nucleotide and amino acid sequence for input into NCBI's "conserved domains" tool.

Is the sequence from NCBI in the reverse complement orientation? If so, would amino acid #13 - 335 correspond to a different start and end nucleotide position?

What other tools can I use to figure out this problem?

6 weeks ago
cmdcolin ★ 4.0k

When the domain search reports, as in your example above, an Acyl transferase starting at amino acid 13, its genomic position is found by counting 13*3 bases (3 bases per codon) back from the end position of your feature. The gene is on the reverse strand, so it is transcribed and translated going "right to left", i.e. from the feature's end toward its start. You get something like this:

  • the Acyl transferase domain starts, on the genome, at ~1384266-13*3
  • the Acyl transferase ends, on the genome, at ~1384266-335*3

I might have an off-by-one error in that calculation but that's the general idea
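
To pin down the off-by-one, here is a minimal Python sketch of the same arithmetic. It assumes the CDS is one contiguous interval with no introns (reasonable for a Prodigal call on a bacterial genome), that the GFF coordinates are 1-based and inclusive, and that the phase is 0 as in the GFF line above. The function name `aa_range_to_genome` is made up for illustration and is not part of any tool mentioned in this thread.

```python
def aa_range_to_genome(cds_start, cds_end, strand, aa_start, aa_end):
    """Map a 1-based, inclusive amino acid range to 1-based, inclusive
    genomic coordinates.

    Assumes a single contiguous CDS (no introns) with phase 0, where
    cds_start <= cds_end are the GFF start/end columns.
    Returns (low, high) with low <= high, regardless of strand.
    """
    if aa_start < 1 or aa_end < aa_start:
        raise ValueError("amino acid range must satisfy 1 <= aa_start <= aa_end")

    # 0-based offsets into the coding sequence, counted from its 5' end
    first_offset = 3 * (aa_start - 1)  # first base of the first codon
    last_offset = 3 * aa_end - 1       # last base of the last codon

    if strand == "+":
        low, high = cds_start + first_offset, cds_start + last_offset
    elif strand == "-":
        # On the reverse strand the coding sequence starts at cds_end
        # and runs toward cds_start, so offsets are subtracted.
        low, high = cds_end - last_offset, cds_end - first_offset
    else:
        raise ValueError("strand must be '+' or '-'")

    if low < cds_start or high > cds_end:
        raise ValueError("amino acid range extends beyond the CDS")
    return low, high


# The Acyl transferase domain (aa 13-335) on the reverse-strand CDS
print(aa_range_to_genome(1374718, 1384266, "-", 13, 335))
```

Under those assumptions the domain covers genomic positions 1383262 to 1384230: amino acid 13 starts at 1384266-12*3 = 1384230, and amino acid 335 ends at 1384266-335*3+1 = 1383262.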

I created a tool that can help you map between genome and protein coordinate systems: https://github.com/cmdcolin/g2p_mapper_cli

Another, probably more common, way to do it is with something like a TxDb object from the Bioconductor GenomicFeatures package: https://bioconductor.org/packages/devel/bioc/vignettes/GenomicFeatures/inst/doc/GenomicFeatures.html
