Question

Genome annotation: small dataset of genomes

6.6 years ago

gifthorse17 • 0

Hello, I am a novice at bioinformatics tools and I am looking for some advice on the best way to approach this problem. Please point me in the right direction.

I have 13 genomes of interest, identified from the literature, and I am trying to find every occurrence of a particular protein domain family in them (conserved protein domain: pfam00502).

There will be multiple occurrences of similar sequences within each genome, as there are multiple gene variants. I need to find all of them.

Would I have to annotate the whole genome and then search for this family within the annotated genome? If so, how would I go about that?

I tried BLAST with a search sequence, but this did not appear to return what I am looking for.

genome gene alignment • 1.5k views

ADD COMMENT • link updated 4.0 years ago by sagnik ▴ 50 • written 6.6 years ago by gifthorse17 • 0

score 0 · Answer 1 · 2018-10-28

6.6 years ago

gifthorse17 • 0

UPDATE: I repeated my BLAST search, this time genome by genome, and the results now appear to be what I need. Perhaps I misread them initially, when I ran the BLAST search against all sequences at once.
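For anyone taking the same genome-by-genome route, here is a minimal sketch of how the per-genome results could be summarised. It assumes each search was saved in BLAST+ tabular format (`-outfmt 6`, with its standard 12 columns). The genome names, sequence IDs, example rows, and the 1e-5 E-value cutoff are all made up for illustration and are not from my actual runs.

```python
import csv
import io

# Standard column order of BLAST+ tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def hits_per_genome(tabular_by_genome, max_evalue=1e-5):
    """Map each genome name to the sorted, de-duplicated subject IDs
    whose E-value is at or below max_evalue (an assumed cutoff)."""
    summary = {}
    for genome, text in tabular_by_genome.items():
        reader = csv.DictReader(
            io.StringIO(text), fieldnames=BLAST_COLUMNS, delimiter="\t"
        )
        subjects = {
            row["sseqid"] for row in reader
            if float(row["evalue"]) <= max_evalue
        }
        summary[genome] = sorted(subjects)
    return summary


# Made-up example: one tabular result string per genome (hypothetical IDs).
example = {
    "genome_A": (
        "query1\tprotA1\t85.0\t160\t24\t0\t1\t160\t5\t164\t1e-90\t270\n"
        "query1\tprotA2\t40.0\t150\t90\t2\t3\t152\t10\t159\t2e-20\t95\n"
        "query1\tprotA3\t25.0\t60\t45\t1\t1\t60\t1\t60\t0.5\t30\n"
    ),
    "genome_B": (
        "query1\tprotB1\t78.0\t158\t35\t0\t2\t159\t4\t161\t3e-75\t230\n"
    ),
}

result = hits_per_genome(example)
for genome, subjects in result.items():
    print(genome, len(subjects), subjects)
```

Keeping one result set per genome makes it easy to see how many distinct hits each genome contributes, which is harder to read off a single combined search.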

If you search for the PF00502 domain with BLASTP, be careful to check that your hits are true positives for this domain, since the domain is variable in sequence. If your 13 genomes are curated in UniProt, and what you need is all the proteins in each genome that match the domain, one alternative is to use UniProt's pre-computed Pfam domain mappings. For example, all proteins in UniProtKB matching that domain:

https://www.uniprot.org/uniprot/?query=database%3A%28type%3Apfam+PF00502%29&sort=score

This list could be filtered by organism. Of course, this table does not include regions outside of UniProt protein entries, and it does not directly give genomic coordinates of the domain matches.
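As a rough illustration of the organism filtering, here is a minimal sketch that assumes the result list has been downloaded as a tab-separated file with `Entry` and `Organism` columns. The column names, accessions, and organism names below are placeholders for illustration only, so adjust them to whatever your actual export contains.

```python
import csv
import io


def entries_by_organism(tsv_text, organisms):
    """Group entry accessions by organism of interest.

    An organism matches when its name occurs (case-insensitively) in the
    row's Organism field, since that field may also carry strain details.
    """
    grouped = {name: [] for name in organisms}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        organism_field = row["Organism"].lower()
        for name in organisms:
            if name.lower() in organism_field:
                grouped[name].append(row["Entry"])
    return grouped


# Made-up table standing in for a downloaded result list.
example_tsv = (
    "Entry\tOrganism\n"
    "ACC0001\tOrganism one (strain X1)\n"
    "ACC0002\tOrganism two\n"
    "ACC0003\tOrganism one (strain X1)\n"
    "ACC0004\tSome other species\n"
)

grouped = entries_by_organism(example_tsv, ["Organism one", "Organism two"])
for name, accessions in grouped.items():
    print(name, accessions)
```

Substring matching is a deliberate simplification here. If two of your organisms have names where one contains the other, matching on a taxonomy identifier column would be safer.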

6.6 years ago by Ahill ★ 2.0k

score 0 · Answer 2 · 2021-05-27

Hello,

We have developed a gene annotator called FINDER, which annotates eukaryotic genomes using short-read RNA-Seq data and protein sequences. It is fully automated and requires no manual intervention. FINDER also runs BRAKER and incorporates its predicted genes into the final gene set. A paper describing FINDER has been published, and the software is available on GitHub.

Thank you.