Genome annotation: small dataset of genomes
6.2 years ago

Hello, I am a novice at bioinformatics tools and I am looking for some advice on the best way to approach this problem. Please point me in the right direction.

Basically, I have 13 genomes of interest, identified from the literature, and I am trying to find all occurrences of a particular protein domain family in these genomes (conserved protein domain: pfam00502).

There will be multiple occurrences of similar sequences within each genome, as there are multiple gene variants. I need to find all of them.

Would I have to annotate the whole genome and then search for this family within the annotated genome? If so, how would I go about that?

I tried BLAST with a search sequence, but this did not appear to return what I am looking for.

genome gene alignment • 1.3k views
6.2 years ago

UPDATE: I repeated my BLAST search, this time genome by genome, and the results appear to be what I need. Perhaps I misread them initially when I ran the BLAST search on all sequences at once.
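
For anyone repeating the genome-by-genome search, here is a rough sketch of how it could be scripted. It assumes a protein query searched against nucleotide genome FASTA files, for which tblastn from NCBI BLAST+ would be the usual program; the post does not say which BLAST program was actually used. The file names, the output directory, and the e-value cutoff of 1e-5 are all hypothetical placeholders.

```python
import shutil
import subprocess
from pathlib import Path


def build_tblastn_command(query_faa, genome_fna, out_dir, evalue=1e-5):
    """Return the tblastn argument list for one genome (nothing is run here)."""
    genome_fna = Path(genome_fna)
    # One tabular result file per genome, named after the genome FASTA file.
    out_path = Path(out_dir) / f"{genome_fna.stem}.tblastn.tsv"
    return [
        "tblastn",
        "-query", str(query_faa),      # protein query (assumed)
        "-subject", str(genome_fna),   # genome FASTA searched directly, no database
        "-evalue", str(evalue),        # hypothetical cutoff, adjust as needed
        "-outfmt", "6",                # tab-separated hit table
        "-out", str(out_path),
    ]


def run_per_genome(query_faa, genome_paths, out_dir):
    """Run one search per genome so each result file maps to exactly one genome."""
    if shutil.which("tblastn") is None:
        raise RuntimeError("tblastn not found on PATH (NCBI BLAST+ must be installed)")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for genome in genome_paths:
        subprocess.run(build_tblastn_command(query_faa, genome, out_dir), check=True)
```

Keeping one output file per genome makes it harder to misread which hits belong to which genome, which seems to have been the problem with the combined search.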


If you search for the PF00502 domain with BLASTP, be careful to check that your hits are true positives for this domain, since the domain is variable in sequence. If your 13 genomes are curated in UniProt, and what you need is all the proteins in each genome that match the domain, one alternative may be to use UniProt's pre-computed Pfam domain mappings. For example, all proteins in UniProtKB matching that domain:

https://www.uniprot.org/uniprot/?query=database%3A%28type%3Apfam+PF00502%29&sort=score

This list could be filtered by organism. Of course, this table does not include regions outside of UniProt protein entries, and it does not directly give the genomic coordinates of the domain matches.
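
As a minimal sketch of the organism filtering step, assuming the result list has been downloaded as a tab-separated file: the column names "Entry" and "Organism" are assumptions about the export, and the accessions and species names in the example are made-up placeholders, not real UniProt records.

```python
import csv
import io


def filter_by_organism(tsv_text, organisms, organism_column="Organism"):
    """Keep rows whose organism field starts with one of the given names."""
    # Prefix matching lets "Species alpha" also match "Species alpha (strain 1)".
    wanted = tuple(name.lower() for name in organisms)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if row[organism_column].lower().startswith(wanted)]


# Placeholder table standing in for the downloaded export.
example = (
    "Entry\tOrganism\n"
    "ACC0001\tSpecies alpha (strain 1)\n"
    "ACC0002\tSpecies beta\n"
    "ACC0003\tSpecies alpha (strain 2)\n"
)

hits = filter_by_organism(example, ["Species alpha"])
print([row["Entry"] for row in hits])
```

With the real export, the list of organism names would be the 13 species of interest, and the number of rows kept per species gives a quick count of annotated domain-containing proteins.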

3.6 years ago
sagnik ▴ 50

Hello,

We have developed a gene annotator called FINDER, which annotates eukaryotic genomes using short-read RNA-Seq data and protein sequences. It is completely automated and requires no manual intervention. FINDER also runs BRAKER to incorporate predicted genes into the repertoire. The FINDER paper is published, and the software is available on GitHub.

Thank you.
