Hello, I am a novice at bioinformatics tools and I am looking for some advice on the best way to approach this problem. Please point me in the right direction.
Basically, I have 13 genomes of interest, identified from the literature, and I am trying to find every occurrence of a particular protein domain family in them (conserved protein domain: pfam00502).
There will be multiple occurrences of similar sequences within each genome, as there are multiple gene variants. I need to find all of them.
Would I have to annotate the whole genome first and then search the annotation for this family? If so, how would I go about that?
I tried BLAST with a query sequence, but it did not seem to return what I am looking for.
If you search for the PF00502 domain with BLASTP, be careful that your hits are true positives for this domain, since sequences carrying it are variable. If your 13 genomes are curated in UniProt, and what you need is all the proteins in each genome that match the domain, one alternative is to use UniProt's precomputed Pfam domain mappings. For example, all proteins in UniProtKB matching that domain:
https://www.uniprot.org/uniprot/?query=database%3A%28type%3Apfam+PF00502%29&sort=score
This list can be filtered by organism. Note that it only covers UniProt protein entries, so it will miss any domain matches outside them, and it does not directly give genomic coordinates of the matches.
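If you do stay with BLASTP, a first-pass filter on the tabular output can at least discard weak hits before you check them for the domain. The sketch below assumes the default 12-column `-outfmt 6` layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The E-value cutoff and minimum query coverage are arbitrary illustrative values, `filter_blast_hits` is a hypothetical helper, and you have to supply the query length yourself. Passing this filter does not make a hit a true positive for PF00502; it only shortens the list you then confirm with a proper domain search.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_blast_hits(handle, query_length, max_evalue=1e-5, min_query_cov=0.5):
    """Return sorted subject IDs with at least one HSP passing both thresholds.

    max_evalue and min_query_cov are illustrative defaults, not recommendations.
    """
    kept = set()
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(OUTFMT6_COLUMNS, row))
        evalue = float(hit["evalue"])
        # Fraction of the query sequence covered by this HSP.
        qstart, qend = int(hit["qstart"]), int(hit["qend"])
        coverage = (abs(qend - qstart) + 1) / query_length
        if evalue <= max_evalue and coverage >= min_query_cov:
            kept.add(hit["sseqid"])
    return sorted(kept)


if __name__ == "__main__":
    # Two made-up HSPs: protA is strong and long, protB is weak and short.
    example = io.StringIO(
        "query1\tprotA\t45.0\t160\t80\t3\t1\t160\t5\t164\t1e-40\t150\n"
        "query1\tprotB\t30.0\t40\t25\t2\t100\t139\t10\t49\t0.01\t30\n"
    )
    print(filter_blast_hits(example, query_length=170))  # ['protA']
```

Coverage is checked alongside the E-value because a short, local hit can score well without spanning the whole domain.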
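To do the organism filtering locally instead of on the website, you could download the result table as TSV and keep only the rows from your 13 organisms. This is a minimal sketch assuming the export has a header row with columns named `Entry` and `Organism` (check the header of your own download); `proteins_by_organism` is a hypothetical helper, and the organism names and accessions in the example are placeholders. Matching is by exact organism string, so your names must be written exactly as UniProt writes them, including strain designations.

```python
import csv
import io
from collections import defaultdict


def proteins_by_organism(handle, wanted):
    """Group accessions by organism, keeping only organisms listed in `wanted`.

    Assumes a tab-separated export with 'Entry' and 'Organism' header columns.
    """
    wanted = set(wanted)
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        organism = row["Organism"].strip()
        if organism in wanted:
            groups[organism].append(row["Entry"])
    return dict(groups)


if __name__ == "__main__":
    # Tiny made-up table; accessions and organism names are placeholders.
    example = io.StringIO(
        "Entry\tOrganism\n"
        "ACC1\tOrganism A\n"
        "ACC2\tOrganism B\n"
        "ACC3\tOrganism A\n"
    )
    print(proteins_by_organism(example, ["Organism A"]))
    # {'Organism A': ['ACC1', 'ACC3']}
```

Grouping by organism makes it easy to see how many domain-containing proteins each genome contributes, which is useful when you expect several gene variants per genome.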