Hi, all.
I have some domains predicted with Pfam, listed as follows:
HEAT HMG_box HSP90 Histone Homeobox
IQ Jacalin KH1 Kelch1 Kelch2 LEA4
LRR1 LRR2 LRR_3
I have two questions:
1. How can I tell whether a domain is a repeat?
From this example, I can see that 'HEAT' and 'LRR_*' are repeat domains. But without looking up the definition and description of each of the other domains, I cannot tell whether they are annotated as repeats.
Is there a list showing which domains are repeats and which are not? Or is there a text-mining method that could do this?
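Pfam assigns each Pfam-A entry a type (Family, Domain, Repeat, Motif and so on), so the list should already exist in the release files and no text mining should be needed. Below is a minimal sketch, assuming the per-family metadata file (assumed here to be Pfam-A.hmm.dat.gz in the current_release directory) stores each family's name on a `#=GF ID` line and its type on a `#=GF TP` line, with every record closed by `//`. The helper names are hypothetical; check the layout against the actual file before relying on it.

```python
from typing import Dict, Iterable, Iterator, List


def iter_gf_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per record, mapping each '#=GF' tag to its value.

    Assumes records are blocks of '#=GF TAG   value' lines closed by '//'.
    Only the first value of a repeated tag is kept.
    """
    record: Dict[str, str] = {}
    for line in lines:
        if line.startswith("//"):
            if record:
                yield record
            record = {}
            continue
        parts = line.split(maxsplit=2)
        if len(parts) == 3 and parts[0] == "#=GF":
            record.setdefault(parts[1], parts[2].strip())
    if record:  # tolerate a missing final '//'
        yield record


def domain_types(lines: Iterable[str]) -> Dict[str, str]:
    """Map each family ID (e.g. 'HEAT') to its TP value (e.g. 'Repeat')."""
    return {rec["ID"]: rec.get("TP", "") for rec in iter_gf_records(lines) if "ID" in rec}


def repeat_domains(names: Iterable[str], types: Dict[str, str]) -> List[str]:
    """Return the names whose recorded type is 'Repeat' (unknown names are skipped)."""
    return [n for n in names if types.get(n) == "Repeat"]
```

Any iterable of lines works, so a gzipped file can be read directly, e.g. `with gzip.open("Pfam-A.hmm.dat.gz", "rt") as fh: types = domain_types(fh)` (file name assumed), followed by `repeat_domains(["HEAT", "LRR_3", "KH1"], types)`.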
2. I want to know which of these domains can be grouped into one class based on functional similarity. How can I do this?
I know Pfam clans may be a good way to classify domain families, but how can I get the files that list the clans and their member families? I have looked through the Pfam FTP site, but there are many files and no README explaining them clearly, so I do not know which one to choose.
Are there other methods besides clans that can do this?
Thank you very much!
---------------------------Update-------------------------------------------------------
Thanks to neilfws for the help and direction!
For the second question, I found the answer: the file 'Pfam-C.gz' (ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-C.gz). Here (http://pfam.sanger.ac.uk/help#tabview=tab12) is a partial description of the files on the FTP site.
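Pfam-C is a flat file of clan records. Here is a minimal sketch for grouping families by clan, assuming each record is a block of `#=GF` lines ending with `//`, with the clan accession on an `AC` line and one member family per `MB` line (for example `#=GF MB   PFxxxxx;`). The function names are hypothetical, and the format assumptions should be checked against the real file. Members are listed by accession, so domain names such as HEAT first need mapping to accessions; hmmscan's tabular output reports both.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def parse_clans(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map clan accession (e.g. 'CL0020') to its member Pfam-A accessions.

    Assumes '#=GF AC' holds the clan accession and each '#=GF MB' line holds
    one member such as 'PFxxxxx;'. Version suffixes ('.12') are dropped.
    """
    clans: Dict[str, List[str]] = {}
    acc: Optional[str] = None
    members: List[str] = []
    for line in lines:
        if line.startswith("//"):
            if acc is not None:
                clans[acc] = members
            acc, members = None, []
            continue
        parts = line.split(maxsplit=2)
        if len(parts) < 3 or parts[0] != "#=GF":
            continue
        tag, value = parts[1], parts[2].strip()
        if tag == "AC":
            acc = value.split(".")[0]
        elif tag == "MB":
            members.append(value.rstrip(";").split(".")[0])
    if acc is not None:  # tolerate a missing final '//'
        clans[acc] = members
    return clans


def clan_of(clans: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert clan -> members into member accession -> clan accession."""
    return {m: c for c, ms in clans.items() for m in ms}


def group_by_clan(accessions: Iterable[str], member_to_clan: Dict[str, str]) -> Dict[str, List[str]]:
    """Group Pfam accessions by clan; families outside any clan go under 'no clan'."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for a in accessions:
        base = a.split(".")[0]  # 'PFxxxxx.12' -> 'PFxxxxx'
        groups[member_to_clan.get(base, "no clan")].append(base)
    return dict(groups)
```

For example, `with gzip.open("Pfam-C.gz", "rt") as fh: member_to_clan = clan_of(parse_clans(fh))`, then pass the accessions of your hits to `group_by_clan`. Clan membership reflects evidence of common ancestry, which is related to, but not the same as, functional similarity.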
Dear Neil, do you know whether it's possible to exclude certain types of Pfam entries, such as "Repeat", when running HMMER?
I want to search the Pfam HMM database with hmmscan while excluding the repeats, but I can't find any option to select only the non-repeat models. Do I have to do this at the end of the pipeline, i.e., remove the repeats from the hmmscan output table using the pfamA.txt.gz file?
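Filtering at the end of the pipeline is straightforward. Here is a minimal sketch, assuming hmmscan was run with `--tblout` or `--domtblout`, where `#` lines are comments and the first two whitespace-separated fields are the target (Pfam model) name and accession. The set of repeat names or version-less accessions is taken as input, built from whichever Pfam metadata file you use; the column layout of pfamA.txt.gz is not assumed here. `drop_repeat_hits` is a hypothetical helper.

```python
from typing import Iterable, Iterator, Set


def drop_repeat_hits(lines: Iterable[str], repeats: Set[str]) -> Iterator[str]:
    """Yield hmmscan tabular-output lines whose target model is not a repeat.

    Assumes the first two fields are the target name and target accession.
    `repeats` may contain names ('HEAT') and/or accessions without the
    version suffix ('PFxxxxx').
    """
    for line in lines:
        fields = line.split()
        if line.startswith("#") or len(fields) < 2:
            yield line  # pass comment headers and blank/short lines through unchanged
            continue
        name, acc = fields[0], fields[1].split(".")[0]
        if name not in repeats and acc not in repeats:
            yield line
```

For example, `kept = list(drop_repeat_hits(open("hits.tbl"), repeat_set))`, where hits.tbl is a hypothetical `--tblout` file. Alternatively (untested here), one could extract only the non-repeat models into a smaller HMM file with `hmmfetch -f` and index it with `hmmpress` before running hmmscan.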
Thanks a lot
Don't know, sorry. As you can see, it's been 5.3 years since I last thought about this :) HMMER itself has changed a lot in that time.