How To Judge If A Pfam Domain Is A Repeat, And The Functional Similarities Of A Series Of Domains?
13.3 years ago
Ct586 ▴ 630

Hi, all.

I have some domains predicted with Pfam, listed as follows:

HEAT HMG_box HSP90 Histone Homeobox

IQ Jacalin KH1 Kelch1 Kelch2 LEA4

LRR1 LRR2 LRR_3

I have two questions:

1. How can I tell whether a domain is a repeat?

In this example, I can see that 'HEAT' and 'LRR_*' are repeat domains. But without looking up the definition and description of each of the other domains, I cannot tell whether they are annotated as repeats.

Is there a list showing which domains are repeats and which are not? Or is there a text-mining method that can do this?

2. I want to know which of these domains can be grouped into one class based on their functional similarity. How can I do this?

I know Pfam clans may be a good way to classify domain families, but how can I get the files listing the clans and their member families? I have looked through the Pfam FTP site, but there are many files and no README that explains them clearly, so I do not know which one to choose.

Are there other methods besides clans that can do this?

Thank you very much!

---------------------------Update-------------------------------------------------------

Thanks neilfws for the help and direction!

Here is the answer to the second question: 'Pfam-C.gz' (ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-C.gz).

Here (http://pfam.sanger.ac.uk/help#tabview=tab12) is a not-so-complete description of the files on the FTP site.

13.3 years ago
Neilfws 49k

To answer your first question: if you download the two pfamA table files from the FTP site (the SQL schema, pfamA.sql.gz, and the data, pfamA.txt.gz)

and uncompress them, you'll see this line in the SQL file:

`type` enum('Family','Domain','Repeat','Motif') NOT NULL,

This tells you that PFAM entries are annotated with a type, one of which is "Repeat".

The "type" is given in column 9 of the corresponding text file, pfamA.txt, so you can use that file to look up the type of any given identifier, or even load it into your own local MySQL database and query it there. You can also download the corresponding files for Pfam-B.
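As a rough sketch of the lookup described above, the following reads pfamA.txt.gz and reports the type of each Pfam ID you pass in. It assumes the file is tab-separated with the type in column 9 (as stated above); the assumption that the Pfam ID such as `LRR_1` sits in column 3 is mine, so check the column order against the `CREATE TABLE` statement in pfamA.sql before relying on it.

```python
import gzip
import sys


def load_pfam_types(lines, id_col=3, type_col=9):
    """Map each Pfam-A ID to its type ('Family', 'Domain', 'Repeat' or 'Motif').

    Columns are 1-based and tab-separated. type_col=9 follows the answer above;
    id_col=3 is an ASSUMPTION about where the ID (e.g. 'LRR_1') is stored, so
    check the column order against pfamA.sql first.
    """
    types = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < max(id_col, type_col):
            continue  # skip blank or truncated lines
        types[fields[id_col - 1]] = fields[type_col - 1]
    return types


def repeats_among(names, types):
    """Return the subset of `names` annotated with type 'Repeat'."""
    return [name for name in names if types.get(name) == "Repeat"]


if __name__ == "__main__":
    # Usage: python pfam_types.py pfamA.txt.gz HEAT LRR_1 Kelch_1 ...
    with gzip.open(sys.argv[1], "rt", encoding="utf-8", errors="replace") as handle:
        pfam_types = load_pfam_types(handle)
    for name in sys.argv[2:]:
        print(name, pfam_types.get(name, "not found"))
```

Names must match the Pfam IDs exactly, so a label like "LRR1" from your own output may need to be mapped to the official ID (e.g. "LRR_1") before the lookup succeeds.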

For your second question: I don't have a ready answer, but the data you need is very likely on the FTP site if you keep digging and examining the files. You can find clans non-programmatically by taking the Pfam accession and going to the appropriate URL, e.g.:

PF07723

then choosing "clans" from the menu, where you will see (for that example):

FNIP  LRR_1  LRR_2  LRR_3  LRR_4

Thank you! For the second question, I found the answer: the file 'Pfam-C.gz' (ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-C.gz). Here (http://pfam.sanger.ac.uk/help#tabview=tab12) is a not-so-complete description of the files on the FTP site.
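A minimal sketch of how Pfam-C.gz could be used to group families by clan. It assumes each clan record is Stockholm-style, with `#=GF AC` (clan accession), `#=GF ID` (clan name) and one `#=GF MB` line per member Pfam-A accession (e.g. `PF00560;`), and that records end with `//`. Verify this against the file from your release, since the layout is an assumption here.

```python
import gzip
import sys
from collections import defaultdict


def parse_pfam_c(lines):
    """Parse clan records into {clan_acc: (clan_id, [member accessions])}.

    ASSUMPTION: records use '#=GF AC', '#=GF ID' and '#=GF MB' lines and end with '//'.
    """
    clans = {}
    acc, clan_id, members = None, None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#=GF"):
            parts = line.split(None, 2)
            if len(parts) < 3:
                continue
            tag, value = parts[1], parts[2].strip()
            if tag == "AC":
                acc = value.split(".")[0]  # drop a version suffix such as '.41'
            elif tag == "ID":
                clan_id = value
            elif tag == "MB":
                members.append(value.rstrip(";").strip())
        elif line.startswith("//"):
            if acc is not None:
                clans[acc] = (clan_id, members)
            acc, clan_id, members = None, None, []  # start a fresh record
    return clans


def family_to_clan(clans):
    """Invert the clan table: Pfam-A accession -> clan accession."""
    lookup = {}
    for clan_acc, (_, members) in clans.items():
        for member in members:
            lookup[member] = clan_acc
    return lookup


def group_by_clan(accessions, lookup):
    """Group Pfam-A accessions by clan; families without a clan go under None."""
    groups = defaultdict(list)
    for acc in accessions:
        groups[lookup.get(acc.split(".")[0])].append(acc)
    return dict(groups)


if __name__ == "__main__":
    # Usage: python clans.py Pfam-C.gz PF00560 PF07723 ...
    with gzip.open(sys.argv[1], "rt", encoding="utf-8", errors="replace") as handle:
        clans = parse_pfam_c(handle)
    for clan_acc, members in group_by_clan(sys.argv[2:], family_to_clan(clans)).items():
        name = clans[clan_acc][0] if clan_acc else "no clan"
        print(clan_acc, name, " ".join(members))
```

The members are accessions, not names, so combining this with the pfamA.txt lookup is one way to go from IDs like "LRR_1" to their clan.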


Dear Neil, do you know if it's possible to exclude some types of PFAM entries, such as "Repeat", when using HMMER?

I want to search the Pfam HMM database with hmmscan while excluding the repeats, but I can't find any option to select only the non-repeat entries. Do I have to do it at the end of the pipeline, i.e., remove the repeats from the hmmscan output table using the types in the pfamA.txt.gz file?

Thanks a lot
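The post-hoc filtering suggested in this question can be sketched as follows. It relies on the first whitespace-separated field of an hmmscan `--tblout` or `--domtblout` row being the target model name (the Pfam ID, e.g. `LRR_1`); the `repeat_ids.txt` file is hypothetical and would hold one repeat ID per line, for example built from the "Repeat" rows of pfamA.txt.gz.

```python
import sys


def drop_repeat_hits(lines, repeat_ids):
    """Yield lines of an hmmscan --tblout/--domtblout table, skipping repeat hits.

    The first whitespace-separated field is the target (the Pfam model name).
    Comment lines starting with '#' and blank lines are passed through.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        target = line.split(None, 1)[0]
        if target not in repeat_ids:
            yield line


if __name__ == "__main__":
    # Usage: python drop_repeats.py repeat_ids.txt hits.domtblout > hits.norepeat.domtblout
    # repeat_ids.txt is a HYPOTHETICAL file with one Pfam ID per line.
    with open(sys.argv[1]) as handle:
        repeats = {line.strip() for line in handle if line.strip()}
    with open(sys.argv[2]) as handle:
        sys.stdout.writelines(drop_repeat_hits(handle, repeats))
```

Filtering after the search keeps the full database intact, at the cost of scoring the repeat models anyway.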


Don't know, sorry. As you can see, it's been 5.3 years since I last thought about this :) HMMER itself has changed a lot in that time.
