Finding Proteins that have NO known Domains
10.5 years ago
ddofer ▴ 30

I want to extract a list of proteins for a given organism that have no (high-confidence) predicted domains according to Pfam or a similar database.

(Alternatively, getting a list of predictions for a list of proteins would also be good).

I know that HMMER, Pfam and the like (CCD-Hit) have various tools for searching for domains, but I don't know how to work with the emailed output files, and I'm specifically interested in finding which proteins DON'T have predicted domains.

Is there an easy way to do this? Even a tool whose output I can copy-paste into a text editor or Excel and then filter by column would do.

Thanks!

sequence pfam domain batch protein • 3.0k views

What is the emailed file output? I think that after you BLAST against a domain database, all the sequences with no hits can be considered sequences without domains. Am I missing something?


At the time I was working with the HMMER and/or Pfam web search results, which are returned as a plain-text email. Yuch.

That said, even with the offline tool, I don't know how to parse the command-line output properly; it just prints to the screen.
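One way to handle the offline output is sketched below. It is only a sketch, and it assumes the search was run with HMMER's `hmmscan` using the `--tblout` option (for example `hmmscan --tblout hits.tbl Pfam-A.hmm proteins.fasta`), which writes one whitespace-delimited line per hit with the query protein name in the third column and the full-sequence E-value in the fifth. The function names and the `1e-5` E-value cutoff are illustrative choices, not anything taken from this thread. Proteins that appear in the FASTA file but have no hit under the cutoff are reported as having no predicted domain.

```python
def read_fasta_ids(fasta_lines):
    """Return sequence IDs (first word after '>') in file order."""
    ids = []
    for line in fasta_lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                ids.append(words[0])
    return ids


def proteins_with_hits(tblout_lines, max_evalue=1e-5):
    """Collect query names that have at least one hit at or below max_evalue.

    Assumes hmmscan --tblout layout: column 3 is the query (protein) name,
    column 5 is the full-sequence E-value. Lines starting with '#' are comments.
    """
    hits = set()
    for line in tblout_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split()
        query_name = fields[2]
        evalue = float(fields[4])
        if evalue <= max_evalue:
            hits.add(query_name)
    return hits


def proteins_without_domains(fasta_lines, tblout_lines, max_evalue=1e-5):
    """Return IDs from the FASTA input that have no confident domain hit."""
    hits = proteins_with_hits(tblout_lines, max_evalue)
    return [pid for pid in read_fasta_ids(fasta_lines) if pid not in hits]


if __name__ == "__main__":
    # Hypothetical file names; replace with your own paths.
    with open("proteins.fasta") as fasta, open("hits.tbl") as tbl:
        for protein_id in proteins_without_domains(fasta, tbl):
            print(protein_id)
```

The printed list is one ID per line, so it can be pasted straight into a text editor or a spreadsheet. Raising or lowering `max_evalue` changes what counts as a "high-confidence" prediction.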

10.5 years ago

You could query the UniProt Knowledgebase for proteins with no cross-references to InterPro:

active:yes not database:interpro

http://www.uniprot.org/uniprot/?query=+active%3Ayes+not+database%3Ainterpro&sort=score
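For scripting, the same query URL can be assembled rather than typed by hand. The snippet below is a minimal sketch using only Python's standard library; the helper name `build_uniprot_query_url` is hypothetical, and it simply reproduces the URL layout shown in this answer, which may not match the current UniProt website. Any further restriction, such as an organism filter, would be passed as an extra term, but the exact field syntax for that is not given in this thread and should be checked against the UniProt documentation.

```python
from urllib.parse import urlencode


def build_uniprot_query_url(terms, base="http://www.uniprot.org/uniprot/"):
    """Join query terms with spaces and URL-encode them as in the answer above."""
    query = " ".join(terms)
    return base + "?" + urlencode({"query": query, "sort": "score"})


# The two terms are copied from the query in this answer.
url = build_uniprot_query_url(["active:yes", "not", "database:interpro"])
print(url)
```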


InterPro has many kinds of annotations though, not just domains...

(And I'm working on offline sequences, which aren't necessarily in UniProt or even NCBI.)

As for your database approach, wouldn't it make more sense to just search for proteins with "NOT domain:*"? Your query returns proteins with annotated domains right on the first page of results :P
