I want to extract a list of proteins for a given organism that have no (high-confidence) predicted domains according to Pfam or similar databases.
(Alternatively, getting a list of predictions for a list of proteins would also be good).
I know that HMMER, Pfam, and the like (CCD-Hit) offer various tools for searching for domains, but I don't know how to work with the emailed output files, and I'm specifically interested in finding which proteins DON'T have predicted domains.
Is there a simple way to do this? Even a tool whose output I can copy-paste into a text editor or Excel and then filter by column would do.
Thanks!
What is the emailed file output? I think that after you BLAST against a domain database, all sequences with no hits are considered to be sequences without domains. Am I missing something?
At the time I was working with the HMMER and/or Pfam search results, which are returned as a plaintext email. Yuch.
That said, even with the offline tool, I don't know how to parse the command-line output properly; it just prints to the screen.
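
One possible way around the on-screen output, sketched under some assumptions: the offline HMMER tool `hmmscan` has a `--tblout` option that writes a whitespace-delimited table (one line per hit, comment lines starting with `#`), and `--cut_ga` applies Pfam's curated gathering thresholds. Something like `hmmscan --cut_ga --tblout hits.tbl Pfam-A.hmm proteome.fasta` should produce such a table; the file names here are placeholders, and the Pfam HMM file would need to be prepared with `hmmpress` first. The Python below is a minimal sketch, not a tested pipeline: it assumes the query (protein) name is the third column and the full-sequence E-value the fifth, as in the `hmmscan` table layout, and the function names and the `1e-5` cutoff are made up for illustration. Proteins without domains are then simply the FASTA IDs that never appear as a query with an acceptable hit, which is the same idea as in the comment above.

```python
def fasta_ids(fasta_lines):
    """Return sequence IDs (first word of each '>' header), in file order."""
    ids = []
    for line in fasta_lines:
        if line.startswith(">") and line[1:].strip():
            ids.append(line[1:].split()[0])
    return ids


def domains_by_protein(tblout_lines, max_evalue=1e-5):
    """Map each query protein to a list of (domain name, E-value) hits.

    Assumes hmmscan --tblout layout: column 1 is the target (domain) name,
    column 3 the query (protein) name, column 5 the full-sequence E-value.
    The max_evalue cutoff is an illustrative choice, not a recommendation.
    """
    hits = {}
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the commented header/footer
        fields = line.split()
        if len(fields) < 5:
            continue  # not a well-formed hit line
        domain, protein, evalue = fields[0], fields[2], float(fields[4])
        if evalue <= max_evalue:
            hits.setdefault(protein, []).append((domain, evalue))
    return hits


def proteins_without_domains(fasta_lines, tblout_lines, max_evalue=1e-5):
    """Return IDs present in the FASTA input that have no accepted hit."""
    hits = domains_by_protein(tblout_lines, max_evalue)
    return [pid for pid in fasta_ids(fasta_lines) if pid not in hits]


# Hypothetical usage with the placeholder file names from above:
#   with open("proteome.fasta") as fa, open("hits.tbl") as tbl:
#       for pid in proteins_without_domains(fa, tbl):
#           print(pid)
```

The printed IDs (or the per-protein hit lists from `domains_by_protein`) can be redirected to a text file and opened in a spreadsheet if that is easier to filter. Note that the IDs are matched on the first word of the FASTA header, so this only works if `hmmscan` reports the query names the same way for your input.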