How to assign a massive number of proteins to Pfam using R
4.5 years ago

Hi! I'm doing bacterial pan-genome research involving thousands of genomes. I'm trying to assign every protein in all of the genomes to Pfam families. I know there are tools like the NCBI CDD database, but I don't know how to run a scripted search, since the website only lets you search 4000 proteins at a time. Is there an R package that does this job, or any other convenient method?

R pfam pan-genome • 1.3k views
4.5 years ago

There's the pfam_scan.pl Perl script for this. Here is a quick tutorial.
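
Once the script has run, the hits still need to be collected per protein. Here is a minimal sketch of a parser in Python, assuming the output is whitespace-separated with the 15 columns named in the header comment that pfam_scan.pl writes (check the column order against your own output file). The function name `parse_pfam_scan` and the example line are made up for illustration.

```python
from collections import defaultdict

# Column layout assumed from the header comment in pfam_scan.pl output;
# verify it against the header of your own result file.
COLUMNS = [
    "seq_id", "ali_start", "ali_end", "env_start", "env_end",
    "hmm_acc", "hmm_name", "type", "hmm_start", "hmm_end",
    "hmm_length", "bit_score", "evalue", "significance", "clan",
]


def parse_pfam_scan(lines):
    """Return {seq_id: [hit, ...]} from pfam_scan-style output lines."""
    hits = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and the '#' comment header.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < len(COLUMNS):
            continue  # ignore lines that do not match the assumed layout
        hit = dict(zip(COLUMNS, fields))
        hit["bit_score"] = float(hit["bit_score"])
        hit["evalue"] = float(hit["evalue"])
        hits[hit["seq_id"]].append(hit)
    return dict(hits)


# Made-up example line, not real output.
example = [
    "# <seq id> <alignment start> <alignment end> ...",
    "",
    "prot_A 38 93 38 94 PF08246.7 Inhibitor_I29 Domain 1 57 58 45.9 2.8e-12 1 No_clan",
]
result = parse_pfam_scan(example)
```

The resulting dictionary can be written out as a table and loaded into R for the downstream pan-genome analysis.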

Thank you for your answer. This method seems too slow for my data set, so maybe I should cluster my proteins first.
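
As a rough illustration of reducing the search load, the sketch below collapses exactly identical sequences so that each unique protein is searched only once. This is plain deduplication, not similarity clustering (a dedicated clustering tool would be needed for that), and the names `read_fasta` and `collapse_identical` as well as the example records are hypothetical.

```python
def read_fasta(lines):
    """Yield (id, sequence) pairs from FASTA-formatted lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            # Use the first whitespace-delimited token as the ID.
            header, parts = line[1:].split()[0], []
        else:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def collapse_identical(records):
    """Group IDs that share an identical sequence.

    Returns (representatives, groups):
      representatives: {representative_id: sequence}
      groups:          {representative_id: [all IDs with that sequence]}
    """
    by_sequence = {}
    for seq_id, seq in records:
        by_sequence.setdefault(seq, []).append(seq_id)
    representatives = {ids[0]: seq for seq, ids in by_sequence.items()}
    groups = {ids[0]: ids for ids in by_sequence.values()}
    return representatives, groups


# Made-up example: two genomes share one identical protein.
fasta = [
    ">g1_p1 hypothetical protein",
    "MKTAYIAK",
    "QRQISFVK",
    ">g2_p1",
    "MKTAYIAKQRQISFVK",
    ">g2_p2",
    "MSLLNDPQ",
]
representatives, groups = collapse_identical(read_fasta(fasta))
```

Only the representatives need to be searched; `groups` maps each hit back to every genome that carries the same protein.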

Have you considered parallelizing? If you cluster the sequences, you could then build a profile HMM for each cluster and use something like HHsearch to compare these to the Pfam profiles.
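
One way the parallelizing could look, as a sketch only: split the proteins into chunks, write one FASTA file per chunk, and run one pfam_scan.pl process per file. The helper names, the chunk file names, and the `pfam_db` directory are hypothetical, and the `-fasta`, `-dir`, and `-outfile` flags should be checked against the script's own help text before use.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def split_round_robin(records, n_chunks):
    """Distribute records over n_chunks lists of roughly equal size."""
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return [chunk for chunk in chunks if chunk]


def build_commands(chunk_paths, pfam_dir):
    """Build one pfam_scan.pl command line per chunk file.

    Flags are assumed; confirm them with the script's help output.
    """
    return [
        ["pfam_scan.pl", "-fasta", path, "-dir", pfam_dir,
         "-outfile", path + ".pfam"]
        for path in chunk_paths
    ]


def run_all(commands, workers=4):
    """Run the commands concurrently; each one is a separate process."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda cmd: subprocess.run(cmd, check=True), commands))


# Made-up records; in practice these come from your FASTA files.
records = [("p%d" % i, "MKT") for i in range(5)]
chunks = split_round_robin(records, 2)
commands = build_commands(["chunk_0.fa", "chunk_1.fa"], "pfam_db")
# run_all(commands) would launch the searches once the chunk files exist.
```

Threads are enough here because the heavy work happens in the external processes; on a cluster the same command list could be submitted as separate jobs instead.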

I just realized that I can download Pfam and COG information from the IMG database, where it has already been assigned to proteins. I haven't tried your method yet, but it sounds plausible. I'll accept your answer.
