Hey
can you help me get just one representative FASTA sequence per family? Is there a simple way to do that?
cheers
X
It's not trivial. You could use the sequences from the trRosetta Pfam model set, which are representative of the family (download link).
We have a method for getting representative sequences in our paper if you are comfortable with using hmmsearch:
A representative target sequence was found for each family using hmmsearch to search the UniRef90 database with the Pfam HMM and taking the closest subsequence match by E-value.
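As a rough sketch of the "closest match by E-value" step: if hmmsearch is run with its `--domtblout` option (for example `hmmsearch --domtblout hits.domtbl Pfam-A.hmm uniref90.fasta`, where the file names are placeholders), the per-domain table can be reduced to one best hit per family. The function name `best_hits` is hypothetical, and using the independent E-value (i-Evalue) with envelope coordinates is my assumption; the paper may have used a different column.

```python
def best_hits(domtblout_lines):
    """Pick the lowest-E-value domain hit per query HMM.

    Takes an iterable of lines in hmmsearch --domtblout format and
    returns {query_name: (target_name, env_from, env_to, i_evalue)}.
    Assumption: "closest match" means the smallest independent E-value.
    """
    best = {}
    for line in domtblout_lines:
        # Skip comment/header lines and blank lines.
        if not line.strip() or line.startswith("#"):
            continue
        # 22 whitespace-separated columns, then a free-text description.
        fields = line.split(maxsplit=22)
        if len(fields) < 22:
            continue
        target = fields[0]            # target sequence name
        query = fields[3]             # query (HMM/family) name
        i_evalue = float(fields[12])  # independent E-value of this domain
        env_from = int(fields[19])    # envelope start on the target
        env_to = int(fields[20])      # envelope end on the target
        if query not in best or i_evalue < best[query][3]:
            best[query] = (target, env_from, env_to, i_evalue)
    return best
```

Something like `best_hits(open("hits.domtbl"))` would then give one target and its subsequence coordinates per family, which you can cut out of the UniRef90 FASTA with whatever tool you prefer.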
hmmemit from the HMMER package will extract a consensus sequence for each HMM from Pfam:
hmmemit -c -o model.fasta model.hmm
Not only is it fast (it can be done for the whole of Pfam in under 2 minutes), it is also objective, because it gets the sequence directly from the model based on a simple majority rule. Keep in mind that consensus sequences generated this way may not exist in nature, although there will always be some real sequences that are very similar.
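If you want the consensus output as one sequence per family in a script, a minimal sketch for reading the resulting FASTA could look like the following. The function name `read_consensus_fasta` is hypothetical, and it assumes hmmemit names each record `<model name>-consensus`; check the headers in your own output and adjust the suffix if they differ.

```python
def read_consensus_fasta(lines, suffix="-consensus"):
    """Parse FASTA lines into {family_name: sequence}.

    Assumption: each header looks like ">NAME-consensus"; the suffix
    is stripped if present so that keys are plain family names.
    """
    chunks = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            words = line[1:].split()
            name = words[0] if words else ""
            if suffix and name.endswith(suffix):
                name = name[: -len(suffix)]
            chunks[name] = []
        elif name is not None:
            # Sequence lines may be wrapped; collect and join later.
            chunks[name].append(line)
    return {n: "".join(parts) for n, parts in chunks.items()}
```

For example, `read_consensus_fasta(open("model.fasta"))` would return a dictionary with one consensus sequence per model in the file.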
thanks for the idea :)