Hi, I have 100 protein sequences with some conserved domains. I want to extract the domain sequences in one go. Is this possible? CDD gives the boundaries of the domains but not the sequences of the domains. I am a Windows user.
If you know the domain boundary coordinates, then it's very simple using a multi-sequence FASTA file as input. Note that fastacmd reads a BLAST database, so the FASTA file has to be indexed first (formatdb -o T).
Example: fastacmd -d refseq_protein -s NP_112245 -L 100,160
For many sequences, prepare a "list_file" with three columns: "seq_id", "start", "end", then run:
awk '{system("fastacmd -d input_fasta.fa -s " $1 " -L " $2 "," $3)}' list_file
for additional information check this
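Since you are on Windows and may not have awk or fastacmd at hand, here is a minimal pure-Python sketch of the same list-driven extraction. It assumes the same three-column "list_file" (seq_id, start, end), that the seq_id matches the first word of each FASTA header, and that the coordinates are 1-based and inclusive. The function names and the output header format are my own choices for illustration, not part of any NCBI tool.

```python
def read_fasta(lines):
    """Parse FASTA lines into {seq_id: sequence}.

    Assumption: seq_id is the first whitespace-delimited word after '>'.
    """
    chunks = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            current = words[0] if words else ""
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in chunks.items()}


def extract_domains(seqs, list_lines):
    """Yield (seq_id, start, end, subsequence) for each 'seq_id start end' row.

    Assumption: coordinates are 1-based and inclusive.
    Rows with an unknown seq_id or non-numeric coordinates (e.g. a header
    row) are skipped.
    """
    for row in list_lines:
        fields = row.split()
        if len(fields) < 3 or fields[0] not in seqs:
            continue
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            continue
        yield fields[0], start, end, seqs[fields[0]][start - 1:end]


def write_domains(fasta_path, list_path, out_path):
    """Read both input files and write the domain sequences as FASTA."""
    with open(fasta_path) as fasta_handle:
        seqs = read_fasta(fasta_handle)
    with open(list_path) as list_handle, open(out_path, "w") as out_handle:
        for seq_id, start, end, domain in extract_domains(seqs, list_handle):
            out_handle.write(f">{seq_id}:{start}-{end}\n{domain}\n")
```

To use it, save the code as a script and call, for example, write_domains("input_fasta.fa", "list_file", "domains.fa"); the output file name here is just a placeholder.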
Have you tried the Batch CDD search option?
My answer in "Finding The Sequence Of A Domain" solves your question.
No, but it's very easy to install R (http://cran.cnr.berkeley.edu), and you will also like R's IDE, RStudio (http://rstudio.org). Both are available for Linux, Mac, and Windows.
What do you have as input? Sequences (FASTA or some other format) or a list of accession numbers (UniProt or some other database)?
Also: do you want the consensus sequence of the conserved domain or the one in your sequences?
Are you and @Moon from Finding The Sequence Of A Domain working on the same assignment?
No, we are not working on the same project :).
OK, Thanks! I will trust you on this. By the way, welcome to Biostars.org!