Extract Domain Sequences From Multiple Sequences
13.4 years ago
Palu ▴ 290

Hi, I have 100 protein sequences with some conserved domains. I want to extract the domain sequences in one go. Is that possible? CDD gives the boundaries of the domains, but it does not give the sequences of the domains. I am a Windows user.

domain protein • 8.3k views

What do you have as input? Sequences (in FASTA or some other format) or a list of accession numbers (from UniProt or some other database)?

Also: do you want the consensus sequence of the conserved domain or the one in your sequences?

Are you and @Moon from Finding The Sequence Of A Domain working on the same assignment?

No, we are not working on the same project :).

OK, Thanks! I will trust you on this. By the way, welcome to Biostars.org!

13.4 years ago
Rm 8.3k

If you know the domain boundary coordinates, then it's very simple using a multi-sequence FASTA file as input.

  1. Format your FASTA file with BLAST's "formatdb" (use -o T so that the sequence IDs are indexed for lookup).
  2. Use fastacmd with -s (sequence name) and -L (start,end).

Example: fastacmd -d refseq_protein -s NP_112245 -L 100,160

To do this for many sequences, prepare an input file "list_file" with three columns ("seq_id", "start", "end") and run:

   awk '{system("fastacmd -d input_fasta.fa -s "$1" -L "$2","$3);}' list_file

for additional information check this
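
If the legacy BLAST tools or awk are not at hand (for example on Windows), the same slicing can be sketched in plain Python. This is only a minimal sketch, not part of the fastacmd workflow above: the function names `read_fasta` and `extract_domains` are made up for illustration, sequence IDs are assumed to be the first word of each FASTA header, and coordinates are assumed to be 1-based and inclusive, as with `-L start,end`.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a dict mapping sequence ID to sequence."""
    chunks = {}
    seq_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the ID is the first whitespace-delimited token of the header.
            tokens = line[1:].split()
            seq_id = tokens[0] if tokens else ""
            chunks[seq_id] = []
        elif seq_id is not None:
            chunks[seq_id].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def extract_domains(records, coord_lines):
    """Yield (header, subsequence) for each 'seq_id start end' line.

    Coordinates are treated as 1-based and inclusive.
    Lines with fewer than three columns or an unknown ID are skipped.
    """
    for line in coord_lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        seq_id, start, end = fields[0], int(fields[1]), int(fields[2])
        if seq_id not in records:
            continue
        # Convert the 1-based inclusive range to a Python slice.
        yield f"{seq_id}:{start}-{end}", records[seq_id][start - 1:end]
```

To use it, open the FASTA file and the three-column list file, pass the first to `read_fasta` and the result plus the second to `extract_domains`, and print each pair as `>header` followed by the sequence.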

13.4 years ago

Have you tried the Batch CD-Search option?

To expand on that: if you want the exact hit positions, use the rpsblast command-line tool.
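
As a rough sketch of how those hit positions could be turned into domain coordinates: assuming rpsblast (BLAST+) was run with `-outfmt 6` and the default twelve tab-separated columns, the query start and end are in columns 7 and 8 and the E-value is in column 11. The function name `hits_to_coords` and the 0.01 E-value cutoff below are illustrative choices, not anything prescribed by rpsblast.

```python
def hits_to_coords(tabular_lines, max_evalue=0.01):
    """Turn BLAST tabular (-outfmt 6) lines into (query_id, domain_id, start, end) tuples.

    Assumes the default column order:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    coords = []
    for line in tabular_lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines (as written by -outfmt 7).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue
        query_id, domain_id = fields[0], fields[1]
        qstart, qend = int(fields[6]), int(fields[7])
        evalue = float(fields[10])
        # Keep only hits at or below the (illustrative) E-value cutoff.
        if evalue <= max_evalue:
            coords.append((query_id, domain_id, min(qstart, qend), max(qstart, qend)))
    return coords
```

Each resulting tuple gives a sequence ID plus 1-based start and end positions on the query, which is the kind of three-column "seq_id start end" list needed to cut the domain out of the original sequence.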

13.4 years ago

My answer in "Finding The Sequence Of A Domain" solves your question.

Actually, I have a problem with the R script. Do you know of a Perl solution for that?

No, but it's very easy to install R (http://cran.cnr.berkeley.edu). You will also like R's IDE, RStudio (http://rstudio.org). Both are available for Linux, Mac, and Windows.
