Question

Extract Domain Sequences From Multiple Sequences

2

13.4 years ago

Palu ▴ 290

Hi, I have 100 protein sequences with some conserved domains. I want to extract the domain sequences in one go. Is this possible? CDD gives the boundaries of the domains but does not give the sequences of the domains themselves. I am a Windows user.

domain protein • 8.3k views

ADD COMMENT • link updated 13.4 years ago by Aleksandr Levchuk 3.2k • written 13.4 years ago by Palu ▴ 290

1

What do you have as input? Sequences (FASTA or which other format) or a list of accession numbers (Uniprot or which other database)?

ADD REPLY • link 13.4 years ago by Lyco ★ 2.3k

0

Also: do you want the consensus sequence of the conserved domain or the one in your sequences?

ADD REPLY • link 13.4 years ago by Michael Schubert ★ 7.1k

0

Are you and @Moon from "Finding The Sequence Of A Domain" working on the same assignment?

ADD REPLY • link updated 5.2 years ago by Ram 44k • written 13.4 years ago by Aleksandr Levchuk 3.2k

0

No, we are not working on the same project :).

ADD REPLY • link 13.4 years ago by Palu ▴ 290

0

OK, Thanks! I will trust you on this. By the way, welcome to Biostars.org!

ADD REPLY • link 13.4 years ago by Aleksandr Levchuk 3.2k

score 3 · Answer 1 · 2011-07-13

If you know the domain boundary coordinates, then it is very simple to do with a multi-sequence FASTA file as input.

Format your FASTA file with BLAST's "formatdb" (use -o T so that sequence IDs are indexed for retrieval).
Use fastacmd with -s for the sequence name and -L for start,end:

Example: fastacmd -d refseq_protein -s NP_112245 -L 100,160

To process many sequences, create an input "list_file" with three columns ("seq_id", "start", "end") and run:

   awk '{system("fastacmd -d input_fasta.fa -s " $1 " -L " $2 "," $3)}' list_file

for additional information check this
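
The coordinate-based extraction described above can also be sketched without the BLAST tools, which may be convenient on Windows. This is a minimal Python sketch, not part of the original answer. It assumes that the first word after `>` in each FASTA header matches the first column of list_file, that list_file has no header row, and that coordinates are 1-based and inclusive (as with fastacmd -L). The function names `read_fasta` and `extract_domains` are hypothetical, and the demo sequences are made up.

```python
import io


def read_fasta(handle):
    """Parse FASTA text into a dict mapping sequence ID to sequence."""
    sequences = {}
    seq_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited word of the header as the ID.
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            sequences[seq_id] = []
        elif seq_id is not None:
            sequences[seq_id].append(line)
    return {sid: "".join(parts) for sid, parts in sequences.items()}


def extract_domains(sequences, list_handle):
    """Yield (seq_id, start, end, subsequence) for each row of the list file.

    Each row has three whitespace-separated columns: seq_id, start, end.
    Coordinates are assumed to be 1-based and inclusive.
    """
    for line in list_handle:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        seq_id, start, end = fields[0], int(fields[1]), int(fields[2])
        full = sequences[seq_id]
        yield seq_id, start, end, full[start - 1:end]


if __name__ == "__main__":
    # Made-up demo data standing in for input_fasta.fa and list_file.
    fasta = io.StringIO(">seqA demo protein\nMKTAYIAKQR\nQISFVKSHFS\n>seqB\nGSHMLEDPVD\n")
    regions = io.StringIO("seqA 3 8\nseqB 1 4\n")
    seqs = read_fasta(fasta)
    for seq_id, start, end, domain in extract_domains(seqs, regions):
        print(f">{seq_id}:{start}-{end}\n{domain}")
```

To use real files, replace the `io.StringIO` objects with `open("input_fasta.fa")` and `open("list_file")`.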

score 2 · Answer 2 · 2011-07-13

2

13.4 years ago

Khader Shameer 18k

Have you tried the Batch CDD search option?

ADD COMMENT • link 13.4 years ago by Khader Shameer 18k

0

To expand on that: if you want the exact hit positions, use the rpsblast command-line tool.

ADD REPLY • link 13.4 years ago by Michael Schubert ★ 7.1k
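
One way to turn rpsblast hit positions into a three-column list of domain boundaries is to parse tabular output. The following is a minimal Python sketch under stated assumptions: it expects the BLAST+ version of rpsblast run with `-outfmt 6`, which writes twelve tab-separated columns with the query ID in column 1, the query start and end in columns 7 and 8, and the E-value in column 11. The function name `hits_to_regions`, the E-value cutoff, and the demo rows are hypothetical and not taken from the thread.

```python
def hits_to_regions(lines, max_evalue=0.01):
    """Return (query_id, domain_id, start, end) tuples from tabular hits."""
    regions = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if len(cols) < 12:
            continue  # not a standard 12-column row
        query_id, domain_id = cols[0], cols[1]
        qstart, qend = int(cols[6]), int(cols[7])
        evalue = float(cols[10])
        if evalue <= max_evalue:
            # Keep coordinates on the query, ordered as start <= end.
            regions.append((query_id, domain_id, min(qstart, qend), max(qstart, qend)))
    return regions


if __name__ == "__main__":
    # Made-up example rows, not real rpsblast results.
    demo = [
        "seqA\tCDD:000001\t45.0\t60\t33\t0\t101\t160\t1\t60\t1e-20\t85.0",
        "seqA\tCDD:000002\t30.0\t40\t28\t0\t10\t49\t5\t44\t5.0\t20.1",
    ]
    for query_id, domain_id, start, end in hits_to_regions(demo):
        print(query_id, start, end)
```

The printed rows have the same "seq_id start end" layout as the list_file used for coordinate-based extraction, so the two steps can be chained.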

zx8754 · Answer 3 · 2011-07-14

1

13.4 years ago

Aleksandr Levchuk 3.2k

My answer in "Finding The Sequence Of A Domain" solves your question.

ADD COMMENT • link updated 5.2 years ago by zx8754 12k • written 13.4 years ago by Aleksandr Levchuk 3.2k

0

Actually, I have a problem with the R script. Do you know of any Perl solution for that?

ADD REPLY • link 13.4 years ago by Palu ▴ 290

0

No, but it's very easy to install R (http://cran.cnr.berkeley.edu). You will also like R's IDE (http://rstudio.org). Both are available for Linux, Mac, and Windows.

ADD REPLY • link 13.4 years ago by Aleksandr Levchuk 3.2k