Question

Tool/Script To Get Consensus Sequences

0

12.1 years ago

upendrakumar.devisetty ▴ 400

I have three FASTA files, and all I want is the consensus or union of the three. That is, I want to pull out all the sequences that are common to the three files and put them in a separate FASTA file. The three files were generated differently and have different headers, but the sequences are identical among the three files. Is there a tool or script that does this?

msa • 5.6k views

updated 12.1 years ago by Damian Kao 16k • written 12.1 years ago by upendrakumar.devisetty ▴ 400

1

If there is no existing tool/script, this is a fairly basic bioinformatics programming task which you should learn how to do.

12.1 years ago by Neilfws 49k

0

I do understand that this is basic bioinformatics programming, but I thought that if something was already written, I could just use it rather than writing it myself.

12.1 years ago by upendrakumar.devisetty ▴ 400

0

How large are the fasta files?

12.1 years ago by Damian Kao 16k

0

The files are no larger than 100 MB.

12.1 years ago by upendrakumar.devisetty ▴ 400

0

Search for "multiple sequence alignment" in a search engine of your choice, or in an encyclopedia, to get a starting point.

12.1 years ago by Michael 55k

0

Your question title and your use of the term "consensus sequence" are misleading: that term is reserved for alignments, and it seems that you are instead looking for exactly identical sequences. Please take at least one quarter of the time it will take us to answer your question to provide a proper example and to explain what you have already tried.

12.1 years ago by Michael 55k

score 0 · Answer 1 · 2012-10-15

Sorry if I didn't explain my question well. Here is what I wanted...

I have three fasta files

fasta1

seq1 ATGATG
seq2 GATAGATA
seq3 TGGTGG

fasta2

m1 GATAGATA
m2 TGGTGG
m3 AGGAGG

fasta3

seq1 gene1 ATGATG
seq3 gene3 TGGTGG
seq4 gene4 AGTGTG

What I want is a final FASTA file containing the following:

final_fasta

seq3 TGGTGG

As you can see, this sequence is present in all three FASTA files, and the header is taken from fasta1.

I am currently using a lengthy process to achieve this.

First, I BLAST fasta1 against fasta2, then take all the hits and BLAST them against fasta3.

Though it should work, it is lengthy, and I thought that if somebody had done something like this before, I could use their script or tool.
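For reference, the result described in the example (sequences present in every file, with headers taken from the first file) might be obtained without BLAST by a simple set intersection. The following is only a minimal sketch, not an existing tool. It assumes the files are standard FASTA with ">" header lines, that "common" means exactly identical sequences (compared case-insensitively, with no reverse-complement or substring matching), and that the files fit in memory, which should hold for files under 100 MB. The function names `read_fasta`, `common_sequences`, and `write_fasta` are made up for illustration.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file.

    Sequences that span several lines are joined into one string.
    """
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def common_sequences(first_path, *other_paths):
    """Return {sequence: header} for sequences found in every file.

    Headers come from the first file. Assumption: sequences are
    compared as exact, case-insensitive strings.
    """
    first = {}
    for header, seq in read_fasta(first_path):
        # Keep the first header seen for a duplicated sequence.
        first.setdefault(seq.upper(), header)
    shared = set(first)
    for path in other_paths:
        shared &= {seq.upper() for _, seq in read_fasta(path)}
    # Dicts preserve insertion order, so output follows the first file.
    return {seq: hdr for seq, hdr in first.items() if seq in shared}


def write_fasta(records, path):
    """Write a {sequence: header} mapping as a FASTA file."""
    with open(path, "w") as out:
        for seq, header in records.items():
            out.write(f">{header}\n{seq}\n")
```

With the three example files saved under hypothetical names, `write_fasta(common_sequences("fasta1.fa", "fasta2.fa", "fasta3.fa"), "final.fa")` would keep only TGGTGG under the header seq3. If near-identical rather than exactly identical sequences should count as shared, a clustering or alignment tool would be the better fit.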

score 0 · Answer 2 · 2012-10-16

0

12.1 years ago

Damian Kao 16k

Use CD-HIT: http://weizhong-lab.ucsd.edu/cd-hit/