Question

Building multiple consensus sequences from multiple fasta sequences

5.4 years ago

miss • 0

I am new to NGS data analysis and I am stuck on the following problem. I have DNA-seq results from 12 different sites stored in one FASTA file. Each site has roughly 1000 reads, so the file holds about 12,000 reads, but I don't know which site any given read belongs to. I need to build a consensus sequence for each of the 12 sites without a reference sequence and without knowing which reads belong to which site. Is this possible, and if so, how?

next-gen-sequencing sequence alignment • 1.8k views

ADD COMMENT • link updated 9 months ago by doppelganger1030 • 0 • written 5.4 years ago by miss • 0

Hello, I have a similar problem. I have 96 FASTQ files and I ran an alignment on them. On average, about 30 of the FASTQ files aligned to the RefSeq database I built. The alignment produced .fasta and .sam files. I want to generate a consensus sequence from them. What is the best way to do this?

ADD REPLY • link 9 months ago by doppelganger1030 • 0

score 0 · Answer 1 · 2019-07-29

5.4 years ago

Jean-Karim Heriche 27k

An approach that doesn't require any prior knowledge is to cluster the sequences based on all pairwise sequence similarities or distances, then compute a multiple sequence alignment for each cluster and derive the consensus from that alignment. Refinements can be made depending on how much extra information is available.
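The last step, going from an aligned cluster to a consensus, can be sketched as a per-column majority vote. This is only a minimal illustration: it assumes the reads of one cluster have already been aligned by an MSA tool of your choice into equal-length gapped strings, and the function name `column_consensus` and the toy alignment are made up for the example.

```python
from collections import Counter


def column_consensus(aligned, gap="-"):
    """Majority-vote consensus of equal-length aligned sequences.

    Hypothetical helper: `aligned` is assumed to be the output of a
    multiple sequence alignment for ONE cluster, with `gap` as gap symbol.
    """
    if not aligned:
        raise ValueError("no sequences given")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")

    consensus = []
    for column in zip(*aligned):
        # Most frequent symbol in this column; on a tie, the symbol that
        # appears first in the column wins.
        symbol, _ = Counter(column).most_common(1)[0]
        # Columns where the gap is the majority are left out of the consensus.
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


# Toy alignment of three reads from the same (assumed) cluster.
toy_alignment = ["ACGT-A", "ACGTTA", "ACCT-A"]
print(column_consensus(toy_alignment))  # ACGTA
```

A plain majority vote ignores base qualities and treats ties arbitrarily, so it is a starting point rather than a finished consensus caller.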

ADD COMMENT • link 5.4 years ago by Jean-Karim Heriche 27k

Thank you! Can you recommend a software tool for this kind of clustering?

ADD REPLY • link 5.4 years ago by miss • 0

You'll need to identify which sequence alignment algorithm is relevant for your sequences, run the pairwise similarity computations in parallel, then apply a clustering algorithm to the resulting distance or similarity matrix. All of this can be done with a bit of scripting in your favourite language.
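As a rough illustration of that kind of script, here is a minimal Python sketch using only the standard library. It makes several assumptions that are not part of the answer: a k-mer Jaccard distance stands in for a real alignment-based similarity, single-linkage clustering at a fixed threshold stands in for whichever clustering algorithm suits the data, and all names (`kmers`, `jaccard_distance`, `cluster_reads`, the `mapper` parameter) and the toy reads are hypothetical.

```python
from itertools import combinations


def kmers(seq, k):
    """Set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(job):
    """1 - |shared k-mers| / |all k-mers| for one (seq_a, seq_b, k) job.

    Stand-in for an alignment-based distance; swap in your own.
    """
    a, b, k = job
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        return 1.0
    return 1.0 - len(ka & kb) / len(union)


def cluster_reads(seqs, k=4, threshold=0.5, mapper=map):
    """Single-linkage clusters: reads closer than `threshold` are linked.

    `mapper` applies the distance function to every pair. The built-in
    `map` runs serially; the `map` method of a
    concurrent.futures.ProcessPoolExecutor can be passed instead to
    spread the independent pairwise computations over several processes.
    """
    pairs = list(combinations(range(len(seqs)), 2))
    jobs = [(seqs[i], seqs[j], k) for i, j in pairs]
    distances = mapper(jaccard_distance, jobs)

    # Union-find: merge the two reads of every pair within the threshold.
    parent = list(range(len(seqs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (i, j), d in zip(pairs, distances):
        if d <= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for idx in range(len(seqs)):
        clusters.setdefault(find(idx), []).append(idx)
    return sorted(clusters.values())


# Toy reads: two from one made-up site, two from another.
reads = ["ACGTACGTAC", "ACGTACGTAA", "TTGGCCTTGG", "TTGGCCTTGA"]
print(cluster_reads(reads))  # [[0, 1], [2, 3]]
```

For the roughly 12,000 reads in the question this means about 72 million pairs, which is why the pairwise step is worth running in parallel. The values of `k` and `threshold` here are arbitrary and would need tuning on real reads.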

ADD REPLY • link 5.4 years ago by Jean-Karim Heriche 27k