Question

Multiple sequence alignment on multiple very long sequences

9.0 years ago

nash.claire ▴ 510

Hi all,

I have a question regarding multiple sequence alignment.

My aim is to take the entire AR gene sequence from human, mouse and rat and compare them to each other to identify the regions that are highly conserved. I have tried both Clustal Omega and T-Coffee on the EBI server, and both failed. I have also tried SeaView, which gives me the following error:

"/Applications/seaview4/seaview.app/Contents/Resources/clustalo" -v -i "/var/folders/b9/kwl_ycbx555bk91cm2fn0wj40000gp/T/seaview.hGDUqy.pir" -o "/var/folders/b9/kwl_ycbx555bk91cm2fn0wj40000gp/T/seaview.hGDUqy.out" --outfmt=fasta 
Using 4 threads
Read 3 sequences (type: DNA) from /var/folders/b9/kwl_ycbx555bk91cm2fn0wj40000gp/T/seaview.hGDUqy.pir
not more sequences (3) than cluster-size (100), turn off mBed
Calculating pairwise ktuple-distances...
Ktuple-distance calculation progress done. CPU time: 113.04u 0.13s 00:01:53.17 Elapsed: 00:01:07
Guide-tree computation done.
clustalo(93090,0xa2f07000) malloc: *** mach_vm_map(size=176128) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
libc++abi.dylib: terminating with uncaught exception of type std::bad_alloc: std::bad_alloc

I am currently trying MAFFT online, but it seems to be taking forever. My best guess at the problem is that my sequences are too long: the human sequence alone is over 180,000 bp. I have quadruple-checked that the format of my files is OK. I'm using FASTA format with no special characters and no spaces in the wrong places.
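For anyone wanting to repeat the format check described above, here is a minimal sketch. It assumes plain nucleotide FASTA restricted to A, C, G, T and N; the function name `check_fasta` is hypothetical and not part of any of the tools mentioned in this thread.

```python
import re


def check_fasta(text):
    """Return {header: sequence length}; raise ValueError on suspicious lines.

    Assumption: only A/C/G/T/N (either case) are valid sequence characters.
    """
    lengths = {}
    name = None
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            name = line[1:]
            lengths[name] = 0
        else:
            if name is None:
                raise ValueError(f"line {lineno}: sequence before any header")
            if not re.fullmatch(r"[ACGTNacgtn]+", line):
                raise ValueError(f"line {lineno}: unexpected character")
            lengths[name] += len(line)
    return lengths


example = ">human\nACGTAC\nGT\n>mouse\nACGNN\n"
print(check_fasta(example))
```

Printing the lengths this way also makes it obvious when one of the records is far longer than expected.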

Can anyone recommend an algorithm that can handle this job?

alignment sequence • 4.8k views

ADD COMMENT • link updated 5.8 years ago by Biostar 20 • written 9.0 years ago by nash.claire ▴ 510

The algorithms will likely be able to handle the job, but you don't appear to have enough RAM. You could try to find a larger server.
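To get a rough feel for why RAM runs out, here is a back-of-the-envelope sketch. It assumes a naive full dynamic-programming matrix with 4 bytes per cell, which is not necessarily how Clustal Omega or any other aligner stores its data. The 180,000 bp figure is from the question; the second length is an assumption.

```python
def dp_matrix_bytes(len_a, len_b, bytes_per_cell=4):
    """Bytes needed for a full (len_a + 1) x (len_b + 1) alignment matrix."""
    return (len_a + 1) * (len_b + 1) * bytes_per_cell


human_len = 180_000  # from the question: "over 180,000 bp"
other_len = 180_000  # assumption: the mouse or rat locus is of similar size

gib = dp_matrix_bytes(human_len, other_len) / 2**30
print(f"One naive pairwise matrix: roughly {gib:.0f} GiB")
```

Real tools use memory-saving strategies, so this is only an illustration of how quickly quadratic memory grows with sequence length.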

NCBI may have already done this for you. Can you check on HomoloGene whether your gene is present?

Edit: Looks like this must be it. Click on multiple alignments to see the alignments.

ADD REPLY • link 9.0 years ago by GenoMax 153k

What's wrong with using UCSC tracks of alignments of multiple species to find conserved regions? I would argue that comparing more species would give you a clearer view on what's conserved by evolution.

ADD REPLY • link 9.0 years ago by WouterDeCoster 48k

I have used VISTA, and it does align long sequences. In case you want to align them: http://genome.lbl.gov/vista/lagan/submit.shtml

ADD REPLY • link 9.0 years ago by microfuge ★ 2.0k

Have you also looked in TreeFam? The AR family is here. The multiple alignment is available under the Download tab on the left.

ADD REPLY • link 9.0 years ago by Jean-Karim Heriche 27k

score 0 · Answer 1 · 2016-08-10

9.0 years ago

Sej Modha 5.3k

Have you tried MAFFT?

ADD COMMENT • link 9.0 years ago by Sej Modha 5.3k

score 0 · Answer 2 · 2016-08-10

Wow, thanks for all your help, everybody. You've pointed out some useful tools I didn't know existed, so this is great. My plan was to eventually use the PhastCons tool, which takes as input a multiple alignment and a phylogenetic model, hence my trying to do the alignment myself. However, it seems it has already been done on TreeFam and NCBI, as suggested by Jean-Karim Heriche and genomax2.

Thank you so much guys!!
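
As a quick first look at conservation in a downloaded alignment, before running PhastCons, something like the sketch below might help. It is a much cruder stand-in, not PhastCons itself: PhastCons fits a phylogenetic hidden Markov model, whereas this only marks columns where every sequence has the same base. The function names and the toy alignment are hypothetical.

```python
def column_identity(aligned):
    """Score each alignment column: 1.0 if all rows share one non-gap base, else 0.0."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for column in zip(*aligned):
        bases = {base.upper() for base in column}
        scores.append(1.0 if len(bases) == 1 and "-" not in bases else 0.0)
    return scores


def windowed_mean(scores, window):
    """Mean score over each run of `window` consecutive columns."""
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Toy alignment (made up for illustration): human, mouse, rat rows.
toy = ["ACG-", "ACT-", "ACGA"]
per_column = column_identity(toy)
print(per_column)
print(windowed_mean(per_column, 2))
```

Windows with a mean close to 1.0 are candidate conserved regions. With only three species, many columns will be identical by chance, so more species would give a clearer picture.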