Hi all,
I have a question regarding multiple sequence alignment.
My aim is to take the entire AR gene sequence from human, mouse and rat and compare them to each other to identify the regions that are highly conserved. I have tried both Clustal Omega and T-Coffee on the EBI server, and both failed. I have also tried Seaview, which gives me the following error:
"/Applications/seaview4/seaview.app/Contents/Resources/clustalo" -v -i "/var/folders/b9/kwl_ycbx555bk91cm2fn0wj40000gp/T/seaview.hGDUqy.pir" -o "/var/folders/b9/kwl_ycbx555bk91cm2fn0wj40000gp/T/seaview.hGDUqy.out" --outfmt=fasta
Using 4 threads
Read 3 sequences (type: DNA) from /var/folders/b9/kwl_ycbx555bk91cm2fn0wj40000gp/T/seaview.hGDUqy.pir
not more sequences (3) than cluster-size (100), turn off mBed
Calculating pairwise ktuple-distances...
Ktuple-distance calculation progress done. CPU time: 113.04u 0.13s 00:01:53.17 Elapsed: 00:01:07
Guide-tree computation done.
clustalo(93090,0xa2f07000) malloc: *** mach_vm_map(size=176128) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
libc++abi.dylib: terminating with uncaught exception of type std::bad_alloc: std::bad_alloc
I am currently trying MAFFT online, but it seems to be taking forever. My best guess is that my sequences are too long: the human sequence alone is over 180,000 bp. I have quadruple-checked that the format of my files is OK. I'm using FASTA format with no special characters and no spaces in the wrong places.
Can anyone recommend an algorithm that can handle this job?
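To double-check the format and the sequence lengths described above, here is a minimal sketch of a FASTA sanity check. It reports each record's length and any characters outside the IUPAC nucleotide codes plus `-`. The accepted character set and the function name `check_fasta` are assumptions for illustration; individual aligners may be stricter or more lenient.

```python
# Accepted characters: IUPAC nucleotide codes plus '-' for gaps.
# This set is an assumption; check your aligner's own documentation.
VALID = set("ACGTURYSWKMBDHVN-")


def check_fasta(text):
    """Return a list of (header, length, unexpected_chars) for each FASTA record."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()  # trailing whitespace is ignored, internal spaces are kept
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))

    report = []
    for name, seq in records:
        # Any character not in VALID (including internal spaces) is flagged.
        bad = sorted({c for c in seq.upper() if c not in VALID})
        report.append((name, len(seq), bad))
    return report
```

Usage, with a hypothetical file name: `check_fasta(open("ar_human.fa").read())`. A clean record comes back with an empty list of unexpected characters, and the reported length confirms whether the sequence really is around 180,000 bp.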
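The algorithms will likely be able to handle the job, but you don't appear to have enough RAM: the "can't allocate region" malloc error and the std::bad_alloc exception in your log mean Clustal Omega ran out of memory. You could try to find a larger server.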
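To see why memory runs out, here is a rough order-of-magnitude estimate. It assumes a naive global alignment that fills one full dynamic-programming matrix with 4 bytes per cell, and it assumes the mouse and rat sequences are of similar length to the human one (the question only gives the human length). Clustal Omega's real memory layout differs, so treat this as an illustration of quadratic growth, not a measurement of any tool.

```python
def dp_matrix_bytes(len_a, len_b, bytes_per_cell=4):
    """Bytes needed for a full (len_a + 1) x (len_b + 1) alignment matrix.

    bytes_per_cell=4 assumes one 32-bit score per cell; affine-gap
    variants keep several matrices and would need a multiple of this.
    """
    return (len_a + 1) * (len_b + 1) * bytes_per_cell


def to_gib(n_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30
```

For two sequences of 180,000 bp, `to_gib(dp_matrix_bytes(180_000, 180_000))` comes to roughly 120 GiB, far beyond a typical desktop. Doubling both lengths quadruples the figure, which is why tools aimed at long genomic sequences avoid filling a single huge matrix.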
NCBI may have already done this for you. Can you check whether your gene is present in HomoloGene?
Edit: Looks like this must be it. Click on multiple alignments to see the alignments.
What's wrong with using the UCSC multiple-species alignment tracks to find conserved regions? I would argue that comparing more species would give you a clearer view of what has been conserved by evolution.
I have used VISTA; it does align long sequences. If you want to align them yourself: http://genome.lbl.gov/vista/lagan/submit.shtml
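Once you have a pairwise alignment (for example human vs. mouse, from VISTA/LAGAN or any other aligner), one simple way to locate highly conserved stretches is to compute percent identity in sliding windows and merge the windows above a cutoff. The sketch below does that. The function names, the 100-column window and the 70% threshold are hypothetical, illustrative values, not taken from any particular tool. Coordinates are alignment columns (after dropping gap-in-both columns), not positions in either genome.

```python
def windowed_identity(aln_a, aln_b, window=100, step=1):
    """Percent identity of two equal-length aligned sequences in sliding windows.

    Returns a list of (start_column, percent_identity). Columns that are
    gaps in both sequences are dropped first; a gap opposite a base counts
    as a mismatch.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    cols = [(x.upper(), y.upper()) for x, y in zip(aln_a, aln_b)
            if not (x == "-" and y == "-")]
    matches = [1 if x == y and x != "-" else 0 for x, y in cols]

    # Prefix sums give each window's match count in O(1).
    prefix = [0]
    for m in matches:
        prefix.append(prefix[-1] + m)

    return [(start, 100.0 * (prefix[start + window] - prefix[start]) / window)
            for start in range(0, len(matches) - window + 1, step)]


def conserved_regions(scores, window=100, threshold=70.0):
    """Merge windows with identity >= threshold into (start, end) intervals, end exclusive."""
    regions = []
    for start, pct in scores:
        if pct < threshold:
            continue
        end = start + window
        if regions and start <= regions[-1][1]:
            # Overlapping or touching the previous region: extend it.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions
```

For three species, one option is to run this for human vs. mouse and human vs. rat and look for regions that pass the cutoff in both comparisons.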
Have you also looked in TreeFam? The AR family is here. The multiple alignment is available under the Download tab on the left.