Multiple sequence alignment on multiple very long sequences
8.3 years ago
nash.claire ▴ 510

Hi all,

I have a question regarding multiple sequence alignment.

My aim is to take the entire AR gene sequence from human, mouse and rat and compare them to identify the regions that are highly conserved. I have tried both Clustal Omega and T-Coffee on the EBI server, and both failed. I have also tried Seaview, which gives me the following error:

"/Applications/seaview4/seaview.app/Contents/Resources/clustalo" -v -i "/var/folders/b9/kwl_ycbx555bk91cm2fn0wj40000gp/T/seaview.hGDUqy.pir" -o "/var/folders/b9/kwl_ycbx555bk91cm2fn0wj40000gp/T/seaview.hGDUqy.out" --outfmt=fasta 
Using 4 threads
Read 3 sequences (type: DNA) from /var/folders/b9/kwl_ycbx555bk91cm2fn0wj40000gp/T/seaview.hGDUqy.pir
not more sequences (3) than cluster-size (100), turn off mBed
Calculating pairwise ktuple-distances...
Ktuple-distance calculation progress done. CPU time: 113.04u 0.13s 00:01:53.17 Elapsed: 00:01:07
Guide-tree computation done.
clustalo(93090,0xa2f07000) malloc: *** mach_vm_map(size=176128) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
libc++abi.dylib: terminating with uncaught exception of type std::bad_alloc: std::bad_alloc

I am currently trying MAFFT online, but it seems to be taking forever. My best guess is that my sequences are too long: the human sequence alone is over 180,000 bp. I have quadruple-checked that my files are formatted correctly. I'm using FASTA format with no special characters or misplaced spaces.

Can anyone recommend an algorithm that can handle this job?
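
The question mentions checking the FASTA files by hand. A minimal Python sketch of an automated sanity check follows, assuming "valid" means header lines starting with `>` followed by lines of IUPAC nucleotide codes (plus `-`) with no embedded spaces. The function name `check_fasta` and the allowed character set are illustrative assumptions, not requirements documented by Clustal Omega, T-Coffee or MAFFT. A clean file will not cure an out-of-memory error; it only rules out formatting as the cause.

```python
import re

# Allowed sequence characters: IUPAC nucleotide codes plus "-" (gap).
# Assumption: this is what "no special characters" should mean; adjust it
# to whatever your aligner actually accepts.
VALID_SEQ = re.compile(r"^[ACGTURYSWKMBDHVN\-]+$", re.IGNORECASE)


def check_fasta(text):
    """Sanity-check a FASTA string.

    Returns (records, problems): records maps each header (without ">")
    to its total sequence length, problems is a list of messages.
    """
    records = {}
    problems = []
    header = None
    # splitlines() also copes with Windows (\r\n) line endings.
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            if not header:
                problems.append(f"line {lineno}: empty header")
            elif header in records:
                problems.append(f"line {lineno}: duplicate header {header!r}")
            records[header] = 0
        elif header is None:
            problems.append(f"line {lineno}: sequence before first header")
        elif not VALID_SEQ.match(line):
            # Catches embedded or trailing spaces, digits, punctuation, etc.
            problems.append(f"line {lineno}: unexpected character(s) in sequence")
        else:
            records[header] += len(line)
    for name, length in records.items():
        if length == 0:
            problems.append(f"record {name!r}: no sequence")
    return records, problems


if __name__ == "__main__":
    demo = ">human\nACGT\nACGN\n>mouse\nACG T\n"
    records, problems = check_fasta(demo)
    print(records)  # {'human': 8, 'mouse': 0}
    for message in problems:
        print(message)
```

In the demo, the space inside `ACG T` is reported for line 5, and the `mouse` record is flagged as empty because its only sequence line was rejected.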

alignment sequence • 4.3k views

The algorithms will likely be able to handle the job, but you don't appear to have enough RAM. You could try to find a server with more memory.

NCBI may have already done this for you. Can you check whether your gene is present in HomoloGene?

Edit: It looks like this is it. Click on the multiple alignments link to see the alignments.
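
The reply above attributes the failure to a lack of RAM, which matches the `can't allocate region` and `std::bad_alloc` lines in the Seaview log. To illustrate why sequence length matters so much, the sketch below computes the memory a naive global alignment would need if it stored the full dynamic-programming matrix for two sequences. It is a back-of-envelope estimate under stated assumptions: one 4-byte score per cell, and 180,000 bp used for both sequences. Clustal Omega and MAFFT do not necessarily allocate such a matrix, so read the result as an order of magnitude, not a measurement of any specific tool.

```python
def full_dp_matrix_bytes(len_a, len_b, bytes_per_cell=4):
    """Bytes for a full (len_a + 1) x (len_b + 1) dynamic-programming matrix.

    Assumes one score per cell (e.g. a 32-bit integer). Affine-gap aligners
    keep several such matrices; linear-space methods keep far less.
    """
    return (len_a + 1) * (len_b + 1) * bytes_per_cell


# Illustrative lengths: the human sequence is "over 180,000 bp" per the question.
n = 180_000
gib = full_dp_matrix_bytes(n, n) / 2**30
print(f"about {gib:.0f} GiB for a single pairwise matrix")  # about 121 GiB
```

Because the cost grows with the product of the two lengths, doubling both sequences quadruples the memory. Tools aimed at long genomic sequences, such as LAGAN, avoid this by restricting the dynamic programming to regions around anchor matches.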

What's wrong with using the UCSC multiple-species alignment tracks to find conserved regions? I would argue that comparing more species would give you a clearer view of what has been conserved by evolution.

I have used VISTA, and it does align long sequences. If you want to align them yourself, try http://genome.lbl.gov/vista/lagan/submit.shtml

Have you also looked at TreeFam? The AR family is here. The multiple alignment is available under the Download tab on the left.

8.3 years ago
Sej Modha 5.3k

Have you tried MAFFT?

8.3 years ago
nash.claire ▴ 510

Wow, thanks for all your help, everybody. You've pointed out some useful tools I didn't know existed, so this is great. My plan was eventually to use the PhastCons tool, which takes a multiple alignment and a phylogenetic model as input; that is why I was trying to do the alignment myself. However, it seems it has already been done on TreeFam and NCBI, as suggested by Jean-Karime Heriche and genomax2.

Thank you so much guys!!
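
The stated goal is to find highly conserved regions, eventually with PhastCons, which fits a phylogenetic hidden Markov model to the alignment. As a much cruder first look, assuming you already have an alignment (for example one downloaded from TreeFam or NCBI) loaded as equal-length strings, the Python sketch below scores each column by the fraction of sequences sharing the most common base and reports windows whose mean score reaches a threshold. This is simple percent identity, not PhastCons; the function names, window size and threshold are hypothetical choices for illustration.

```python
def column_identity(aligned):
    """Per-column score: share of sequences carrying the column's most common base.

    aligned: equal-length aligned sequences with "-" as the gap character.
    Gaps never count as matches, so gappy columns score lower.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for col in range(length):
        bases = [seq[col].upper() for seq in aligned if seq[col] != "-"]
        if not bases:
            scores.append(0.0)  # all-gap column
            continue
        most_common = max(bases.count(b) for b in set(bases))
        scores.append(most_common / len(aligned))
    return scores


def conserved_windows(scores, window=5, threshold=1.0):
    """0-based start columns of windows whose mean score is >= threshold."""
    hits = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean >= threshold:
            hits.append(start)
    return hits


if __name__ == "__main__":
    aln = ["ACGTACGTAA", "ACGTACGTTA", "ACGAACG-AA"]
    scores = column_identity(aln)
    print(conserved_windows(scores, window=3))  # [0, 4]
```

Window starts are alignment column indices, not positions in the human sequence; mapping them back requires counting the non-gap characters in the human row up to that column.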
