5.3 years ago
jdavidson2019
I am trying to globally align 50 viral genome sequences about 150 kb in length against a reference gene in order to find conserved regions. I am currently running into many errors with memory, so I suspect I am using the wrong software. Does anyone have any suggestions for a software platform that may align these sequences?
Thanks!
You’re aligning whole genomes against a reference gene? Did you mean to say reference genome?
Is this multiple alignment, or multiple pairwise alignment?
You might want to try MUMmer or LASTZ.
Use MAFFT; then view the results in Base-By-Base. This can highlight diffs to a reference. Disclaimer: BBB is our tool.
You are correct, it is supposed to be reference genome. And I am trying to find a tool to conduct multiple alignment, not multiple pairwise alignment. I have several lab isolate sequences, and want to find conserved regions across all the isolates.
I will look into those two programs and see if they will work. Thank you!
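For reference, once a multiple alignment exists (for example, aligned sequences read from an aligner's FASTA output), finding regions conserved across all isolates amounts to scanning alignment columns. The following is a minimal sketch, not a tested pipeline; the function name `conserved_runs` and the `min_len` threshold are hypothetical, and it assumes the aligned sequences are already loaded as equal-length strings with `-` for gaps.

```python
def conserved_runs(aligned, min_len=20):
    """Return (start, end) column ranges (0-based, end exclusive) in which
    every aligned sequence carries the same non-gap character.

    `aligned` is a list of equal-length strings from a multiple alignment.
    `min_len` is an illustrative cutoff for the shortest run to report.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    runs = []
    start = None  # column where the current conserved run began
    for col in range(length):
        column = {seq[col].upper() for seq in aligned}
        conserved = len(column) == 1 and "-" not in column
        if conserved and start is None:
            start = col
        elif not conserved and start is not None:
            if col - start >= min_len:
                runs.append((start, col))
            start = None
    # Close a run that extends to the last column.
    if start is not None and length - start >= min_len:
        runs.append((start, length))
    return runs
```

Requiring perfect identity in every column is a strict definition of "conserved"; a per-column identity fraction would be a more tolerant alternative.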
Lastz is the only one I can think of that would even come close...
However:
In my experience, it doesn’t produce particularly great alignments, especially with that much data. Furthermore, you may still end up needing more memory than you have (but I don’t know what kind of resources you’re working with). Data at this scale probably won’t be analysable outside a server environment.
I think you may need to reconsider your approach.
Will do. If I independently align each viral genome globally against a reference sequence (a multiple pairwise alignment, as mentioned above) and then analyse the alignments together, would that help identify roughly conserved regions? That should cut down on the computing requirements. I am working on getting cluster access, but for now I need a rough idea of the conserved regions. Thank you for your help.
Alignment versus a reference should be a reasonable approach, but the quality of large alignments is always a bit questionable, so proceed with caution.
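The "align each genome to the reference, then analyse together" idea can be sketched roughly as follows. This is only an illustration under stated assumptions: each pairwise alignment is available as a pair of equal-length gapped strings (reference row, isolate row), and the names `match_mask`, `shared_regions`, `min_fraction` and `min_len` are hypothetical. Every alignment is projected onto reference coordinates, and reference positions matched by enough isolates are merged into regions.

```python
def match_mask(ref_aln, qry_aln, ref_len):
    """Per-reference-position flags: True where the isolate base equals
    the reference base. Columns with a gap in the reference (insertions
    in the isolate) have no reference coordinate and are skipped."""
    if len(ref_aln) != len(qry_aln):
        raise ValueError("alignment rows must have the same length")
    mask = [False] * ref_len
    pos = 0
    for r, q in zip(ref_aln, qry_aln):
        if r == "-":
            continue
        if pos >= ref_len:
            raise ValueError("alignment covers more than ref_len positions")
        mask[pos] = r.upper() == q.upper()
        pos += 1
    return mask


def shared_regions(pairs, ref_len, min_fraction=1.0, min_len=20):
    """Return (start, end) reference ranges (0-based, end exclusive) where
    at least `min_fraction` of the isolates match the reference."""
    if not pairs:
        return []
    counts = [0] * ref_len
    for ref_aln, qry_aln in pairs:
        for i, ok in enumerate(match_mask(ref_aln, qry_aln, ref_len)):
            counts[i] += ok
    needed = min_fraction * len(pairs)

    regions = []
    start = None
    # Iterate one step past the end so a trailing region is closed.
    for i in range(ref_len + 1):
        ok = i < ref_len and counts[i] >= needed
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    return regions
```

Because insertions relative to the reference are ignored, this only gives a rough, reference-centric picture, which fits the caution above about large alignments.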
Also give D-GENIES a try.