Hello everyone, I'm fairly new to bioinformatics; my background is mainly in molecular biology (wet lab).
I have a project where I must align thousands of HIV-1 genomes, and the tool I found best suited to the job was Mugsy. I spent a few weeks understanding it and getting it to work. I tested it by aligning 2, 5, 10 and 50 genomes and the results were perfect; however, I must now align 13K genomes.
I'm using a server with 128 GB of RAM, so I shouldn't face any memory-related issues. However, the mugsy command requires me to type all the input files, and typing 13K file names is quite impractical. When I tried to copy the file names into Notepad and paste them into the terminal, the line was too long for the terminal and got cut off partway. I also tried combining all the genomes into a single multi-FASTA file, but mugsy detected only the first genome.
Is there any way to give mugsy a directory rather than individual files, or any suggestion for how I could get mugsy to detect all the genomes in a multi-FASTA file? Any help would be really appreciated. I searched the forum for similar issues but couldn't find any. Thanks in advance for any help :)
Doesn't *.fasta work?
Worth a try. Here is what mugsy help says.
Did you do that by providing individual file names?
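To make the *.fasta suggestion concrete without pasting anything into the terminal, here is a minimal Python sketch that collects the files and builds the argument list. The function names, directory names and prefix are made up for illustration, and the --directory and --prefix options are as I remember them from mugsy's usage text, so please check them against your own mugsy help output before relying on this.

```python
import glob
import os

def build_mugsy_command(fasta_dir, out_dir, prefix, pattern="*.fasta"):
    """Return the mugsy argument list for every file in fasta_dir matching pattern.

    Assumption: mugsy accepts --directory and --prefix followed by the
    genome files; verify against your installed version.
    """
    # sorted() keeps the genome order reproducible between runs
    genomes = sorted(glob.glob(os.path.join(fasta_dir, pattern)))
    if not genomes:
        raise FileNotFoundError(f"no files matching {pattern} in {fasta_dir}")
    # an absolute output path avoids surprises if the tool changes directory
    return ["mugsy", "--directory", os.path.abspath(out_dir),
            "--prefix", prefix] + genomes

def fits_on_command_line(argv):
    """Rough check that argv stays under the kernel's argument-size limit.

    Environment variables share the same limit, so treat this as an estimate.
    """
    # each argument is passed to the program with a trailing NUL byte
    needed = sum(len(os.fsencode(arg)) + 1 for arg in argv)
    return needed < os.sysconf("SC_ARG_MAX")

# Example use (hypothetical paths):
#   import subprocess
#   cmd = build_mugsy_command("genomes", "mugsy_out", "hiv1")
#   if fits_on_command_line(cmd):
#       subprocess.run(cmd, check=True)
```

Passing the list straight to subprocess.run avoids the shell entirely, so nothing gets cut off the way a pasted line does.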
It appears that mugsy expects genomes to be in independent files, so providing a multi-fasta file is not likely to work (it looks like mugsy will take multi-fasta files as long as the contigs are from one genome, which is not the case here).
You may want to look at an alternative tool (e.g. multiple sequence alignment tools such as T-Coffee or MUSCLE).
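If the genomes only exist as one combined multi-fasta file, they can be split back into one file per genome. This is a rough sketch, not something taken from the mugsy documentation: the function name is hypothetical, and naming each output file after the first word of its header (with unusual characters replaced by underscores) is my own choice.

```python
import os
import re

def split_multifasta(multifasta_path, out_dir):
    """Write each record of a multi-FASTA file to its own file in out_dir.

    Returns the paths written, in input order.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    out = None
    try:
        with open(multifasta_path) as handle:
            for line in handle:
                if line.startswith(">"):
                    # a new header closes the previous record's file
                    if out is not None:
                        out.close()
                    fields = line[1:].split()
                    seq_id = fields[0] if fields else f"record_{len(written) + 1}"
                    # keep file names simple: letters, digits, '_' and '.'
                    safe = re.sub(r"[^A-Za-z0-9_.]", "_", seq_id)
                    path = os.path.join(out_dir, safe + ".fasta")
                    if path in written:
                        raise ValueError(f"duplicate sequence ID: {seq_id}")
                    out = open(path, "w")
                    written.append(path)
                    out.write(line)  # header is copied unchanged
                elif out is not None:
                    out.write(line)  # sequence lines follow their header
    finally:
        if out is not None:
            out.close()
    return written
```

Anything before the first header is ignored, and a repeated ID raises an error rather than silently overwriting a genome.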
Yes, I tried individual names; typing 5, 10 or 50 is not such an issue.
I tried:
And it returned:
I'm sorry to bother you again, but I've been getting another error that I'm not able to get past. When I try the lines of code you offered, it works perfectly if there are 100 fasta files in the directory, but when I try it on the entire set of 12.9K sequences I get the error:
ERROR : Could not parse delta file, /home/...directory.../HIV_db_001.delta
.ERROR : Could not parse delta file, /home/...directory.../HIV_db_001.filt.delta
The error repeats for all the sequences at once. I don't think it's a memory-related problem, because the server I'm running it on has 128 GB of RAM. Any idea what could be causing the problem?
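Nothing in this thread pins down what causes the delta file error, so the following is only a way to rule out one possibility: that a few malformed inputs among the 12.9K files are involved. It is a small pre-flight check, assuming one genome per file with a .fasta extension; the function name and the specific checks are my own guesses at what is worth looking for, not documented mugsy requirements.

```python
import glob
import os
from collections import defaultdict

def preflight(fasta_dir, pattern="*.fasta"):
    """Return (file, problem) pairs for inputs that look malformed."""
    problems = []
    files_by_id = defaultdict(list)
    for path in sorted(glob.glob(os.path.join(fasta_dir, pattern))):
        name = os.path.basename(path)
        with open(path) as handle:
            # ignore blank lines so whitespace-only files count as empty
            lines = [ln.strip() for ln in handle if ln.strip()]
        if not lines:
            problems.append((name, "empty file"))
            continue
        if not lines[0].startswith(">"):
            problems.append((name, "does not start with a '>' header"))
            continue
        if all(ln.startswith(">") for ln in lines):
            problems.append((name, "header without sequence"))
        # first word of the first header is taken as the sequence ID
        fields = lines[0][1:].split()
        files_by_id[fields[0] if fields else ""].append(name)
    # the same ID appearing in several files may confuse downstream tools
    for seq_id, names in files_by_id.items():
        if len(names) > 1:
            problems.append((", ".join(names), f"share the sequence ID '{seq_id}'"))
    return problems
```

If this returns an empty list, the inputs are probably not the issue, and it may be more informative to rerun on progressively larger subsets (100 files is reported to work) to see at what size the error first appears.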