6.0 years ago
olechnwin
I'm so confused. I can't figure out how to use faSplit to split my fasta file into 2 files. From the documentation, it seems I can do this command:
~/opt/faSplit sequence 1SQ_reads.fasta 2 1SQ_reads_
But this generates files 1SQ_reads_0.fa, 1SQ_reads_1.fa, 1SQ_reads_2.fa, and so on...
What did I do wrong? How do I split my fasta file into several files?
Ensure the faSplit you're using is the right faSplit. Run man faSplit to check the version as well as the usage document.
The actual faSplit binary can be found here: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/
Thanks for your reply. I tried man faSplit and it came back with 'No manual entry for faSplit'. I thought that's where I downloaded it from. I'll try to re-download.
Type faSplit and press Enter without any input. It will print the help. Copy/pasted from help:
Yes, I get that. I was wondering whether the ability to print the manual comes with a newer version.
What is the latest version? The one I have is this one:
This seems to be the latest version on bioconda. I installed that version and I see the help text when I run fasplit without arguments. The faSplit binary from UCSC doesn't work for me on macOS but works fine on GNU Linux.
Thanks for checking the version. So, in the bioconda version, faSplit sequence does not split the fasta into the desired number of files. Instead, I divided the size of my original file by the number of files I wanted and used faSplit about to split by size, which gave approximately the number of files I wanted.
I checked on my computer, and it works fine. It does not split them into equally sized files, but it does split them into as many files as requested. My commands:
I have absolutely no idea why it doesn't work on mine. My commands:
and many more test_.fa files.
My input:
Maybe check with your sysadmin on this? Can you also post the output of uname -a?
Can you try with the fasta file I used and run the same commands and see if the output is different? I just wanna make sure your FASTA identifiers are not messing with the program (they shouldn't be, but just in case).
@RamRS, I tried to use your fasta file and it worked!
So, my FASTA identifiers are messing with the program? FYI, my fasta file is from PacBio. Should I be concerned about running faSplit on my fasta files then?
Not sure if it's the identifiers. Why don't you try:
This doesn't work either:
Just to make sure that the previous run did not affect this one, you did rm -rf test before running the faSplit command, right? How big is your 1SQ_reads.fasta file?
Yes. I removed the test folder before running faSplit with sed. My 1SQ_reads.fasta file is about 40 GB.
You should not use faSplit sequence then; it seems to work in a fashion that doesn't really make sense. In your case, if it worked as it should, it would produce two files where one is a few kB and the other almost 40 GB. Maybe try faSplit about and copy over a few lines if it breaks halfway through an entry?
Hmm... thanks for the hint about checking the files. I did faSplit about when I realized faSplit sequence does not work. But upon checking the result, it seemed that although faSplit about appeared to be working properly, it didn't!
Update: made a mistake. Seems to be working. At least for the ones I checked.
The beginning of file 2 is this:
Searching for this line in original file:
Printing the previous line from original file:
It does match! At least for the ones I checked.
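The manual spot check above (confirming that a split file begins exactly at a record boundary of the original) could be automated roughly as follows. This is only a sketch using standard tools, not part of faSplit: the function names starts_with_header, count_records and check_split are made up for illustration, and it assumes plain-text FASTA where every record starts with a '>' line.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of faSplit) to sanity-check a split FASTA.

# starts_with_header FILE -> succeeds if the first character of FILE is '>'
starts_with_header() {
    [ "$(head -c 1 "$1")" = ">" ]
}

# count_records FILE... -> prints the number of '>' header lines across all files
count_records() {
    cat "$@" | grep -c '^>'
}

# check_split ORIGINAL PART... -> succeeds if every part starts with a header
# and the parts together contain as many records as the original
check_split() {
    local original=$1; shift
    local part
    for part in "$@"; do
        starts_with_header "$part" || {
            echo "$part does not start with '>'" >&2
            return 1
        }
    done
    [ "$(count_records "$original")" -eq "$(count_records "$@")" ]
}
```

With the output root from the question, this would be called as check_split 1SQ_reads.fasta 1SQ_reads_*.fa. It only compares headers and record counts, so it will not catch damage inside a sequence, and on a 40 GB file each pass will take a while.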
But now I'm wary of using faSplit to split fasta files from PacBio.
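For anyone wanting to repeat the size-based workaround mentioned earlier in the thread (divide the file size by the desired number of files and pass the result to faSplit about), here is a rough sketch. The helper names chunk_size and about_command are hypothetical, and the script only prints the faSplit command rather than running it. Since faSplit about keeps records whole, the number of files you end up with is only approximately the number requested.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: derive a per-file byte size for `faSplit about`.

# chunk_size TOTAL_BYTES N_FILES -> prints bytes per file, rounded up
chunk_size() {
    local total=$1 n=$2
    echo $(( (total + n - 1) / n ))
}

# about_command FASTA N_FILES OUT_ROOT -> prints the faSplit command to run
about_command() {
    local fasta=$1 n=$2 root=$3
    local total
    # The arithmetic expansion also strips the padding some wc versions emit
    total=$(( $(wc -c < "$fasta") ))
    echo "faSplit about $fasta $(chunk_size "$total" "$n") $root"
}
```

For example, about_command 1SQ_reads.fasta 2 1SQ_reads_ prints a command you can inspect and then run yourself.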
Did you try the faSplit binary from http://hgdownload.soe.ucsc.edu/admin/exe/macOSX.x86_64/? @RamRS
I don't think I tried that, but on macOS I used the one from bioconda and it works fine.
The faSplit binary does not work for me since it was built on a more recent OS than the one I currently have.
I downloaded the binary from https://github.com/ENCODE-DCC/kentUtils/tree/master/bin/linux.x86_64 and it is working as expected (faSplit base).
function (split test.fa to 2 files):
output:
input: