I have received this message recently while using makeblastdb for rmblast in RepeatMasker, and it is a real head-scratcher for me.
RepeatMasker runs through all of its cycles without reporting any errors and finishes, but it leaves no output files. The only trace of the analysis is the rmblastdb.log file in the RepeatMasker/Libraries directory, which reads:
Building a new DB, current time: 05/29/2014 10:54:02
New DB name: /home/mtollis/RepeatMasker/Libraries/20140131/anolis/specieslib
New DB title: /home/mtollis/RepeatMasker/Libraries/20140131/anolis/specieslib
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: First data line in seq is about 100% ambiguous nucleotides (shouldn't be over 40%)
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: First data line in seq is about 100% ambiguous nucleotides (shouldn't be over 40%)
Adding sequences from FASTA; added 776 sequences in 0.108892 seconds.
Perhaps the makeblastdb "error" is harmless, and it is merely a coincidence that my analysis fails. I don't see how either true ambiguities or line endings could be the problem, as my database is hardly novel: I am using the RepBase update and the -species option. The option appears to work, as it creates the species-specific library as well as the general library in the RepeatMasker/Libraries directory.
Does anyone know why RepeatMasker would run without throwing any errors and then leave no output files whatsoever?
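One way to rule true ambiguities in or out would be to measure, for each record in the library, what fraction of the first sequence line is ambiguous. The following is only a minimal sketch: the function names are made up for illustration, and it assumes that anything other than A, C, G or T counts as ambiguous, which may not match exactly how makeblastdb counts. The 40% threshold is taken from the warning text in the log.

```python
# Assumption: only A, C, G and T (either case) are treated as unambiguous.
UNAMBIGUOUS = set("ACGTacgt")


def first_line_ambiguity(lines):
    """Return (header, fraction) for the first data line of each FASTA record.

    `lines` is any iterable of strings, e.g. an open file handle.
    """
    results = []
    header = None  # header of the record still waiting for its first data line
    for raw in lines:
        line = raw.rstrip("\r\n")  # tolerate both Unix and Windows line endings
        if line.startswith(">"):
            header = line[1:]
            continue
        if header is not None and line.strip():
            seq = line.strip()
            ambiguous = sum(1 for ch in seq if ch not in UNAMBIGUOUS)
            results.append((header, ambiguous / len(seq)))
            header = None  # ignore the remaining lines of this record
    return results


def flag_suspicious(lines, threshold=0.40):
    """Return the records whose first data line exceeds the threshold."""
    return [(h, f) for h, f in first_line_ambiguity(lines) if f > threshold]
```

Running `flag_suspicious` on an open handle to the specieslib file named in the log should list the records, if any, that could trigger the warning.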
Do the headers in your FASTA files look OK? Each header line should start with ">" followed by the sequence name, and the actual sequence should begin on the next line. I can imagine these errors appearing for sequences without proper headers (just a guess, though).
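The header check described above can be sketched in a few lines of Python. This is an illustrative example, not part of RepeatMasker or BLAST: the function name and the problem messages are invented here, and it only tests the basic layout (a ">" header with a name, followed by at least one sequence line).

```python
def check_fasta_structure(lines):
    """Return a list of (line_number, problem) tuples for a FASTA file.

    `lines` is any iterable of strings, e.g. an open file handle.
    """
    problems = []
    current_header_line = None  # line number of the most recent header
    has_sequence = False        # has any sequence line been seen since then?
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        if line.startswith(">"):
            # The previous header was never followed by sequence data.
            if current_header_line is not None and not has_sequence:
                problems.append((current_header_line, "header with no sequence"))
            if not line[1:].strip():
                problems.append((number, "empty header name"))
            current_header_line = number
            has_sequence = False
        else:
            if current_header_line is None:
                problems.append((number, "sequence before first header"))
            has_sequence = True
    # The last header in the file may also lack sequence data.
    if current_header_line is not None and not has_sequence:
        problems.append((current_header_line, "header with no sequence"))
    return problems
```

An empty list means the file passed these basic checks; it does not prove that makeblastdb will accept it.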
It is hard to diagnose the issue without seeing the exact commands. I realize this is an old post now, but if you can provide the command used, and some information about the data, it would likely be helpful for others. And, it's always nice to answer questions and see things resolved.
Here is the command I used:
And this is an error message I found in the standard output.
Also, the data is a vertebrate-sized genome with hundreds of thousands of scaffolds. However, I have had RM work on these kinds of datasets with no problems before.