Hi,
I have some metagenomic .fasta files that I'm trying to analyze with the CARD database tool RGI (https://github.com/arpcard/rgi), which states that .fasta or .fasta.gz files are accepted as input sequences.
My smallest paired-end file is 5.6 GB. I ran the analysis as a test, but at some point the computer ran out of disk space, and I think that caused the analysis to be cut short (a). As a result I do get the 2 output files, .json and .txt, but both are empty.
I tried compressing the file with
$ gzip filename
But when using the ##.fasta.gz file the analysis is not even carried out because it "doesn't support the format" (b). I have now tried in both Linux and macOS terminals and still get the same result. I don't have a clue what I'm doing wrong. Please, any advice or suggestion would be much appreciated.
Observations from the run: during the analysis with the .fasta file I can see 5 temporary files (##.fasta.temp, ##.fasta.temp.potentrialGenes, ##.fasta.temp.contigToORF.fsa, ##.fasta.temp.contig.fsa, ##.fasta.temp.contig.fsa.blastRes.xml). Some of them are really heavy, ~55 GB (is that normal?).
(a).
Error: [blastp] Failed s_BlastXMLAddIteration Q(0/1
Process Process-1:4:
Traceback (most recent call last):
File "/Users/anaconda3/envs/rgi2/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/Users/anaconda3/envs/rgi2/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/Users/anaconda3/envs/rgi2/lib/python3.6/site-packages/app/Filter.py", line 116, in process_rrna
self.format_fasta()
File "/Users/anaconda3/envs/rgi2/lib/python3.6/site-packages/app/Filter.py", line 160, in format_fasta
fout.write(">{}\n{}\n".format(header, seq))
OSError: [Errno 28] No space left on device
WARNING 2020-08-15 15:30:56,939 : Exception: <class 'OSError'> -> [Errno 28] No space left on device -> model_type: homolog
WARNING 2020-08-15 15:31:14,101 : Exception: <class 'OSError'> -> [Errno 28] No space left on device -> model_type: overexpression
WARNING 2020-08-15 20:49:47,327 : Exception: <class 'xml.parsers.expat.ExpatError'> -> unclosed token: line -2047941080, column 23 -> model_type: variant
(b).
ERROR 2020-08-14 12:17:04,726 : gz
ERROR 2020-08-14 12:17:04,726 : application/gzip
WARNING 2020-08-14 12:17:04,726 : Sorry, no support for this format.
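The "No space left on device" failure in log (a) could be caught before starting a long run with a quick free-space check. The following is only a minimal sketch: the function name and the expansion_factor of 10 are assumptions for illustration (the factor is a rough guess based on the ~55 GB of temporary files seen for a 5.6 GB input in this post, not a documented RGI figure).

```python
import os
import shutil


def has_room_for_temp_files(input_path, work_dir=".", expansion_factor=10):
    """Return (ok, needed_bytes, free_bytes) for a planned run.

    expansion_factor is an assumed ratio of temporary-file size to
    input size; adjust it to what you actually observe on your data.
    """
    input_bytes = os.path.getsize(input_path)
    needed_bytes = input_bytes * expansion_factor
    # Free space on the filesystem that holds the working directory.
    free_bytes = shutil.disk_usage(work_dir).free
    return free_bytes >= needed_bytes, needed_bytes, free_bytes


if __name__ == "__main__":
    import sys

    ok, needed, free = has_room_for_temp_files(sys.argv[1])
    print("needed ~{:.1f} GB, free {:.1f} GB, enough: {}".format(
        needed / 1e9, free / 1e9, ok))
```

Run it with the input file as the only argument, from the directory where the temporary files will be written.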
If your file is paired-end and is 5.6 GB, it is probably a fastq (not fasta) with sequencing reads. You don't show the command line you used, but it seems to me you are trying to run sequencing reads with rgi main, which has --input_type contig or --input_type protein. You are then running out of disk space. Even if you didn't, a blast search with a 5.6 GB input file would take a very, very long time.
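To check whether a file really is fasta or fastq, looking at its first record is usually enough: fasta headers start with ">" and fastq headers start with "@". A minimal sketch follows; the function name is hypothetical, and it reads only the first non-empty line, so it is a quick guess rather than a validation of the whole file.

```python
import gzip


def sniff_sequence_format(path):
    """Guess 'fasta', 'fastq' or 'unknown' from the first non-empty line."""
    # Detect gzip by its magic bytes instead of trusting the extension.
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                return "fasta"
            if line.startswith("@"):
                return "fastq"
            return "unknown"
    # Empty file or only blank lines.
    return "unknown"


if __name__ == "__main__":
    import sys

    print(sniff_sequence_format(sys.argv[1]))
```

Because it reads only the start of the file, this runs quickly even on multi-gigabyte inputs, compressed or not.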
You can use fastq files with rgi bwt, which has the following warning:
Thanks for your input!
I could try the analysis using .fastq files as you recommend. However, since fastq files are heavier, I didn't have much hope after seeing that disk space is already a problem with uncompressed fasta files.
Yes, this is the command line I'm trying:
I originally had myForward_sequence.fastq and myRevervse_sequence.fastq, which I merged and converted into my 5.6 GB fasta. I did so as follows:
Merge them: