spades.py Error 255
2.7 years ago
Scottie ▴ 10

I am a student and am attempting to perform genome assembly using spades.py on SRR10676752. I trimmed the right and left fastq files using trimmomatic. I have tried the following commands:

spades.py -t 32 -m 64 -o results -1 SRR10676752_1.fastq.gz -2 SRR10676752_2.fastq.gz

spades.py -t 32 -m 64 -o results -1 SRR10676752_1_trimmed.fastq.gz -2 SRR10676752_2_trimmed.fastq.gz

spades.py -t 32 -m 64 --careful --phred-offset 33 -o results -1 SRR10676752_1.fastq.gz -2 SRR10676752_2.fastq.gz

spades.py -t 32 -m 64 --careful --phred-offset 64 -o results -1 SRR10676752_1.fastq.gz -2 SRR10676752_2.fastq.gz

All result in the same error (255). Results from zcat SRR10676752_1_trimmed.fastq.gz | awk '{s++}END{print s/4}' and from zcat SRR10676752_2_trimmed.fastq.gz | awk '{s++}END{print s/4}' do not give the same count. One is substantially larger than the other, for both the trimmed files and the untrimmed raw fastq files.

I read on a GitHub thread to use fastq_pair; however, this is a tool that our college server does not have and I am not authorized to install. We were given specific instructions to create a pipeline using the tools we have learned thus far, and I cannot think of any tool we have learned of that could fix this issue.

I am currently in the process of trying a polish of the raw data using zcat SRR10676752_1.fastq.gz | head -n 40000000 >Polished_R1.fastq and zcat SRR10676752_2.fastq.gz | head -n 40000000 >Polished_R2.fastq (i.e., keeping the first 10 million reads of each), followed by a gzip of each Polished file, and then I will try spades again, but this is not the way that we were taught. We are supposed to create a bash script pipeline using this specific SRA ID, which I could do if I had an SRR fastq file to work with that cooperated with spades.py for assembly instead of this one.

Any advice from someone who knows how to handle this type of situation would be great! Please keep it to handling the spades.py error, though.

Thanks in advance!

Tags: whole genome assembly, spades.py

do not result in the same count. One is substantially larger than the other for both the trimmed files and untrimmed raw fastq files.

Did you trim the paired-end files independently? If you had trimmed them together in Trimmomatic's paired-end (PE) mode, the two files would have stayed synchronized and this problem would not have occurred. See the sketch below.
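
A PE-mode run keeps the two "paired" outputs in sync and diverts orphaned reads to separate "unpaired" files. A sketch (the jar path and trimming steps here are illustrative placeholders, not what was actually run):

java -jar trimmomatic-0.39.jar PE -phred33 \
    SRR10676752_1.fastq.gz SRR10676752_2.fastq.gz \
    SRR10676752_1.paired.fastq.gz SRR10676752_1.unpaired.fastq.gz \
    SRR10676752_2.paired.fastq.gz SRR10676752_2.unpaired.fastq.gz \
    SLIDINGWINDOW:4:20 MINLEN:36

The two paired outputs are what you would then pass to spades.py.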

2.7 years ago
Mensur Dlakic ★ 28k

I read on a GitHub thread to use fastq_pair; however, this is a tool that our college server does not have and I am not authorized to install. We were given specific instructions to create a pipeline using the tools we have learned thus far, and I cannot think of any tool we have learned of that could fix this issue.

I am not sure how you expect us to know what tools you have, or what tools you are allowed to use. That's like asking us to give you a coffee cake recipe, with the caveat that you don't have eggs or sour cream, and possibly some other ingredients.

SPAdes doesn't work if you specify paired reads yet some of the reads are unpaired. You correctly identified that part. There is nothing stopping you from installing fastq_pair even if your server doesn't have it, as the program can be installed in your local directories where you have write permissions. Otherwise, you will have to devise a way to read the fastq headers (only the first of each group of four lines), extract the read IDs, and delete the reads whose IDs are present in only one of the two files; a sketch of that follows below.
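
A minimal sketch of that header-matching approach using only core Unix tools (file names are placeholders; this assumes Illumina-style headers where the read ID is the first whitespace-delimited field of the header line, and that both files list their shared reads in the same relative order):

# 1. Collect the sorted read IDs from each file (strip the leading "@")
zcat R1.fastq.gz | awk 'NR % 4 == 1 {print substr($1, 2)}' | sort > ids_1.txt
zcat R2.fastq.gz | awk 'NR % 4 == 1 {print substr($1, 2)}' | sort > ids_2.txt

# 2. Keep only the IDs present in both files
comm -12 ids_1.txt ids_2.txt > ids_common.txt

# 3. Rewrite each file with only the common reads
for r in 1 2; do
    zcat R${r}.fastq.gz | awk -v ids=ids_common.txt '
        BEGIN { while ((getline id < ids) > 0) keep[id] = 1 }
        NR % 4 == 1 { emit = (substr($1, 2) in keep) }
        emit' | gzip > R${r}.paired.fastq.gz
done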


Thank you sir. In our class we have covered usage of these tools:

fastq-dump, fastqc, trimmomatic, spades.py, quast, RepeatMasker, augustus, geecee, compseq, etandem, einverted

I think I could code something like fastq_pair. However, I realized I had not used awk properly on the *.gz files; after correcting that, I do see that the right and left files have the same read count.

I looked through the spades log. It says that 'The reads contain too many k-mers to fit into available memory. You need approx. 111.373GB of free RAM to assemble your data set'.

I had not specified a k-mer size, so I tried specifying -k 127 in another run of spades, with still the same issue. The SRR run downloaded from NCBI has 13.1G bases. Am I misunderstanding how to assemble something this size?


You need approx. 111.373GB of free RAM to assemble your data set.

Based on that error, a very simple question that you didn't answer: how much memory does your computer/server have? The size of the dataset matters less. If you don't have >111 GB available, or can't allocate that much memory even though the server may have more, you won't be able to assemble.

One way around it is to try -k auto instead of -k 127; the program will figure out a range of k-mer sizes that may work with your resources.
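
For example (file names are placeholders for whatever your synchronized, trimmed pair is called):

spades.py -t 32 -m 64 -k auto -o results -1 SRR10676752_1.paired.fastq.gz -2 SRR10676752_2.paired.fastq.gz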


Thank you sir. Our /home mount has 11T available, 22T total, and it looks as though each /home/run/<user> partition is 26G. I do not have much experience with this. Is the /home mount shared between users? If not, I am puzzled as to how I would be able to finish this assignment.

I tried -k auto with the same results. I think if the -k parameter is left out, spades will default to auto.


111.373GB of free RAM refers to Random Access Memory, not disk space. You can find the amount of RAM on your computer with this command:

grep MemTotal /proc/meminfo

You will get a number in kB, something like 66071288 kB. To find the RAM size in GB, divide that number by 1048576 (1024 × 1024); for the number above that comes to ~63 GB. That will tell you whether you are even close to the 111 GB that SPAdes says you need.
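
If you'd rather not do the division by hand, this one-liner performs the same conversion:

awk '/MemTotal/ {printf "%.1f GB\n", $2 / 1048576}' /proc/meminfo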

Not sure if all Linux systems have it, but this command will tell you the RAM size in GB directly (it will be in the Mem row, total column):

free -h

If the above command says that you have X GB where X < 111, try using the -m switch in SPAdes and specify ~90% of whatever X is. In the case above, it would be -m 57.
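
On the hypothetical 63 GB machine above, the full command might look like this (file names are placeholders):

spades.py -t 32 -m 57 -k auto -o results -1 SRR10676752_1.paired.fastq.gz -2 SRR10676752_2.paired.fastq.gz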


Hi, and thank you very much for all of your help. Running the command:

free -h

on our server, it reports 251G in the Mem row, total column. So, as long as I am not running any other large processes, this should be sufficient to meet the ~111 GB RAM requirement, if I am understanding correctly?


Correct. Are you the only person running jobs on this cluster?


On our server, I am the only one running processes under my UID. The server is used by other students as well. Yesterday, other than the root user, I was the only UID with active processes.


My professor got back to me and asked that I try -m 128. I tried this and it still failed. He then suggested that we do not have sufficient RAM and that I use 50% of the reads.
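
A minimal sketch of that 50% subsampling with only core Unix tools (file names are placeholders): keeping every other 4-line record halves the reads, and it preserves pairing as long as both files list reads in the same order.

for r in 1 2; do
    zcat SRR10676752_${r}.paired.fastq.gz \
        | awk 'int((NR - 1) / 4) % 2 == 0' \
        | gzip > SRR10676752_${r}.half.fastq.gz
done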
