qstefano, 2.4 years ago:
Hello everyone, I'm trying to run featureCounts on several BAM files, but after analysing some samples I get the same error:
*** Error in `/home/anaconda3/envs/samtools40/lib/R/bin/exec/R': double free or corruption (!prev): 0x00002aacc7b6b780 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81489)[0x2aaaab230489]
/lib64/libc.so.6(fclose+0x177)[0x2aaaab21d037]
/home/anaconda3/envs/samtools40/lib/R/library/Rsubread/libs/Rsubread.so(SAM_pairer_probe_maxfp+0x4ae)[0x2aaab09d35de]
/home/anaconda3/envs/samtools40/lib/R/library/Rsubread/libs/Rsubread.so(SAM_pairer_run_once+0x182)[0x2aaab09d55f2]
/home/anaconda3/envs/samtools40/lib/R/library/Rsubread/libs/Rsubread.so(SAM_pairer_run+0x28)[0x2aaab09d89f8]
/home/anaconda3/envs/samtools40/lib/R/library/Rsubread/libs/Rsubread.so(fc_thread_wait_threads+0x11)[0x2aaab09f79e1]
/home/anaconda3/envs/samtools40/lib/R/library/Rsubread/libs/Rsubread.so(readSummary_single_file+0x204)[0x2aaab09fbe44]
/home/anaconda3/envs/samtools40/lib/R/library/Rsubread/libs/Rsubread.so(readSummary+0xde2)[0x2aaab09fcda2]
/home/anaconda3/envs/samtools40/lib/R/library/Rsubread/libs/Rsubread.so(R_child_thread_child+0xe)[0x2aaab097c9de]
/lib64/libpthread.so.0(+0x7dd5)[0x2aaaae51edd5]
/lib64/libc.so.6(clone+0x6d)[0x2aaaab2acead]
I'm running my script on an HPC cluster, using rclone to read and write files on Google Drive; the remote has been mounted as follows:
rclone mount remote: /home/gdrive/ --allow-other --buffer-size 512m --drive-chunk-size 128M --umask 002 --vfs-read-chunk-size-limit off --daemon --use-mmap
I tried changing the rclone options without success. Do you know how to solve this? Thanks.
Reading/writing from Google Drive is likely causing this. Have you tried downloading a couple of files locally and counting those?
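For example, something along these lines would rule the mount in or out; the remote path, scratch directory, GTF file, and paired-end setting below are placeholders, not taken from your setup:

## Minimal test: copy two BAMs off the mount, then count locally.
## All paths and isPairedEnd are assumptions - adjust to your data.
library(Rsubread)

system2("rclone", c("copy", "remote:project/sample1.bam", "/scratch/test/"))
system2("rclone", c("copy", "remote:project/sample2.bam", "/scratch/test/"))

fc <- featureCounts(
  files = c("/scratch/test/sample1.bam", "/scratch/test/sample2.bam"),
  annot.ext = "/scratch/test/genes.gtf",
  isGTFAnnotationFile = TRUE,
  isPairedEnd = TRUE,
  nthreads = 4
)
head(fc$counts)

If this runs cleanly on local copies, the crash is an I/O problem with the mount, not with featureCounts itself.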
featureCounts is a stable program and works fine. If you are working on HPC, why are you using a hacky solution like this?

Yes, featureCounts works perfectly on my local machine, but I have a very large amount of data (19 TB), and it is stored on Google Drive.
Perhaps consider doing this in Google Cloud, using a VM running featureCounts. Even then, reading 19 TB of data is going to be a problem with a remote mount; at least copying the data may not be so bad within the cloud, if you can work on a few files at a time. These must be hundreds of samples, so you will end up with a gigantic matrix once you collate everything.
That said, plan your analysis well and build a good pipeline. It might even make sense to run featureCounts either on chunks or on each file separately. The thing is that featureCounts only returns a result if everything finishes properly; if it crashes 99% of the way through, all is lost. Running on each file, you can relatively easily build a count matrix by pasting the individual components together, and resume counting if some files fail (see the sketch below). It is big data, so some parallelization and pipeline control make sense. I personally like Nextflow for these kinds of things.
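A minimal sketch of the per-file approach in R, in case it helps; the directories, GTF path, and paired-end assumption are mine, not from your setup. Each sample's counts are saved as soon as that sample finishes, so a crash only costs you the file in flight:

## Per-file counting with resume support; paths are placeholders.
library(Rsubread)

bams    <- list.files("/scratch/bams", pattern = "\\.bam$", full.names = TRUE)
out_dir <- "/scratch/counts"
dir.create(out_dir, showWarnings = FALSE)

for (bam in bams) {
  out_rds <- file.path(out_dir, paste0(basename(bam), ".counts.rds"))
  if (file.exists(out_rds)) next  # resume: skip samples already counted
  fc <- featureCounts(bam,
                      annot.ext = "/scratch/genes.gtf",
                      isGTFAnnotationFile = TRUE,
                      isPairedEnd = TRUE,
                      nthreads = 4)
  saveRDS(fc$counts, out_rds)    # one column of counts per sample
}

## Paste the per-sample columns into one matrix; with the same GTF,
## every run returns genes in the same row order, so cbind is safe.
mats <- lapply(list.files(out_dir, full.names = TRUE), readRDS)
count_matrix <- do.call(cbind, mats)

The same loop is trivial to parallelize, one sample per job, under Nextflow or a plain HPC job array.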
I agree with @genomax. Generally, these kinds of I/O-heavy tasks should be done with the data stored where the processing happens. I am even surprised the HPC staff allow you to mount external remotes; isn't that a security vulnerability?