Hi Biostars,
I am currently using metasv to merge SVs from the outputs of BreakDancer, CNVnator, and Pindel for a human genome. Are there any tricks to reduce the running time? I also posted the same question [issue #134] to the author on the metasv GitHub, but I am not sure whether I will get a reply from the developer. Any suggestions will be appreciated.
I installed metasv from Bioconda using the command below:
conda install -c bioconda metasv
The version of metasv:
[ksux 18:11:36 ksux_SVE]$ run_metasv.py --version
run_metasv.py 0.5.4
I ran run_metasv.py on the example files without any issue, so I moved on to my own data. The job has now been running on our HPC for over 5 days. Here is my batch script:
#!/bin/bash
#SBATCH --qos=long
#SBATCH --time=7-00:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=20
#SBATCH --mem=64G
module load anaconda/2.5.0 bedtools/2.27.1
module load gcc/4.8.2
module load cmake/3.0.2 ROOT/5.34.36
export CONDA_ENVS_PATH=/lustre/project/ksux_SVE
unset PYTHONPATH
source activate SVE
metaSV_ref=Homo_sapiens_assembly38.fasta
breakdancer_our=/data/BreakDancer_out/Subject_ID.sv.tbl
cnvnator_call=/data/CNVnator_out/Subject_ID.cnv.xls
pindel_out=/data/pindel_out/Sample_dir/Subject_ID/*
sample_id=Subject_ID_tbl
alignments_bam=/data/Subject_ID.bam
spades_exe=/ksux_SVE/SVE/bin/spades.py
age_align_exe=/ksux_SVE/SVE/bin/age_align
threads=20
work=/data/metaSV_work2
OUTDIR=/data/metaSV_out2
insert_size_mean=260.04
insert_size_sd=56.34
metaSV_svs_to_assemble={'DEL','INS','INV','DUP'}
run_metasv.py --reference $metaSV_ref \
    --breakdancer_native $breakdancer_our \
    --cnvnator_native $cnvnator_call \
    --pindel_native $pindel_out \
    --sample $sample_id \
    --bam $alignments_bam \
    --spades $spades_exe \
    --age $age_align_exe \
    --num_threads $threads \
    --workdir $work \
    --outdir $OUTDIR \
    --isize_mean $insert_size_mean \
    --isize_sd $insert_size_sd
I haven't found any issues in the log file so far, but the running time is much longer than I expected.
Last, thanks for reading this post.
Update: The author suggested turning off the assembly step in metasv, which reduced the running time to ~10 minutes per subject. If you want to use assembly, it can take quite some time depending on the data. In my case, the BAM file contains 2x 350bp reads (21x), and the overall computational time would exceed 7 days. I would like to keep it running, but our server allows a maximum run time of one week.
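For anyone who wants to try the same thing, here is a rough sketch of how the assembly step could be switched on or off from a single variable. It assumes the --disable_assembly flag is available in your metasv version (please confirm with run_metasv.py --help). The names "assemble" and "assembly_opts" are made up for this example, and the script only prints the command instead of running it.

```shell
#!/bin/bash
# Sketch only: toggle metasv assembly on or off from one variable.
# "assemble" and "assembly_opts" are hypothetical names added for this example.
metaSV_ref=Homo_sapiens_assembly38.fasta
breakdancer_our=/data/BreakDancer_out/Subject_ID.sv.tbl
cnvnator_call=/data/CNVnator_out/Subject_ID.cnv.xls
sample_id=Subject_ID_tbl
alignments_bam=/data/Subject_ID.bam
spades_exe=/ksux_SVE/SVE/bin/spades.py
age_align_exe=/ksux_SVE/SVE/bin/age_align

assemble=no   # set to "yes" to keep the slow SPAdes/AGE assembly step

if [ "$assemble" = yes ]; then
    assembly_opts="--spades $spades_exe --age $age_align_exe"
else
    # Assumed flag: check that it is listed by `run_metasv.py --help`.
    assembly_opts="--disable_assembly"
fi

# Build the command as one string (backslash-newline joins the lines).
cmd="run_metasv.py --reference $metaSV_ref \
--breakdancer_native $breakdancer_our \
--cnvnator_native $cnvnator_call \
--sample $sample_id \
--bam $alignments_bam \
$assembly_opts"

# Print the command for inspection; replace this line with a bare $cmd to run it.
echo "$cmd"
```

With assemble=no the printed command contains no SPAdes or AGE options, so the same script can be reused for a quick first pass and for a later full run with assembly.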
Out of curiosity, which platform generates PE 350bp reads? Even the MiSeq is capped at 2x300.