Eliza • 23 months ago
Hi, I downloaded the chromosome 21 genome data from gnomAD and submitted a job to my university cluster to read the VCF file, but I keep getting this error:
Scanning file to determine attributes.
File attributes:
meta lines: 598
header_line: 599
variant count: 3483000
column count: 8
Meta line 598 read in.
All meta lines processed.
gt matrix initialized.
Character matrix gt created.
Character matrix gt rows: 3483000
Character matrix gt cols: 8
skip: 0
nrows: 3483000
row_num: 0
Processed variant 1928000
/var/spool/slurmd/job4788377/slurm_script: line 8: 87699 Killed Rscript /xxxx/xxxx/xxx/filter_snp_chr_21.R
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=4788377.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
This is my R script:
library("R.utils")
library("vcfR")
library("stringr")
library("tidyverse")
library("dplyr")
vc=read.vcfR("gnomad.genomes.r2.1.1.sites.21.vcf")
df=vc@fix
data=as.data.frame(df)
data_snp=data %>%
filter(str_length(ALT)==1 & str_length(REF)==1)#filtering for SNPs
write.csv(data_snp,"snp_genome_21.csv")
And this is the job script:
#!/bin/bash
#SBATCH --time=05:00:00
#SBATCH --ntasks=1
#SBATCH --mem=40G
module load tensorflow/2.5.0
Rscript /xxxx/xxx/filter_snp_chr_21.R
Does this mean there is not enough memory? I gave the job 40G of memory.
And what's the size of the uncompressed VCF? :-D
@Pierre Lindenbaum 15 GB is the size of the uncompressed .vcf file
Do you mean chromosome 21 from gnomAD?
Your script doesn't seem to handle resources very well. Try doing your operation line by line instead of reading the whole file into memory.
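For instance, here is a minimal sketch of that kind of line-by-line filtering in base R, assuming the uncompressed sites VCF is tab-delimited with only the eight fixed columns and no sample columns; the chunk size is arbitrary and the large INFO field is dropped from the output:

con <- file("gnomad.genomes.r2.1.1.sites.21.vcf", open = "r")
out <- file("snp_genome_21.csv", open = "w")
writeLines("CHROM,POS,ID,REF,ALT,QUAL,FILTER", out)    # header row; INFO is omitted here
repeat {
  lines <- readLines(con, n = 100000)                  # read 100,000 lines at a time
  if (length(lines) == 0) break
  lines <- lines[!startsWith(lines, "#")]              # drop meta and header lines
  if (length(lines) == 0) next
  fields <- strsplit(lines, "\t", fixed = TRUE)
  is_snp <- vapply(fields,
                   function(f) nchar(f[4]) == 1 && nchar(f[5]) == 1,   # single-base REF and ALT
                   logical(1))
  rows <- vapply(fields[is_snp],
                 function(f) paste(f[1:7], collapse = ","),
                 character(1))
  if (length(rows) > 0) writeLines(rows, out)
}
close(con)
close(out)

Only one chunk is ever held in memory at a time, so the peak footprint stays small regardless of the file size.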
@barslmn it worked for 10 GB of data with no problem