Eliza • 23 months ago
Hi, I downloaded the chromosome 21 genome data from gnomAD and submitted a job to my university cluster to read the VCF file, but I keep getting this error:
Scanning file to determine attributes.
File attributes:
meta lines: 598
header_line: 599
variant count: 3483000
column count: 8
Meta line 598 read in.
All meta lines processed.
gt matrix initialized.
Character matrix gt created.
Character matrix gt rows: 3483000
Character matrix gt cols: 8
skip: 0
nrows: 3483000
row_num: 0
Processed variant 1928000
/var/spool/slurmd/job4788377/slurm_script: line 8: 87699 Killed Rscript /xxxx/xxxx/xxx/filter_snp_chr_21.R
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=4788377.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
This is my R script:
library("R.utils")
library("vcfR")
library("stringr")
library("tidyverse")
library("dplyr")
vc=read.vcfR("gnomad.genomes.r2.1.1.sites.21.vcf")
df=vc@fix
data=as.data.frame(df)
data_snp=data %>%
filter(str_length(ALT)==1 & str_length(REF)==1)#filtering for SNPs
write.csv(data_snp,"snp_genome_21.csv")
And this is the job script:
#!/bin/bash
#SBATCH --time=05:00:00
#SBATCH --ntasks=1
#SBATCH --mem=40G
module load tensorflow/2.5.0
Rscript /xxxx/xxx/filter_snp_chr_21.R
Does this mean there is not enough memory? I gave the job 40G of memory.
And what's the size of the uncompressed VCF? :-D
@Pierre Lindenbaum 15 GB is the size of the uncompressed .vcf file
Do you mean chromosome 21 from gnomAD?
Your script doesn't seem to handle resources very well. Try doing your operation line by line instead of reading the whole file into memory.
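For instance, here is a minimal sketch of that kind of line-by-line filtering in base R, assuming the uncompressed sites VCF is tab-delimited with only the eight fixed columns and no sample columns; the chunk size is arbitrary and the large INFO field is dropped from the output:

con <- file("gnomad.genomes.r2.1.1.sites.21.vcf", open = "r")
out <- file("snp_genome_21.csv", open = "w")
writeLines("CHROM,POS,ID,REF,ALT,QUAL,FILTER", out)    # header row; INFO is omitted here
repeat {
  lines <- readLines(con, n = 100000)                  # read 100,000 lines at a time
  if (length(lines) == 0) break
  lines <- lines[!startsWith(lines, "#")]              # drop meta and header lines
  if (length(lines) == 0) next
  fields <- strsplit(lines, "\t", fixed = TRUE)
  is_snp <- vapply(fields,
                   function(f) nchar(f[4]) == 1 && nchar(f[5]) == 1,   # single-base REF and ALT
                   logical(1))
  rows <- vapply(fields[is_snp],
                 function(f) paste(f[1:7], collapse = ","),
                 character(1))
  if (length(rows) > 0) writeLines(rows, out)
}
close(con)
close(out)

Only one chunk is ever held in memory at a time, so the peak footprint stays small regardless of the file size.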
@barslmn it worked for 10 GB of data with no problem