(LDSC) - Munge Error - ValueError: could not convert string to float: OR
3.0 years ago

Hi everyone,

I am trying to munge some data for later use in ldsc and ran into this error:

/home/expcard/Projects/GWAS_SCA/GWAS_NTR/LDSC/ldsc/munge_sumstats.py \
--sumstats /home/dominicz/LDSC/SCAMILIFELINESforMETALnoSNPFinalLDSCmunge.txt \
--N 18236 \
--chunksize 500000 \
--out /home/dominicz/LDSC/SCAMILIFELINESforMETALnoSNPFinalLDSC.munge.txt \
--merge-alleles /home/dominicz/LDSC/w_hm3.snplist

LD Score Regression (LDSC)
Version 1.0.1
(C) 2014-2019 Brendan Bulik-Sullivan and Hilary Finucane
Broad Institute of MIT and Harvard / MIT Department of Mathematics
GNU General Public License v3

Call: ./munge_sumstats.py --out /home/dominicz/LDSC/SCAMILIFELINESforMETALnoSNPFinalLDSC.munge.txt --merge-alleles /home/dominicz/LDSC/w_hm3.snplist --chunksize 500000 --N 18236.0 --sumstats /home/dominicz/LDSC/SCAMILIFELINESforMETALnoSNPFinalLDSCmunge.txt

Interpreting column names as follows:
N: Sample size
A1: Allele 1, interpreted as ref allele for signed sumstat.
P: p-Value
A2: Allele 2, interpreted as non-ref allele for signed sumstat.
SNP: Variant ID (e.g., rs number)
OR: Odds ratio (1 --> no effect; above 1 --> A1 is risk increasing)

Reading list of SNPs for allele merge from /home/dominicz/LDSC/w_hm3.snplist
Read 1217311 SNPs for allele merge.
Reading sumstats from /home/dominicz/LDSC/SCAMILIFELINESforMETALnoSNPFinalLDSCmunge.txt into memory 500000 SNPs at a time.
.

ERROR converting summary statistics:

Traceback (most recent call last):
  File "/home/expcard/Projects/GWAS_SCA/GWAS_NTR/LDSC/ldsc/munge_sumstats.py", line 686, in munge_sumstats
    dat = parse_dat(dat_gen, cname_translation, merge_alleles, log, args)
  File "/home/expcard/Projects/GWAS_SCA/GWAS_NTR/LDSC/ldsc/munge_sumstats.py", line 238, in parse_dat
    for block_num, dat in enumerate(dat_gen):
  File "/home/dominicz/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/io/common.py", line 93, in <lambda>
    BaseIterator.next = lambda self: self.__next__()
  File "/home/dominicz/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/io/parsers.py", line 959, in next
    return self.get_chunk()
  File "/home/dominicz/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/io/parsers.py", line 1019, in get_chunk
    return self.read(nrows=size)
  File "/home/dominicz/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/io/parsers.py", line 982, in read
    ret = self._engine.read(nrows)
  File "/home/dominicz/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/io/parsers.py", line 1719, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 890, in pandas._libs.parsers.TextReader.read (pandas/_libs/parsers.c:10862)
  File "pandas/_libs/parsers.pyx", line 924, in pandas._libs.parsers.TextReader._read_low_memory (pandas/_libs/parsers.c:11343)
  File "pandas/_libs/parsers.pyx", line 989, in pandas._libs.parsers.TextReader._read_rows (pandas/_libs/parsers.c:12175)
  File "pandas/_libs/parsers.pyx", line 1117, in pandas._libs.parsers.TextReader._convert_column_data (pandas/_libs/parsers.c:14136)
  File "pandas/_libs/parsers.pyx", line 1190, in pandas._libs.parsers.TextReader._convert_tokens (pandas/_libs/parsers.c:15330)

ValueError: could not convert string to float: OR

Conversion finished at Wed Nov 10 15:34:55 2021
Total time elapsed: 2.42s
Traceback (most recent call last):
  File "/home/expcard/Projects/GWAS_SCA/GWAS_NTR/LDSC/ldsc/munge_sumstats.py", line 745, in <module>
    munge_sumstats(parser.parse_args(), p=True)
  File "/home/expcard/Projects/GWAS_SCA/GWAS_NTR/LDSC/ldsc/munge_sumstats.py", line 686, in munge_sumstats
    dat = parse_dat(dat_gen, cname_translation, merge_alleles, log, args)
  File "/home/expcard/Projects/GWAS_SCA/GWAS_NTR/LDSC/ldsc/munge_sumstats.py", line 238, in parse_dat
    for block_num, dat in enumerate(dat_gen):
  File "/home/dominicz/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/io/common.py", line 93, in <lambda>
    BaseIterator.next = lambda self: self.__next__()
  File "/home/dominicz/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/io/parsers.py", line 959, in next
    return self.get_chunk()
  File "/home/dominicz/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/io/parsers.py", line 1019, in get_chunk
    return self.read(nrows=size)
  File "/home/dominicz/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/io/parsers.py", line 982, in read
    ret = self._engine.read(nrows)
  File "/home/dominicz/.conda/envs/ldsc/lib/python2.7/site-packages/pandas/io/parsers.py", line 1719, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 890, in pandas._libs.parsers.TextReader.read (pandas/_libs/parsers.c:10862)
  File "pandas/_libs/parsers.pyx", line 924, in pandas._libs.parsers.TextReader._read_low_memory (pandas/_libs/parsers.c:11343)
  File "pandas/_libs/parsers.pyx", line 989, in pandas._libs.parsers.TextReader._read_rows (pandas/_libs/parsers.c:12175)
  File "pandas/_libs/parsers.pyx", line 1117, in pandas._libs.parsers.TextReader._convert_column_data (pandas/_libs/parsers.c:14136)
  File "pandas/_libs/parsers.pyx", line 1190, in pandas._libs.parsers.TextReader._convert_tokens (pandas/_libs/parsers.c:15330)

ValueError: could not convert string to float: OR

Would anyone know how to fix this?

Thanks in advance!

3.0 years ago
Sam ★ 4.8k

Check if your OR column contains any of "NA", "Null", "nan", "." etc.
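
The error message quotes the string that could not be parsed (here OR), which may point to a header line repeated partway through the file. To see exactly which values are in the way, a quick check can tally every non-numeric entry in the column. This is a minimal sketch in plain Python, assuming a tab-delimited file with a header row that includes a column named OR; the function name non_numeric_values is made up for illustration and is not part of ldsc.

```python
import csv
from collections import Counter


def non_numeric_values(path, column="OR", delimiter="\t"):
    """Tally the values in `column` that are not usable numbers.

    A value is counted if float() cannot parse it (e.g. "NA", ".", "OR")
    or if it parses to NaN (e.g. "nan").
    """
    bad = Counter()
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        if column not in (reader.fieldnames or []):
            raise KeyError("column %r not found in header" % column)
        for row in reader:
            # Short rows yield None for missing fields; treat them as empty.
            value = (row.get(column) or "").strip()
            try:
                number = float(value)
            except ValueError:
                bad[value] += 1
                continue
            if number != number:  # NaN is the only float not equal to itself
                bad[value] += 1
    return bad


if __name__ == "__main__":
    # "sumstat" is a placeholder file name, as in the R example.
    for value, count in non_numeric_values("sumstat").most_common():
        print(repr(value), count)
```

If the tally shows the literal string OR, the file most likely contains a second header line somewhere in the data.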

You can pre-process the data in R with something like:

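library(data.table)
# Read the summary statistics ("sumstat" is a placeholder file name)
dat <- fread("sumstat")
# Coerce OR to numeric; values that cannot be parsed become NA (with a warning)
dat[, OR := as.numeric(OR)]
# Drop the rows where OR could not be parsed
dat <- dat[!is.na(OR)]
# Write a tab-delimited file to pass to munge_sumstats.py
fwrite(dat, "newSumstat", sep = "\t", na = "NA", quote = FALSE)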

This should filter out all the problematic rows.
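
If R is not at hand, the same filtering can be sketched with the Python standard library. This is only an illustration under a few assumptions: the input is tab-delimited with a single header row, the column is named OR, and the function name drop_non_numeric_rows and both file names are placeholders. As in the R version, rows whose OR cannot be parsed, or parses to NaN, are dropped.

```python
import csv
import math


def drop_non_numeric_rows(src, dst, column="OR", delimiter="\t"):
    """Copy `src` to `dst`, keeping only rows with a numeric, non-NaN `column`.

    Returns a (kept, dropped) tuple of row counts, header excluded.
    """
    kept = dropped = 0
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.reader(fin, delimiter=delimiter)
        writer = csv.writer(fout, delimiter=delimiter, lineterminator="\n")
        header = next(reader)
        index = header.index(column)  # raises ValueError if the column is absent
        writer.writerow(header)
        for row in reader:
            try:
                value = float(row[index])
            except (IndexError, ValueError):
                # Short row, or a string such as "NA", "." or a repeated "OR" header
                dropped += 1
                continue
            if math.isnan(value):
                dropped += 1
                continue
            writer.writerow(row)
            kept += 1
    return kept, dropped


if __name__ == "__main__":
    kept, dropped = drop_non_numeric_rows("sumstat", "newSumstat")
    print("kept %d rows, dropped %d rows" % (kept, dropped))
```

The cleaned file can then be passed to munge_sumstats.py through --sumstats in place of the original.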

Thank you very much! This worked!
