Question

R : Error in scan() too many items

7.1 years ago

shreyajha ▴ 30

I have to create an object of class haplohh from the function data2haplohh:

test<-data2haplohh(hap_file="test.hap", map_file="test.map.inp", haplotype.in.columns=TRUE)
Map file seems OK: 1054416  SNPs declared for chromosome 1
Haplotype are in columns with no header

Error in scan(hap_file, what = "character", quiet = TRUE) : too many items

The *.hap file has 1054416 lines and 5008 columns (approx. 10 GB), and the *.map.inp file has 1054416 lines and 5 columns (15 MB). My system has 64 GB of RAM and 8 TB of storage.

I could split the files into chunks, but that would introduce errors in my calculation, so I need to load the input files as a whole.

Please tell me how to solve this!

R rehh • 2.3k views

updated 7.1 years ago by Alex Reynolds 36k • written 7.1 years ago by shreyajha ▴ 30

Hello shreyajha!

It appears that your post has been cross-posted to another site: https://stackoverflow.com/questions/50578840/r-error-in-scan-too-many-items

This is typically not recommended as it runs the risk of annoying people in both communities.

7.1 years ago by zx8754 12k

Hi, as you can see, I didn't get any reply on that forum. That's why I posted it here.

7.1 years ago by shreyajha ▴ 30

I think that your data is ~ 5 times over the max limit of a typical data object in R. Please consider different ways of reading this data:

7.1 years ago by Kevin Blighe 89k

score 3 · Accepted Answer · 2018-05-30

R has traditionally indexed vectors with 32-bit signed integers, even in 64-bit versions of R, and functions that have not been updated for long vectors still assume this, which can put an upper limit on how many elements you can work with.

Looking at the source of getAnywhere(data2haplohh) for your error message, it looks like it tries to make a vector or matrix with 5.3B elements (1054416 rows x 5008 columns). The limit you are perhaps running into is 2^31-1 or 2.1B elements.
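To make the arithmetic above concrete, here is a small sketch in base R that compares the element count implied by the file dimensions against the 32-bit signed integer maximum. The variable names are only illustrative, and the dimensions are the ones quoted in the question.

```r
# Dimensions reported in the question (numeric literals are doubles in R,
# so the product below does not overflow)
n_rows <- 1054416
n_cols <- 5008

n_elements <- n_rows * n_cols           # total values scan() would have to hold
int_limit  <- .Machine$integer.max      # 2^31 - 1 = 2147483647

exceeds_limit <- n_elements > int_limit # TRUE if the input is past the limit
ratio         <- n_elements / int_limit # how many times over the limit

print(format(n_elements, big.mark = ","))
print(exceeds_limit)
print(round(ratio, 2))
```

With these dimensions the input is roughly two and a half times larger than the 32-bit limit, so more RAM alone would not be expected to help.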

Maybe contact the developers for help with adjustments to code to support larger inputs, or look at subsampling or validating your input.
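As one possible way to validate the input before contacting the developers, the sketch below streams the haplotype file in chunks and reports the number of lines and the distinct field counts per line, without ever building one huge vector. It uses only base R; the function name check_hap_dimensions and the chunk size are hypothetical, and it assumes the file is whitespace-delimited.

```r
# Hypothetical helper (not part of rehh): stream a whitespace-delimited file
# and report how many lines it has and which per-line field counts occur.
check_hap_dimensions <- function(path, chunk_lines = 1000L) {
  con <- file(path, open = "r")
  on.exit(close(con))

  n_lines <- 0                 # kept as double so it cannot overflow
  field_counts <- integer(0)   # distinct numbers of fields seen so far

  repeat {
    lines <- readLines(con, n = chunk_lines)
    if (length(lines) == 0L) break
    n_lines <- n_lines + length(lines)

    # Split each line on runs of whitespace and count the resulting fields
    counts <- lengths(strsplit(trimws(lines), "[[:space:]]+"))
    field_counts <- union(field_counts, unique(counts))
  }

  list(n_lines = n_lines, field_counts = sort(field_counts))
}
```

Calling check_hap_dimensions("test.hap") on the file from the question should report 1054416 lines and a single field count if the file is rectangular. More than one field count would point to malformed lines.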