R not reading numbers from file
7.5 years ago
nkinney06 ▴ 140

I am trying to read the following file into R variables:

5060803636482931868     83.3366666      0.0     0.0     0.0
15695800775901642752    0.0     81.0061726043   38.1837661841   0.0
12047011437325700351    0.0     38.1837661841   22.2036033112   7.07106781187
2610937148294873212     0.0     0.0     7.07106781187   30.1330383466

The first column holds unique keys and the remaining columns form a 4x4 matrix.

I try reading with the following:

fileContents <- as.matrix(read.table('./distanceMatrix.txt', header=FALSE, sep = "\t",strip.white=TRUE))
nameKey <- fileContents[,1]
distMatrix <- fileContents[,-1]

I get this result:

> nameKey

    [1]  5060803636482931712 15695800775901642752 12047011437325701120  2610937148294873088

> distMatrix
               V2       V3        V4        V5
    [1,] 83.33667  0.00000  0.000000  0.000000
    [2,]  0.00000 81.00617 38.183766  0.000000
    [3,]  0.00000 38.18377 22.203603  7.071068
    [4,]  0.00000  0.00000  7.071068 30.133038

Notice how the keys don't match the file. I need to be sure everything is read in exactly and that I can write it back out exactly. What am I doing wrong?

R • 2.1k views

Why not:

fileContents <- as.matrix( read.table( './distanceMatrix.txt', header=FALSE, 
                           sep = "\t", strip.white=TRUE, 
                           row.names = 1 ) )

How is this a bioinformatics question?

7.5 years ago
h.mon 35k

A couple of suggestions:

1) read everything as character and later convert to number

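# Read every column as text so the keys are never converted to doubles.
fileContents <- as.matrix(read.table('distanceMatrix.txt', header=FALSE, 
                                     row.names = 1, sep = "\t", strip.white=TRUE, 
                                     colClasses = "character" ) )
nameKey <- rownames(fileContents)
# as.numeric() drops the matrix dimensions, so restore them afterwards.
distMatrix <- as.numeric(fileContents )
dim(distMatrix) <- dim(fileContents)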
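
Since the question also asks about writing the data back out, here is a minimal round-trip sketch. It assumes the keys are kept as character row names; the two-row matrix and the temporary file path are made up for illustration and are not from the original post.

```r
# Illustrative data: two keys from the question and a small numeric matrix.
nameKey <- c("5060803636482931868", "15695800775901642752")
distMatrix <- matrix(c(83.3366666, 0, 0, 81.0061726043), nrow = 2)
rownames(distMatrix) <- nameKey

# Write keys and values as tab-separated text; quote = FALSE keeps the
# keys as bare digit strings, col.names = FALSE omits a header line.
out <- tempfile(fileext = ".txt")
write.table(distMatrix, out, sep = "\t", quote = FALSE, col.names = FALSE)

# Read back with the first column forced to character and compare keys.
back <- read.table(out, header = FALSE, sep = "\t",
                   colClasses = c("character", rep("numeric", ncol(distMatrix))))
keysBack <- back[[1]]
identical(keysBack, nameKey)
```

As long as the keys never pass through a numeric type, they should come back digit for digit.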

I don't know how the file is being created, but maybe:

2) you can prepend a character to the first element of every row before reading the file into R - this can be accomplished in place with sed, without creating a copy of the file.

3) split the file into two: one file with the row names, and another with the matrix numbers.

7.5 years ago

To expand a bit on what h.mon correctly wrote, your issue is that you're not treating row names as row names, but rather converting them to numbers. Since they're HUGE numbers (far above 2^53, the limit up to which a double can represent every integer exactly), they're getting stored as doubles, which means you're not going to get the exact value back. Of course, you don't need that as a value, just a name, so treat them accordingly (i.e., do what h.mon showed).
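
A quick way to see the rounding for yourself, as a small sketch using one key from the question (the variable names are just for illustration):

```r
# Doubles carry a 53-bit significand, so integers beyond 2^53
# cannot all be represented exactly.
.Machine$double.digits
2^53 == 2^53 + 1   # TRUE: the +1 is lost to rounding

# Convert one key from the question to a number and print it back in full.
key_chr <- "5060803636482931868"
key_num <- as.numeric(key_chr)
sprintf("%.0f", key_num)

# The printed digits no longer match the original text.
identical(sprintf("%.0f", key_num), key_chr)   # FALSE
```

Keeping the key as a character string avoids the conversion entirely.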

7.5 years ago
nkinney06 ▴ 140

That makes sense, but I appear to have the same problem when I run:

similarityMatrix <- as.matrix( read.table( './testMatrix.txt', header=FALSE, sep = "\t", strip.white=TRUE, row.names = 1 ) )

I get:

> similarityMatrix
                           V2       V3        V4        V5
5060803636482931712  83.33667  0.00000  0.000000  0.000000
15695800775901642752  0.00000 81.00617 38.183766  0.000000
12047011437325701120  0.00000 38.18377 22.203603  7.071068
2610937148294873088   0.00000  0.00000  7.071068 30.133038

and the matrix (in particular the row names) should be

5060803636482931868     83.3366666      0.0     0.0     0.0
15695800775901642752    0.0     81.0061726043   38.1837661841   0.0
12047011437325700351    0.0     38.1837661841   22.2036033112   7.07106781187
2610937148294873212     0.0     0.0     7.07106781187   30.1330383466

Is it possible to read the file twice, first as alphanumeric for column one only?


Add colClasses=c("factor", NA, NA, NA, NA) to the read.table options.


I'll try, but in reality my matrix will be rather large, so the number of NAs would have to be set dynamically.


It's likely that the readr package will help; it's better at not changing column names by default.


I guess you are running the R code only after creating the file, so you can probably get the number of columns beforehand and then do:

colClasses = c("character", rep("numeric",4) )

or

colClasses=c("factor", rep(NA, 4) )

Or you could use scan or readLines to read just one line, get the number of columns, and then use that to set rep(NA, columns).
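
Something along these lines might work for the dynamic case. It's only a sketch: the temporary file stands in for the real one, it assumes the file is tab-separated as in the question, and n_cols is just an illustrative name.

```r
# Recreate the sample from the question in a temporary file (illustrative).
path <- tempfile(fileext = ".txt")
writeLines(c(
  "5060803636482931868\t83.3366666\t0.0\t0.0\t0.0",
  "15695800775901642752\t0.0\t81.0061726043\t38.1837661841\t0.0",
  "12047011437325700351\t0.0\t38.1837661841\t22.2036033112\t7.07106781187",
  "2610937148294873212\t0.0\t0.0\t7.07106781187\t30.1330383466"
), path)

# Read only the first line and count its tab-separated fields.
first_line <- readLines(path, n = 1)
n_cols <- length(strsplit(first_line, "\t", fixed = TRUE)[[1]])

# First column as text (the keys), every other column as numeric.
tab <- read.table(path, header = FALSE, sep = "\t", strip.white = TRUE,
                  colClasses = c("character", rep("numeric", n_cols - 1)))

nameKey <- tab[[1]]
distMatrix <- as.matrix(tab[, -1])
rownames(distMatrix) <- nameKey
```

This way the colClasses vector grows with the file, and only one line is read before the real read.table call.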
