R read.table() saying line does not have correct number of elements - when I check, it definitely does
6.1 years ago

I'm trying to read a CSV file into R, but I get the error "line 195 did not have 31 elements". However, when I check, both in Python and by pasting the line into R as a string, that line (and each of the surrounding lines) definitely has 31 elements.

Is anyone able to hazard a guess as to why R is flagging an incorrect number of elements?

Below is the line that causes the failure. It is supposed to have 31 comma-delimited elements:

135411,single nucleotide variant,NM_001129727.2(PLEKHG4):c.1574A>G (p.Asp525Gly),25894,PLEKHG4,HGNC:24501,Likely benign,0,-,8044843,-,RCV000117986,MedGen:CN169374,not specified,germline,germline,GRCh37,NC_000016.9,16,67318242,67318242,A,G,16q22.1,no assertion criteria provided,1,,N,UniProtKB (protein):Q58EX7#VAR_050510,2,129965

The code I am using is:

t2 <- read.table("/Users/NAME/Desktop/variant_summary.csv", sep = ",")

Download link for first 200 lines of the csv file: https://drive.google.com/file/d/1_iUeGkyQwPvm3K2B5kfakhneuvpXpPYg/view?usp=sharing

r software error • 9.5k views

Can you provide the table up to that line for download?


Yes, thanks for replying. I've amended my original post with a link to the download for the first 200 lines.

6.1 years ago
h.mon 35k

Use read.csv():

t3 <- read.csv( "first200.csv" )

Wow, that worked perfectly on the first try, thanks very much. Any idea why read.table() wasn't working? Normally specifying sep = "," works fine.


It seems that the hash (#) in column 29 was causing read.table() to stop reading the line. Could that be the case?
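One way to test this guess is to count the fields per line with and without comment handling. The sketch below uses two made-up four-field lines loosely modelled on the posted row (not the real file), written to a temporary file; `count.fields()` shares the `comment.char = "#"` default with `read.table()`.

```r
# Two comma-separated lines with 4 fields each; the second has a '#'
# inside its third field (made-up data, loosely modelled on the post).
csv_lines <- c("135410,SNV,UniProtKB:Q58EX7,2",
               "135411,SNV,UniProtKB:Q58EX7#VAR_050510,2")
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

# Default comment.char = "#": everything after the '#' is discarded
with_comment <- count.fields(tmp, sep = ",")

# Comment handling switched off: the '#' is treated as ordinary text
without_comment <- count.fields(tmp, sep = ",", comment.char = "")

print(with_comment)
print(without_comment)
```

If the second line reports fewer fields only in the first call, the `#` is the likely culprit.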


Yes, this is exactly the case: the default for read.table() is comment.char = "#", whereas read.csv() defaults to comment.char = "".
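If you would rather keep using read.table(), a minimal sketch of the fix is to turn comment handling off explicitly. The small file below is made up for illustration (the column names `id`, `type`, `other_ids` and `n` are hypothetical, not from the real variant_summary.csv); `quote = "\""` is set to mirror the read.csv() default.

```r
# Made-up CSV with a header and a '#' inside a field on the last line
csv_lines <- c("id,type,other_ids,n",
               "135410,SNV,UniProtKB:Q58EX7,2",
               "135411,SNV,UniProtKB:Q58EX7#VAR_050510,2")
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

# With the default comment.char = "#", the last line is cut short and
# read.table() stops with a "did not have 4 elements" error
res <- try(read.table(tmp, sep = ",", header = TRUE), silent = TRUE)
print(inherits(res, "try-error"))

# Disabling comment handling reads every field, '#' included
fixed <- read.table(tmp, sep = ",", header = TRUE,
                    comment.char = "", quote = "\"",
                    stringsAsFactors = FALSE)
print(fixed$other_ids)
```

For the file in the question, the equivalent call would presumably add `comment.char = ""` (and `header = TRUE`) to the original read.table() line.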

6.1 years ago
ATpoint 85k

Another option is fread() from the data.table package, which is especially helpful when the file is large (hundreds of MB or even GB):

library(data.table)
fread("your.file", sep = ",", data.table = FALSE)
6.1 years ago
Chirag Parsania ★ 2.0k

The tidyverse way:

library(tidyverse)
dd <- read_delim("~/Downloads/first200.csv", delim = ",")

> dd
# A tibble: 199 x 31
   `#AlleleID` Type  Name  GeneID GeneSymbol HGNC_ID ClinicalSignifi~ ClinSigSimple LastEvaluated `RS# (dbSNP)` `nsv/esv (dbVar~ RCVaccession PhenotypeIDS
         <int> <chr> <chr>  <int> <chr>      <chr>   <chr>                    <int> <chr>                 <int> <chr>            <chr>        <chr>       
 1       15041 indel NM_0~   9907 AP5Z1      HGNC:2~ Pathogenic                   1 29-Jun-10         397704705 -                RCV000000012 MedGen:C315~
 2       15041 indel NM_0~   9907 AP5Z1      HGNC:2~ Pathogenic                   1 29-Jun-10         397704705 -                RCV000000012 MedGen:C315~
 3       15042 dele~ NM_0~   9907 AP5Z1      HGNC:2~ Pathogenic                   1 29-Jun-10         397704709 -                RCV000000013 MedGen:C315~
 4       15042 dele~ NM_0~   9907 AP5Z1      HGNC:2~ Pathogenic                   1 29-Jun-10         397704709 -                RCV000000013 MedGen:C315~
 5       15043 sing~ NM_0~   9640 ZNF592     HGNC:2~ Uncertain signi~             0 29-Jun-15         150829393 -                RCV000000014 MedGen:CN03~
 6       15043 sing~ NM_0~   9640 ZNF592     HGNC:2~ Uncertain signi~             0 29-Jun-15         150829393 -                RCV000000014 MedGen:CN03~
 7       15044 sing~ NM_0~  55572 FOXRED1    HGNC:2~ Pathogenic                   1 7-Dec-17          267606829 -                RCV00000001~ MedGen:C183~
 8       15044 sing~ NM_0~  55572 FOXRED1    HGNC:2~ Pathogenic                   1 7-Dec-17          267606829 -                RCV00000001~ MedGen:C183~
 9       15045 sing~ NM_0~  55572 FOXRED1    HGNC:2~ Pathogenic                   1 1-Oct-10          267606830 -                RCV000000016 MedGen:C183~
10       15045 sing~ NM_0~  55572 FOXRED1    HGNC:2~ Pathogenic                   1 1-Oct-10          267606830 -                RCV000000016 MedGen:C183~
# ... with 189 more rows, and 18 more variables: PhenotypeList <chr>, Origin <chr>, OriginSimple <chr>, Assembly <chr>, ChromosomeAccession <chr>,
#   Chromosome <chr>, Start <int>, Stop <int>, ReferenceAllele <chr>, AlternateAllele <chr>, Cytogenetic <chr>, ReviewStatus <chr>, NumberSubmitters <int>,
#   Guidelines <chr>, TestedInGTR <chr>, OtherIDs <chr>, SubmitterCategories <int>, VariationID <int>