Question

Separating a dataset by specific characters

4.2 years ago

Meghan.T • 0

I downloaded a .bed file and converted it with UCSC LiftOver (https://genome.ucsc.edu/cgi-bin/hgLiftOver). Now I want to load the converted file, "datahg38.bed", in R using this code:

    library(rtracklayer)
    dataset <- import.bed("datahg38.bed")

and I get this error:

error : $ operator is invalid for atomic vectors

Any ideas how to fix this?

I found out that the output is a list of 300,000 × 1, when it should be a table of 300,000 rows × 3 variables.

Each line looks something like this:

 chr1:10142-10351
 chr1:10453-10563
 chr1:13044-13104
 ...

So basically I need to read it as a table with read.table and then convert it to a 300,000 × 3 matrix, using the characters ":" and "-" to split each line. I'm looking for output like this:

 chr1  10142  10351
 chr1  10453  10563
 chr1  13044  13104
 ...

Can anyone suggest an efficient way to split this data?
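A minimal awk sketch of the split described above, assuming every line has the form chr:start-end with exactly one ":" and one "-" (true for the sample lines; the file names sample_positions.txt and sample_bed3.tsv are hypothetical). It treats both characters as field separators and writes three tab-separated columns:

```shell
#!/bin/sh
# Sketch: split UCSC-style "chr:start-end" lines into three tab-separated columns.
# Assumes one ':' and one '-' per line; file names are hypothetical.

# Recreate the three sample lines from the question.
printf '%s\n' 'chr1:10142-10351' 'chr1:10453-10563' 'chr1:13044-13104' > sample_positions.txt

# -F'[:-]' makes both ':' and '-' field separators ('-' is literal at the end of the bracket);
# -v OFS='\t' puts a tab between the printed columns.
awk -F'[:-]' -v OFS='\t' '{ print $1, $2, $3 }' sample_positions.txt > sample_bed3.tsv

# Show the result: chromosome, start, end.
cat sample_bed3.tsv
```

Once the file is tab-separated, the read.table approach from the question should work, e.g. read.table("sample_bed3.tsv", sep = "\t", col.names = c("chr", "start", "end")).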

R • 757 views

updated 4.2 years ago by JC 13k • written 4.2 years ago by Meghan.T

Why not convert the file to a proper tab-separated format on the Linux command line first? Something like:

tr ':-' '\t\t' < datahg38.bed > datahg38_format.bed
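To try the tr conversion on the three sample lines before running it on the full file, here is a minimal sketch; it uses one tr call that maps both ":" and "-" to tabs, and the file names sample_positions.txt and sample_tr.bed are hypothetical:

```shell
#!/bin/sh
# Sketch: apply a tr-based conversion to the sample lines from the question.
# File names are hypothetical.

printf '%s\n' 'chr1:10142-10351' 'chr1:10453-10563' 'chr1:13044-13104' > sample_positions.txt

# Map ':' to a tab and '-' to a tab in one pass
# ('-' is taken literally because it is the last character of the set).
tr ':-' '\t\t' < sample_positions.txt > sample_tr.bed

# Expect three tab-separated columns per line.
cat sample_tr.bed
```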

reply 4.2 years ago by Aspire ▴ 370

score 0 · Answer 1 · 2020-09-23

4.2 years ago

JC 13k

Basically, your BED file is not a BED file (each line is a "chr:start-end" position rather than tab-separated columns), so you need to convert it:

perl -pe 's/:/\t/; s/-/\t/' < datahg38.bed > datahg38_realBED.bed