Reconstruct bed file out of columns
6.6 years ago
dzisis1986 ▴ 70

Hello, I have a BED file like this:

1   3004104 3004110
1   3005819 3005825
1   3008315 3008321 
1   3008893 3008899  
1   3009812 3009818 
1   3012422 3012428  
1   3015794 3015800  
1   3016183 3016189 
1   3024019 3024025 
1   3025279 3025285

and I would like to create another one from it in this format:

1   1   3004110
1   3004104 3005825
1   3005819 3008321
1   3008315 3008899
1   3008893 3009818
1   3009812 3012428
1   3012422 3015800
1   3015794 3016189
1   3016183 3024025
1   3024019 3025285

Do you know an easy way to do this in R or Python?

Thank you in advance

columns R bed script python • 1.4k views

What have you tried?


I tried R. I can read each column, but I can't write a for loop that, for each row, takes the 2nd column as the start and the 3rd column of the next row as the end position.


Is the BED file per chromosome?


Yes, it is a BED file with chr, start and end columns.


What happened to the last row? 1 3025279 3025285


I was wondering about the same thing, but the last row stays as it is: 1 3024019 3025285. Position 3025279 falls within that range, so no further line is needed.

6.6 years ago
zx8754 12k

In R, using the lag function from the dplyr package:

# example data
mybed <- read.table(text = "
                    1   3004104 3004110
                    1   3005819 3005825
                    1   3008315 3008321 
                    1   3008893 3008899  
                    1   3009812 3009818 
                    1   3012422 3012428  
                    1   3015794 3015800  
                    1   3016183 3016189 
                    1   3024019 3024025 
                    1   3025279 3025285")


library(dplyr)

mybed %>% 
  transmute(
    chr = V1,
    start = lag(V2, default = 1),
    end = V3)
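
For readers more comfortable with Python, the idea behind lag can be sketched in a few lines: shift the start column down by one row and fill the first slot with a default. This is only an illustration of the concept, not dplyr's implementation; the helper name `lag` and the three-row sample are chosen here for the example.

```python
def lag(values, default):
    """Shift values down by one position, filling the first slot with default."""
    if not values:
        return []
    return [default] + list(values[:-1])


# First three rows of the example data from the question
starts = [3004104, 3005819, 3008315]
ends = [3004110, 3005825, 3008321]

# Pair each end with the start of the previous row (1 for the first row)
rows = list(zip(lag(starts, 1), ends))
for start, end in rows:
    print(1, start, end, sep="\t")
```

Note that this sketch, like the R snippet above, treats the whole file as one chromosome; with several chromosomes the shift would need to be applied per chromosome.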
6.6 years ago

In Python:

prev_chrom = None   # chromosome of the previous row
prev_start = None   # start position of the previous row

with open("bed_file.bed") as bed_f, open("new_bed_file.bed", "w") as new_bed:
    for line in bed_f:
        fields = line.split()  # split on any whitespace (tabs or spaces)
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        chrom, start, end = fields[:3]
        # The first row of each chromosome starts at 1;
        # every other row starts at the previous row's start.
        new_start = prev_start if chrom == prev_chrom else "1"
        new_bed.write("\t".join([chrom, new_start, end]) + "\n")
        prev_chrom, prev_start = chrom, start

As noted in the comments, the original last interval (3025279-3025285) does not get a row of its own: its start position is dropped.
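
To check the per-chromosome logic without touching any files, here is a minimal in-memory sketch. The function name `shift_starts` is hypothetical, and the two chromosome 2 rows are made-up values added only to show the reset to 1 at a chromosome boundary; the chromosome 1 rows come from the question.

```python
def shift_starts(lines):
    """Yield (chrom, start, end) with each start taken from the previous row.

    The first row of every chromosome gets start "1", matching the
    expected output in the question.
    """
    prev_chrom, prev_start = None, None
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        chrom, start, end = fields[:3]
        new_start = prev_start if chrom == prev_chrom else "1"
        yield chrom, new_start, end
        prev_chrom, prev_start = chrom, start


# Chromosome 1 rows are from the question; chromosome 2 rows are invented.
example = """\
1   3004104 3004110
1   3005819 3005825
1   3008315 3008321
2   500 600
2   700 800
"""

result = list(shift_starts(example.splitlines()))
for row in result:
    print("\t".join(row))
```

The output has the same number of rows as the input, and the start of the final row of each chromosome (here 3008315 and 700) does not appear anywhere, which is the behaviour discussed in the comments.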
