Modify a BED file with multi-region rows
5.7 years ago
Emilio Marmol ▴ 180

I have a somewhat tricky BED-like file that I need to convert to standard BED format so that I can use it properly in further steps.

I have this unconventional BED format:

1   12349   12398   +
1   23523   23578   -
1   23550;23570;23590   23640;23689;23652   +
1   43533   43569   +
1   56021;56078   56099;56155   +

Say that the rows with multiple positions represent fragmented non-coding regions.

What I would like to get is a canonical BED file such as:

1   12349   12398   +
1   23523   23578   -
1   23550   23640   +
1   23570   23689   +
1   23590   23652   +
1   43533   43569   +
1   56021   56099   +
1   56078   56155   +

where the regions that were combined in one row are split into separate rows, each keeping the chromosome number and strand of the original row.

I have been struggling with a proper way to do this for a while...

Could anyone help?

Thanks

bash awk BED
5.7 years ago
awk '{N=split($2,a,/[;]/);split($3,b,/[;]/);for(i=1;i<=N;i++) printf("%s\t%s\t%s\t%s\n",$1,a[i],b[i],$4);}' input.bed

1   12349   12398   +
1   23523   23578   -
1   23550   23640   +
1   23570   23689   +
1   23590   23652   +
1   43533   43569   +
1   56021   56099   +
1   56078   56155   +
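
The one-liner pairs the i-th start with the i-th end, so it relies on both columns holding the same number of ;-separated values. Below is a minimal sketch of a more defensive variant, assuming that a row with mismatched counts is an input error that should be reported and skipped. The function name split_regions is made up for illustration.

```shell
# Hypothetical helper: expand ';'-separated starts/ends into one row per region.
split_regions() {
  awk '{
    n = split($2, starts, ";")   # number of start coordinates in this row
    m = split($3, ends, ";")     # number of end coordinates in this row
    if (n != m) {
      # Assumption: mismatched counts are an input error; report and skip the row.
      printf("line %d: %d starts but %d ends, skipped\n", NR, n, m) > "/dev/stderr"
      next
    }
    # Emit one tab-separated row per (start, end) pair, keeping chromosome and strand.
    for (i = 1; i <= n; i++)
      printf("%s\t%s\t%s\t%s\n", $1, starts[i], ends[i], $4)
  }' "$@"
}
```

It can be called as split_regions input.bed, or fed from a pipe when no file name is given.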
5.7 years ago
ATpoint 85k

Good luck. In the last line, test.pseudobed is the input file. I called it pseudobed because your input is not in BED format: in BED, column 4 is a name and the strand is in column 6. The code snippet takes care of that, producing a standard six-column BED with a . as placeholder in columns 4 and 5. If you do not want that, simply remove the ".", "." part from the awk commands.

# Read each whitespace-separated record into named fields.
while read -r CHR START END STR; do
  if [[ $START == *";"* ]]; then
    # Multi-region row: pair the i-th start with the i-th end, one pair per line.
    paste \
    <(echo "$START" | tr ";" "\n") \
    <(echo "$END" | tr ";" "\n") | \
    awk -v chr="$CHR" -v str="$STR" 'BEGIN {OFS="\t"} {print chr, $1, $2, ".", ".", str}'
  else
    # Single-region row: only expand it to six columns.
    echo "$CHR $START $END $STR" | awk 'BEGIN {OFS="\t"} {print $1, $2, $3, ".", ".", $4}'
  fi
done < test.pseudobed

1   12349   12398   .   .   +
1   23523   23578   .   .   -
1   23550   23640   .   .   +
1   23570   23689   .   .   +
1   23590   23652   .   .   +
1   43533   43569   .   .   +
1   56021   56099   .   .   +
1   56078   56155   .   .   +
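
For larger files, the same six-column output can probably be produced in a single awk pass instead of starting several processes per line. This is only a sketch: the function name to_bed6 is made up for illustration, and it assumes every row has matching numbers of starts and ends.

```shell
# Hypothetical helper: write BED6 (chrom, start, end, name, score, strand)
# from the 4-column pseudo-BED input, splitting ';'-separated regions.
to_bed6() {
  awk 'BEGIN { OFS = "\t" }
  {
    n = split($2, s, ";")   # start coordinates of this row
    split($3, e, ";")       # end coordinates, assumed to match the starts in number
    for (i = 1; i <= n; i++)
      print $1, s[i], e[i], ".", ".", $4   # name and score are "." placeholders
  }' "$@"
}
```

Usage would be to_bed6 test.pseudobed > out.bed. If a downstream tool insists on a numeric score, replacing the second "." with 0 is a common workaround.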