Here's a one-liner that should work:
$ bedops --complement <( sort-bed A.bed ) <( awk -v OFS="\t" '{ print $1, "0", $2 }' my.genome | sort-bed - ) > answer.bed
This part is called a process substitution in the bash shell:
... <( awk -v OFS="\t" '{ print $1, "0", $2 }' my.genome | sort-bed - ) ...
It uses awk to turn the file my.genome into a sorted BED file, on which you can do set operations with bedops. Basically, everything within <( ... ) produces sorted intervals that are fed to the bedops process as a stream, which bedops reads as if it were a regular file.
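If you want to see what that awk step produces without installing anything, here is a small sketch. The chromosome names and lengths are made up, the temporary directory is just for illustration, and plain sort stands in for sort-bed, so treat it as a toy and not as a replacement for the real command:

```shell
# Toy chromosome-sizes file with made-up lengths: name <TAB> length
workdir=$(mktemp -d)
printf 'chr2\t500\nchr1\t1000\n' > "$workdir/my.genome"

# Same awk step as in the one-liner; plain sort stands in for sort-bed here
genome_bed=$(awk -v OFS="\t" '{ print $1, "0", $2 }' "$workdir/my.genome" \
  | LC_ALL=C sort -k1,1 -k2,2n)
printf '%s\n' "$genome_bed"

# <( ... ) expands to a file name, so cat reads the pipeline output like a file
via_procsub=$(cat <(awk -v OFS="\t" '{ print $1, "0", $2 }' "$workdir/my.genome" \
  | LC_ALL=C sort -k1,1 -k2,2n))
```

Both variables end up holding the same three-column, tab-delimited intervals, one per chromosome, each starting at 0.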
Here's what the one-liner looks like when broken down into separate commands:
$ sort-bed A.bed > A.sorted.bed
$ awk -v OFS="\t" '{ print $1, "0", $2 }' my.genome | sort-bed - > my.genome.sorted.bed
$ bedops --complement A.sorted.bed my.genome.sorted.bed > answer.bed
$ rm A.sorted.bed my.genome.sorted.bed
Process substitutions might look a little odd at first, but they avoid creating intermediate files, which slow down whole-genome scale work, take up disk space, and need cleaning up afterwards.
I changed it to tab-delimited and it still does not work.
If your files were tab-delimited, it would work. You probably substituted the wrong delimiter in your sed command. Probably it was a double space or something similar, and after your command you now have a hybrid tab-and-space delimiter.
Could you please show the command lines you used to generate the BED files? I am still having problems.
In this case I simply did it manually by typing it in a text editor. What organism are you working on? There are genome.sizes files available for download for most species.
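If no sizes file is available for your organism, one option is to compute it from a FASTA file. This is only a sketch: the toy FASTA and the file names are hypothetical, and it assumes Unix line endings and takes the first word of each header as the chromosome name:

```shell
# Toy FASTA with made-up sequences
fasta_dir=$(mktemp -d)
printf '>chrA some description\nACGT\nACGT\n>chrB\nGGG\n' > "$fasta_dir/genome.fa"

# Sum sequence line lengths per record; print name <TAB> total length
awk '
  /^>/ { if (name != "") print name "\t" len
         name = substr($1, 2); len = 0; next }
       { len += length($0) }
  END  { if (name != "") print name "\t" len }
' "$fasta_dir/genome.fa" > "$fasta_dir/my.genome"

cat "$fasta_dir/my.genome"
```

The result is tab-delimited, so it can go straight into the awk step of the one-liner.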
Try replacing all [[:space:]]+ with \t. That should work.
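A minimal sketch of that replacement, using awk instead of sed because not every sed turns \t in the replacement into a real tab (GNU sed does, BSD sed does not). The input file is a made-up example, and this assumes none of your fields contain spaces themselves:

```shell
# Made-up BED file with a mix of spaces and tabs between columns
bed_dir=$(mktemp -d)
printf 'chr1  100   200\nchr1\t300 \t400\n' > "$bed_dir/A.messy.bed"

# Assigning $1 to itself makes awk rebuild each line with OFS (a single tab)
awk -v OFS="\t" '{ $1 = $1; print }' "$bed_dir/A.messy.bed" > "$bed_dir/A.tab.bed"

# Sanity check: count lines that do not split into exactly 3 tab-separated fields
bad_lines=$(awk -F'\t' 'NF != 3 { bad++ } END { print bad + 0 }' "$bed_dir/A.tab.bed")
echo "lines with wrong field count: $bad_lines"
```

If the count at the end is not zero, the file still has a delimiter problem (or more than three columns, in which case adjust the number in the check).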