7.6 years ago
rbronste
Hi
bedops --difference is giving me some errors in the final BED file, and I'm wondering why and how to work around it.
chr8 9 69893500 5.00E+07 95000416 65834
I'm getting some intervals that look like this, and I'm not sure where it's going off the rails when I try to subtract a genomic space from another interval file. Thanks!
Rob
Are your inputs sorted (i.e., with sort-bed)? Do you have files with Windows line endings (e.g., from Excel spreadsheets)? If you want to post your files somewhere, I'm happy to take a quick look.

I tried to sort immediately after running bedops --difference with a file of CTCF coordinates in mm10, but I get the following error message when I try to sort:
Error on line 67243 consensus_bedops_CTCFdiff.bed. Genomic end coordinate is less than (or equal to) start coordinate.
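For anyone hitting the same message, one way to locate the offending rows is a quick awk pass over the file. This is only a sketch: the demo file name and its contents are made up for illustration, and the check assumes a tab-delimited BED file with the start in column 2 and the end in column 3. It flags rows whose coordinates are not plain integers (for example, values rewritten in scientific notation) and rows whose end is not greater than the start.

```shell
# Hypothetical demo file; replace with your own BED file.
bed=demo_check.bed
printf 'chr8\t100\t200\nchr8\t69893500\t5.00E+07\nchr8\t500\t400\n' > "$bed"

# Report the line number and content of every row whose start or end is not
# a plain integer, or whose end is less than or equal to its start.
report=$(awk -F'\t' '
    { sub(/\r$/, "", $0) }   # ignore a trailing carriage return, if present
    $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {
        print NR ": non-integer coordinate: " $0
        next
    }
    ($3 + 0) <= ($2 + 0) {
        print NR ": end <= start: " $0
    }
' "$bed")

printf '%s\n' "$report"
```

Rows reported here would need to be corrected or removed before sorting.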
You should sort your input files (regions of interest and CTCF regions) with sort-bed before you do any set operations with bedops. You only need to sort once, but you need to make sure the inputs are sorted before you work on them.

Bedops sorting has been weird for me in general; I often get this sort of error message:
BED row length exceeds capacity at line 1 in consensusPeaks_DiffBind_mm10.bed. Check that you have unix newlines (cat -A) or increase TOKENS_MAX_LENGTH in BEDOPS.Constants.hpp and recompile BEDOPS.
Are you working with Windows or Microsoft Office-sourced files? This can cause problems with Unix-based bioinformatics tools of all kinds, including BEDOPS. Here's one way to strip Windows newlines and sort the BED file in one pass:
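A minimal sketch of that approach, not necessarily the exact command meant above: tr deletes the carriage returns and the result is piped into sort-bed, which reads standard input when given "-". The file names and the two-line demo input are hypothetical. The fallback to plain sort is only an approximation so that the example runs where BEDOPS is not installed.

```shell
# Hypothetical file names; substitute your own.
input=peaks_crlf.bed
output=peaks_sorted.bed

# Small demo input with Windows (CRLF) line endings.
printf 'chr8\t200\t300\r\nchr1\t50\t100\r\n' > "$input"

# Count the lines that contain a carriage return before cleaning.
grep -c "$(printf '\r')" "$input"

if command -v sort-bed >/dev/null 2>&1; then
    # Strip carriage returns and sort in one pass.
    tr -d '\r' < "$input" | sort-bed - > "$output"
else
    # Approximate fallback without BEDOPS: chromosome by byte order,
    # then numeric start and end.
    tr -d '\r' < "$input" | LC_ALL=C sort -k1,1 -k2,2n -k3,3n > "$output"
fi

cat "$output"
```

The cleaned, sorted file can then be used as input to bedops set operations.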
Very infrequently I open something in Excel, but I primarily work through the command line on a computing cluster. Though yes, I may have stupidly polluted some of these files. Thanks for the tip!