Hi, I want to use bedops to analyze some .bed files from ChIP-seq, but for one of the files I can't get through the analysis; I can't even run sort-bed on it. It keeps saying the BED row length exceeds capacity at line 1. Deleting the first row of my .bed file did not help.
unknownb8f6b1106ced:chip-seq fuxiaoyong$ sort-bed era_inpegf_mcf7_n3_hg18_f20_nr.bed
BED row length exceeds capacity at line 1 in era_inpegf_mcf7_n3_hg18_f20_nr.bed.
Check that you have unix newlines (cat -A) or increase TOKENS_MAX_LENGTH in BEDOPS.Constants.hpp and recompile BEDOPS.
I am very new to using tools to analyze NGS data, and even to Mac OS X. So please help me; a detailed explanation would be greatly appreciated!!
Hi, could you maybe give us the output of the following commands:
wc -l era_inpegf_mcf7_n3_hg18_f20_nr.bed
head -n 1 era_inpegf_mcf7_n3_hg18_f20_nr.bed | cat -A
Thank you!
Hi Ram RS, thanks for your reply! Please see the following results from the commands you asked me to run. I am looking forward to solving this mystery!
It kinda looks like it's either an empty file or the new line characters are acting weird. Please check out these two commands for their output:
Now I kinda know the reason, but I don't know how to fix it. I opened the .bed file in Excel earlier, deleted several blank rows I found, and re-saved it. I noticed that after doing this, the file kind changed from "Unix Executable File" to "simple text format". When I run sort-bed or other bedops commands on this file, the problem I posted appears.
So, I tried the command:
cat xxx.txt > xxx.bed
and it still does not seem to work. I have two questions: (1) How can I edit a .bed file if something is wrong in it (e.g. several blank rows) without being forced to open it in Excel? (2) How can I convert the file back to a .bed Unix executable file?
Here, I also post the output I got when I ran sort-bed on my original .bed file, before any editing in Excel. The message about potential blank lines is what made me open it in Excel, where I did find several blank rows.
Thanks for all your help!
Hi, the usual advice first: While Excel is really tempting, it is a bad tool for bioinformatics. Most files in bioinformatics are plain text, meaning any plain text editor can read them. If they're a manageable size, I'd suggest TextWrangler or BBEdit for Mac, gedit or kedit for Linux and Notepad++ for PC. If you're comfortable with the command line, emacs, vim or nano can be used.
A BED file is a tab-delimited file. This means fields in each line of a BED file are separated by tab characters. Excel's usual behavior is to import a tab delimited file such that each field is in its own cell. While this might help peeking into the content, modifying via Excel is best avoided.
In summary, use TextWrangler. It's lightweight, does not botch stuff up and is way more friendly with plain text files than Excel. If you wanna do statistical analysis from these files, I'd suggest using Python (IPython notebook) or R.
Pro-tip: To remove all blank lines from a file (and write to a new file), run this:
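A minimal sketch of one common way to do this with grep, assuming a POSIX shell. The file names are placeholders, and the first command only builds a small sample file to demonstrate on:

```shell
# Build a tiny sample BED file containing one empty line and one
# whitespace-only line (placeholder data, not from this thread).
printf 'chr1\t10\t20\n\nchr2\t5\t15\n  \n' > sample_with_blanks.bed

# Keep every line that is NOT empty or whitespace-only, and write the
# result to a new file so the original stays untouched.
grep -v '^[[:space:]]*$' sample_with_blanks.bed > sample_no_blanks.bed
```

Writing to a new file rather than editing in place means you can always go back to the original if something looks off.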
Also, let's say you wish to sort a bed file so the first column is in ascending order. You can use UNIX's builtins to do this. Multiple options here: How To Sort Bed Format File
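As a sketch of the plain sort approach (my assumption about the kind of option the linked thread covers, with placeholder file names): order by chromosome name in column 1, then numerically by start coordinate in column 2.

```shell
# Build a small unsorted sample file (placeholder data).
printf 'chr2\t5\t15\nchr1\t100\t200\nchr1\t10\t20\n' > sample_unsorted.bed

# -k1,1  : sort on column 1 (chromosome) as text
# -k2,2n : break ties on column 2 (start coordinate) numerically
# LC_ALL=C makes the text comparison plain byte order, independent of locale.
LC_ALL=C sort -k1,1 -k2,2n sample_unsorted.bed > sample_sorted.bed
```

If the next step is a bedops command, running sort-bed on the file is still the safer choice, since that is the tool BEDOPS ships for this purpose.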
Hope this helps!
Thank you so much, Ram RS!! I solved my problem! I did exactly what you said: Download TextWrangler > Edit the .bed file in TextWrangler (i.e. delete the blank rows) > sort-bed the .bed file (successfully!) > Run bedops commands. I am now playing with my data (answering my biological questions!).
This is a great bioinformatics forum/community! I love it!! And I will come here often...
Thanks and have a great weekend!!!
Hello @xiaoyonf, I'd really appreciate if you could maybe mark my answer below as an answer to this question. Thank you!
Perhaps this is a line ending problem. Post what Ram RS asked for and I expect you'll see that the first line is actually the whole file (this is an easy fix).
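If that is the case, a sketch of the check and the fix might look like this. It assumes the re-saved file uses old Mac-style carriage returns (`\r`) instead of Unix newlines (`\n`), which is a guess until the requested output is posted; file names are placeholders. The stock macOS cat has no -A option, so cat -e is used here instead.

```shell
# Simulate a file saved with carriage-return-only line endings (placeholder data).
printf 'chr1\t10\t20\rchr2\t5\t15\r' > sample_cr.bed

# With no "\n" anywhere in the file, wc counts 0 lines: tools see one giant row.
wc -l < sample_cr.bed

# Show control characters; each "\r" is displayed as ^M.
head -n 1 sample_cr.bed | cat -e

# Fix: translate every carriage return into a Unix newline.
tr '\r' '\n' < sample_cr.bed > sample_unix.bed

# For Windows-style "\r\n" endings, delete the carriage returns instead:
# tr -d '\r' < input.bed > output.bed
```

This would also explain why `cat xxx.txt > xxx.bed` changed nothing: cat copies the bytes exactly as they are, so the line endings stay the same whatever the file is named.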