I am unsure why my files are not being accepted in bedtools jaccard, any tips?
7 months ago
Ronin ▴ 10

I want to run a simple bedtools jaccard command:

bedtools jaccard -a a.bed -b b.bed

Here is a head command of both bed files:

head a.bed 
chr start   end
CM020918.1  39021475    39021576
CM020915.1  25208073    25208174
CM020915.1  21129216    21129317
VHII01000032.1  201725  201826
CM020927.1  15130715    15130816
CM020922.1  27493797    27493898
CM020922.1  1904773 1904874
CM020912.1  38993651    38993752
CM020915.1  20895193    20895294


head b.bed
chr start   end
CM020917.1  34469864    34469965
CM020927.1  21249285    21249386
CM020914.1  6637448 6637549
CM020914.1  9005599 9005700
CM020926.1  1419014 1419115
CM020914.1  16369051    16369152
CM020914.1  16531627    16531728
CM020924.1  22596008    22596109
CM020924.1  19754163    19754264

Yet when I try to run the command, bedtools hisses this back at me:

bedtools jaccard -a a.bed -b b.bed
Error: unable to open file or unable to determine types for file a.bed

- Please ensure that your file is TAB delimited (e.g., cat -t FILE).
- Also ensure that your file has integer chromosome coordinates in the 
  expected columns (e.g., cols 2 and 3 for BED).

So, I feel a bit stuck. I don't know what cat -t FILE is supposed to mean. I am sure these are tab separated (I imported the files into Excel and exported them as tab-delimited text (.txt)). Any insights would be appreciated!

7 months ago
LChart 4.5k

"I am sure these are tab separated (I imported the files into Excel and exported them as tab-delimited text (.txt))."

The round trip through Excel is almost certainly your issue. Use the following command to fix each file:

cat a.bed | sed '1s/^\xEF\xBB\xBF//' | sed 's/\r//g' | awk '{print $1"\t"$2"\t"$3"\t.\t.\t."}' > a.fixed.bed

This command: (i) writes the file to standard output; (ii) removes any byte-order mark from the first line; (iii) removes carriage returns; (iv) converts runs of whitespace to true tabs and adds placeholders for the name, score, and strand fields; and (v) writes the result to "a.fixed.bed".
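For anyone who wants to try the clean-up on a throwaway file first, here is a minimal sketch of a variant of the same idea. It is not the exact command above: it uses tr and awk instead of GNU sed escapes so it should also run where sed does not understand \xEF or \r, and it writes only the three BED3 columns rather than adding placeholder fields. The file names demo.bed and demo.fixed.bed and the sample row are made up for illustration.

```shell
# Hypothetical input that mimics an Excel export: a UTF-8 byte-order mark,
# CRLF line endings, and a mix of spaces and tabs between the columns.
printf '\357\273\277chr start\tend\r\nCM020918.1  39021475\t39021576\r\n' > demo.bed

# The three BOM bytes (EF BB BF), kept in a shell variable.
bom=$(printf '\357\273\277')

# 1. tr deletes every carriage return.
# 2. awk strips the BOM if line 1 starts with it, then re-splits the line on
#    any whitespace and prints the first three fields joined by real tabs.
tr -d '\r' < demo.bed |
  awk -v bom="$bom" 'BEGIN { OFS = "\t" }
    NR == 1 && index($0, bom) == 1 { $0 = substr($0, length(bom) + 1) }
    { print $1, $2, $3 }' > demo.fixed.bed

# Inspect the result byte by byte: tabs appear as \t, and there should be
# no \r and no leading 357 273 277.
od -c demo.fixed.bed
```

To use this on the real data, swap demo.bed for a.bed (and then b.bed) and skip the printf line that creates the demo file.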

Thank you, this worked :)

7 months ago

chr start end is just not a valid header for a bed file

chr start end is however accepted as a header line by bedtools.

The problem is almost certainly a byte-order mark (BOM) emitted by Excel. Running file a.bed would report it, and it can be removed in vim by using :set nobomb and re-saving the file.
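If file is not available, the same checks can be done with standard tools. The sketch below is only an illustration: it builds a made-up file, bom_demo.bed, with a BOM and Windows line endings, then tests for the three things the bedtools error message hints at. This is also roughly what the cat -t suggestion is for: it prints tabs as ^I so you can see whether the separators are real tabs.

```shell
# Hypothetical demo file: UTF-8 BOM at the start, CRLF line endings, real tabs.
printf '\357\273\277chr\tstart\tend\r\nCM020918.1\t39021475\t39021576\r\n' > bom_demo.bed

# Check 1: the first three bytes in hex. A UTF-8 BOM is "ef bb bf".
first_bytes=$(head -c 3 bom_demo.bed | od -An -tx1 | tr -d ' \n')
if [ "$first_bytes" = "efbbbf" ]; then
    echo "BOM present"
else
    echo "no BOM"
fi

# Check 2: count carriage returns (one per line in a CRLF file).
cr_count=$(tr -cd '\r' < bom_demo.bed | wc -c | tr -d ' ')
echo "carriage returns: $cr_count"

# Check 3: count lines that do not split into exactly three fields on tabs.
bad_lines=$(awk -F'\t' 'NF != 3' bom_demo.bed | wc -l | tr -d ' ')
echo "lines without 3 tab-separated fields: $bad_lines"
```

A clean BED3 file should report no BOM, zero carriage returns, and zero lines without three tab-separated fields.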
