bedtools intersect with "bedgraph" style with multiple extra columns of data for multiple samples?
6.3 years ago

Hi,

I can successfully perform bedtools intersect for two 3-column BED files, but I want to perform the same intersect using a "file A" that contains multiple additional columns with values for a large number of samples formatted like so:

File A:

(chr) (start) (stop) (sample 1, 2, 3, 4, etc.)

chr1   1001   1002    10    25   14   25
chr1   1006   1007    12    22   11   50
chr1   1012   1013    44    11   12   30

File B:

chr1   500    1010
chr1   2000   3000

Output:

chr1   1001   1002    10    25   14   25
chr1   1006   1007    12    22   11   50

Does anyone know of a way to use bedtools intersect, bedtools intersect combined with another program, or any other code at all, to perform this kind of intersect and retain the data in the additional columns in the output?

Thanks!

Dan

bedtools intersect awk • 4.0k views

Have a look at bedtools unionbedg.


Actually, File A is the output of unionbedg. So it is a bedgraph with many columns, and I want to intersect it with a BED file of intervals. (Also, File A has data for each single nucleotide, whereas the BED file has intervals.)

I added an example to the original post.

6.3 years ago

Not sure if/how your files are sorted, but this should take care of that:

$ bedops -e 1 <(sort-bed fileA.bed) <(sort-bed fileB.bed) > answer.bed

This does not work for me for some reason. It says "Non-numeric end coordinate. See line 1 in fileB.bed". This is not even the problematic file and it worked in bedtools intersect.


If you run cat -te on a few lines of your files, what does it say? For instance what comes out of: head -5 foo.bed | cat -te? Something's up with your files, which needs fixing.


chr1^I2572970^I2579715^M$


See that ^M at the end? That's a Windows carriage return character. You need to remove that. You can use tr for this:

$ tr -d '\r' < foo.bed > foo.fixed.bed

Repeat for all afflicted files, then run your commands on the fixed files.
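If several files are affected, the same tr call can be wrapped in a loop. This is only a sketch: it assumes the files end in .bed and sit in the current directory, and the demo file and the .fixed.bed suffix are illustrative names, not anything from your data.

```shell
# Create a small demo file with a Windows (CRLF) line ending.
printf 'chr1\t2572970\t2579715\r\n' > foo.bed

# Strip carriage returns from every .bed file that contains one,
# writing the result next to the original as NAME.fixed.bed.
for f in *.bed; do
    case "$f" in *.fixed.bed) continue ;; esac   # skip earlier outputs
    if grep -q "$(printf '\r')" "$f"; then
        tr -d '\r' < "$f" > "${f%.bed}.fixed.bed"
    fi
done

# The line should now end in "$" only, with no "^M" before it.
cat -te foo.fixed.bed
```

Files without a carriage return are left alone, so it is safe to rerun the loop.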


OK so this seems to be a Windows line ending, so I ran dos2unix. Then I ran bedops as before, and now (unlike bedtools intersect) it ran, but it does not retain the extra columns.


What is the output of this:

$ head -5 fileA.fixed.bed | cat -te

What is the output of this:

$ head -5 fileB.fixed.bed | cat -te

Please post everything you see. The bedops -e command just reports back any elements as they are found, and does not modify them. So either your files are not structured as described, or there is some other problem. If we can see your actual inputs, we can probably figure out what's up.
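To illustrate the "reports elements back unmodified" point on the example from the original post, here is a minimal awk sketch of the same kind of filter: print each line of file A, exactly as it is, if it overlaps any interval in file B. This is not how bedops is implemented; it is a plain nested comparison that assumes tab-delimited input and half-open BED coordinates, and the file names are just the ones used in this thread.

```shell
# Recreate the example from the original post as tab-delimited files.
printf 'chr1\t1001\t1002\t10\t25\t14\t25\n'  > fileA.bed
printf 'chr1\t1006\t1007\t12\t22\t11\t50\n' >> fileA.bed
printf 'chr1\t1012\t1013\t44\t11\t12\t30\n' >> fileA.bed
printf 'chr1\t500\t1010\nchr1\t2000\t3000\n' > fileB.bed

# First file (NR == FNR): store every interval of file B.
# Second file: print each line of file A, unchanged, if it overlaps
# at least one stored B interval on the same chromosome.
awk -F'\t' '
    NR == FNR { n++; bchr[n] = $1; bstart[n] = $2 + 0; bend[n] = $3 + 0; next }
    {
        for (i = 1; i <= n; i++) {
            if ($1 == bchr[i] && ($2 + 0) < bend[i] && ($3 + 0) > bstart[i]) {
                print        # whole line, extra columns included
                break        # report each A line at most once
            }
        }
    }
' fileB.bed fileA.bed > answer.bed

cat answer.bed
```

On the example data this keeps the 1001 and 1006 rows with all seven columns. Every A line is compared against every B interval, so it is only practical for small B files; for real data the sorted-input tools are the better choice.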


Thanks to your suggestions I found the problem! File A was space-delimited, not tab-delimited. It now runs and retains all the columns. Thank you very much for your help.
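For anyone who hits the same thing, one way to convert a space-delimited file to tabs is with awk. This is a sketch rather than the exact command used here; the output name fileA.tab.bed is made up for the example, and it assumes no field contains a space of its own.

```shell
# Demo input: columns separated by runs of spaces, as in the original post.
printf 'chr1   1001   1002    10    25   14   25\n' > fileA.bed

# Assigning $1 to itself makes awk rebuild the line with OFS (a tab)
# between fields; default field splitting treats any run of spaces
# or tabs as a single separator.
awk -v OFS='\t' '{ $1 = $1; print }' fileA.bed > fileA.tab.bed

# Tabs show up as "^I" in this view.
cat -te fileA.tab.bed
```

Checking the result with cat -te before rerunning the intersect confirms that every column boundary is now a single tab.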


Awesome! Glad to help.
