Running two bedtools commands in series produces error "has non positional records"
6.4 years ago
Lina F

Hello all,

I have three samples. For each one I generated Illumina paired-end (PE) reads and mapped them against the reference, which gave me three alignments. I called SNPs, and now I would like to find the subset of SNPs that are present in samples A and B but not in C. I ran bedtools as follows:

bedtools intersect -a A.vcf -b B.vcf > A_and_B.vcf
bedtools subtract -a A_and_B.vcf -b C.vcf

Unfortunately, I get the following error message:

ERROR: file A_and_B.vcf has non positional records, which are only valid for the groupBy tool.

I checked that all my files are tab-delimited, and I also ran sort -k1 on A_and_B.vcf, but that didn't change the result.

My questions are:

  1. How do I string these two bedtools commands together?
  2. Is there a better way to do this?

Thanks for any suggestions!


You can try something like this:

 bedtools intersect -a A.vcf -b B.vcf | bedtools subtract -a stdin -b C.vcf

Thank you for the suggestion! Unfortunately, this didn't work for me either :-(

6.4 years ago

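bedtools intersect has a -header option that includes the header of the A file in the output. Without it, the intermediate file has no VCF header, so the second bedtools command cannot tell what format it is in.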

-header Print the header from the A file prior to results.
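Applied to the commands from the question, a minimal sketch could look like the following. It assumes bedtools is on the PATH and that your version's `bedtools subtract` also accepts `-header` (check `bedtools subtract -h`); the function name `a_and_b_not_c` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the variants of $1 that overlap $2 but not $3.
# Assumes bedtools is on the PATH and that subtract supports -header.
a_and_b_not_c() {
    # -header copies the VCF header of the -a file to the output, so the
    # second command can still recognise its stdin as VCF.
    bedtools intersect -header -a "$1" -b "$2" \
        | bedtools subtract -header -a stdin -b "$3"
}
```

For example, `a_and_b_not_c A.vcf B.vcf C.vcf > A_and_B_not_C.vcf`. Piping avoids the intermediate file; the two-step version works the same way if `-header` is added to both commands. If B can contain several records at the same position, adding `-u` to the intersect call reports each A record only once.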

fin swimmer


Thanks for the tip, this worked and lets me avoid copying the header manually!


You should move that to an answer, because that is exactly the point.

6.4 years ago
Lina F

I found a potential workaround. First, I upgraded bedtools from v2.26.0 to v2.27.1, which changed the error message to one that was easier to interpret:

Error: unable to open file or unable to determine types for file A_and_B.vcf

- Please ensure that your file is TAB delimited (e.g., cat -t FILE).
- Also ensure that your file has integer chromosome coordinates in the
  expected columns (e.g., cols 2 and 3 for BED).

I took another look at the intermediate output file and noticed that, while it is tab-delimited, it is missing the header (which identifies the file as VCF). I manually copied the header of one of the input files into the intermediate file, and now the second command, bedtools subtract, works.

This seems like a way forward, with the caveat that the header I copied manually into the intermediate file is not entirely correct.
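The manual header copy can also be scripted, so the header always comes from the same input file. This is a minimal sketch, assuming the header of A.vcf is acceptable for the intersect output (the caveat about the header not being entirely correct still applies); the function name `with_vcf_header` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write the header lines (lines starting with '#')
# of the VCF named in $1, then pass standard input through unchanged.
with_vcf_header() {
    grep '^#' "$1"   # header taken from the given file, not from stdin
    cat              # then the records arriving on stdin
}
```

For example, `bedtools intersect -a A.vcf -b B.vcf | with_vcf_header A.vcf > A_and_B.vcf` should give an intermediate file that `bedtools subtract` can read.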

If there is a better way to do this I'd love to hear about it!
