Entering edit mode
7.6 years ago
abascalfederico
★
1.2k
I've found something unexpected, may be I just didn't understand how coordinates work.
I am intersecting a bed file with coordinates:
1 100 101
Against a vcf file, which has two sites in it:
1 100 etc etc
1 101 etc etc
The bed coordinates define a single site (100 in zero-based or 101 in one-based) However, intersectBed says that this bed line intersects with both vcf lines. I thought it should only intersect with 1:101 (vcf is 1-based). Isn't that incorrect? What am I missing?
Thanks!
Hi, Could you tried please modify your bed file: 1 100 100 and then use syntax - bedtools intersect -a in1 -b in2 -wao
Useful post: Cheat Sheet For One-Based Vs Zero-Based Coordinate Systems
Thanks WouterDeCoster. However, that doesn't answer my question. According to what is explained in that post I should get only one intersection between my 0-based bed and my 1-based vcf.
I didn't state it would answer your question, just that it's useful ;-) But for the rest, I'm as puzzled as you by coordinates.
can you also post how you used
intersectBed
(command/script), that helps tracking down whats happening.Ok, I know what's happening... Those sites are indels and bedtools, smartly (not as me), is taking that into account! Thanks for your help.
I run it like this:
Version v2.22.0
Contents of "file.bed":
Contents of "file.vcf":
As a side note also for further readers, remember that indels in a VCF file are stored including the last nucleotide before the indel. The position you see (1-based) is the position of that nucleotide. Sometimes this can be a tricky logical passage to make when this nucleotide lies exactly at the border of a bed region!