BEDPE format explanation.
2.6 years ago

I'm working with BEDPE files from 10x Genomics reads, and I can't wrap my head around the meaning of the BEDPE coordinate columns, specifically start1/end1 and start2/end2. Most of the structural variants that get called have start and end coordinates that differ by only 1 bp, which makes sense intuitively. But some have a start2/end2 pair separated by several dozen or even a hundred bp, which I don't understand. For an inversion, for example, the "feature" coordinates in these columns would be the breakpoint locations, but shouldn't there be only 2 breakpoints relative to the reference genome? Why are there 4 values for the breakpoint locations?
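
To make the column layout concrete, here is a minimal Python sketch that reads the first six (plus optional name) BEDPE columns and reports the width of each interval. It assumes the bedtools BEDPE convention (tab-separated chrom1, start1, end1, chrom2, start2, end2, then optional name and further columns, with 0-based starts and 1-based ends so that end - start is the interval length) and assumes the 10x files follow it. The class and function names and the example line are hypothetical, not real 10x output.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class BedpeRecord:
    chrom1: str
    start1: int  # 0-based start of the first interval
    end1: int    # 1-based end of the first interval
    chrom2: str
    start2: int  # 0-based start of the second interval
    end2: int    # 1-based end of the second interval
    name: str = "."

    @property
    def width1(self) -> int:
        """Number of bases covered by the first interval."""
        return self.end1 - self.start1

    @property
    def width2(self) -> int:
        """Number of bases covered by the second interval."""
        return self.end2 - self.start2


def parse_bedpe_line(line: str) -> BedpeRecord:
    """Parse one tab-separated BEDPE line (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 BEDPE columns, got {len(fields)}")
    name = fields[6] if len(fields) > 6 else "."
    return BedpeRecord(
        fields[0], int(fields[1]), int(fields[2]),
        fields[3], int(fields[4]), int(fields[5]),
        name,
    )


def read_bedpe(lines: Iterable[str]) -> Iterator[BedpeRecord]:
    """Yield records, skipping blank lines and '#' header/comment lines."""
    for line in lines:
        if line.strip() and not line.startswith("#"):
            yield parse_bedpe_line(line)


# Hypothetical example line, not taken from real 10x output.
rec = parse_bedpe_line("chr1\t1000\t1001\tchr1\t5000\t5120\tcall_1")
print(rec.name, rec.width1, rec.width2)  # call_1 1 120
```

Under the reading that each interval is a window containing one breakend, a width of 1 pins that end to a single base, while a width of 120 only localizes it to a 120 bp window.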

I can't say for certain whether your BEDPE data work the same way, but this reminded me of how VCF represents structural variants with the "breakend" specification, which also uses 4 points on the reference genome for an inversion: https://samtools.github.io/hts-specs/VCFv4.3.pdf

See section 5.4.7
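
To see where four points can come from in VCF breakend notation, here is a minimal Python sketch that lists the breakend records for a simple inversion of the 1-based, inclusive segment [first, last]. An inversion creates two new adjacencies (left flank to the inverted segment, inverted segment to right flank), and each adjacency is written as a pair of mate records, one at each side, which gives four positions. It assumes a clean inversion with no inserted sequence or microhomology; the function name is hypothetical, the REF base is a placeholder "N" rather than the real reference base, and the ALT strings use the spec's t]p] and [p[t bracket forms as I understand them, so verify them against the spec before reuse.

```python
from typing import List, Tuple


def inversion_breakends(chrom: str, first: int, last: int,
                        ref_base: str = "N") -> List[Tuple[str, int, str]]:
    """Return (CHROM, POS, ALT) for the four breakends of an inversion of the
    1-based inclusive segment [first, last]. ref_base is a placeholder only."""
    if not 1 < first <= last:
        raise ValueError("need 1 < first <= last")
    left = first - 1    # last reference base before the inverted segment
    right = last + 1    # first reference base after the inverted segment
    return [
        # Junction A: after `left` comes the reverse complement of the piece
        # extending left from `last` (t]p] form).
        (chrom, left, f"{ref_base}]{chrom}:{last}]"),
        # Junction B, seen from the segment start: the reverse complement of the
        # piece extending right from `right` is joined before `first` ([p[t form).
        (chrom, first, f"[{chrom}:{right}[{ref_base}"),
        # Mate of junction A, seen from the segment end.
        (chrom, last, f"{ref_base}]{chrom}:{left}]"),
        # Mate of junction B, seen from the right flank.
        (chrom, right, f"[{chrom}:{first}[{ref_base}"),
    ]


for record in inversion_breakends("2", 321682, 421681):
    print(*record, sep="\t")
```

So two breakpoints in the everyday sense (the two junctions) still produce four reference positions, because each junction touches one base on either side of the cut.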

Yes, thank you. My assumption was that the BEDPE coordinates represent a region of "likelihood" for the breakpoint, or something similar. I saw an answer in a thread from years ago stating that it was a confidence interval, but I couldn't find any documentation to confirm that. I guess I'll just continue to operate on that assumption.
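
If the intervals are read that way, each BEDPE end can be turned into a single estimated position plus an uncertainty window, similar to the POS plus CIPOS (confidence interval around POS) fields VCF uses for imprecise calls. Below is a minimal Python sketch under that unconfirmed assumption; it also assumes BED-style 0-based starts and 1-based ends, and the function names are hypothetical.

```python
from typing import Tuple


def interval_to_pos_ci(start: int, end: int) -> Tuple[int, Tuple[int, int]]:
    """Convert a BEDPE interval (0-based start, 1-based end) into a 1-based
    point estimate plus CIPOS-style offsets, ASSUMING the interval is a
    window that contains the breakpoint."""
    if end <= start:
        raise ValueError("expected end > start")
    lo, hi = start + 1, end          # 1-based inclusive bounds of the window
    pos = (lo + hi) // 2             # midpoint used as the point estimate
    return pos, (lo - pos, hi - pos)


def window_contains(start: int, end: int, pos_1based: int) -> bool:
    """True if a 1-based position (e.g. from independent validation) lies in the window."""
    return start + 1 <= pos_1based <= end


print(interval_to_pos_ci(1000, 1001))  # (1001, (0, 0))
print(interval_to_pos_ci(5000, 5120))  # (5060, (-59, 60))
```

Taking the midpoint is an arbitrary choice; if a caller reports its own best-guess position, that should be preferred over the midpoint.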
