How to infer segment direction from a VCF
3.3 years ago
octpus616 ▴ 120

Dear all,

I have recently turned my attention to analysing structural variants (SVs) from called VCF files. I have read the VCF specification v4.2 from https://github.com/samtools/hts-specs and noticed that the BND SVTYPE is useful for my work, but I have some questions about its ALT field.

First, the specification describes the field as follows:

enter image description here

enter image description here

The breakpoint positions and their mate breakpoints are easy to understand, but I am not sure how to find the correct direction of the fragment. To make my question clearer, please allow me to show another example. The following picture shows an INV on chr2: 321682 - 421682.

enter image description here

I understand that 321681 is linked to 421681, as the picture above shows.

But why can't we link 321682 to 421682 as in the following picture? Is it only because position 321681 has already been discovered?

enter image description here

One more question: has any package been developed to help process the ALT field of records whose SVTYPE is BND?

Thanks for reading.

Best.

Zhang.

SV vcf NGS • 2.1k views
3.3 years ago
cmdcolin ★ 4.0k

I'm not sure I understand the question, but 321682 is linked to 421682 in the breakends. You can imagine each breakend having little "feet", which I sometimes call "directional feet". Not sure if that helps.

enter image description here

We have some JavaScript code that helps parse breakend strings.

For example, parseBreakend('G]2:421681]') results in

{
    "MateDirection": "left",
    "Replacement": "G",
    "MatePosition": "2:421681",
    "Join": "right"
}

This says that MateDirection is left because the square brackets point to the left, and Join is right because the mate position comes after the letter G.

Another example: parseBreakend('[2:421682[T') results in

{
    "MateDirection": "right",
    "MatePosition": "2:421682",
    "Join": "left",
    "Replacement": "T"
}

The code for doing this looks like this (from our @gmod/vcf-js library):

  function parseBreakend(breakendString) {
    const tokens = breakendString.split(/[[\]]/)
    if (tokens.length > 1) {
      const parsed = {}
      parsed.MateDirection = breakendString.includes('[') ? 'right' : 'left'
      for (let i = 0; i < tokens.length; i += 1) {
        const tok = tokens[i]
        if (tok) {
          if (tok.includes(':')) {
            // this is the remote location
            parsed.MatePosition = tok
            parsed.Join = parsed.Replacement ? 'right' : 'left'
          } else {
            // this is the local alteration
            parsed.Replacement = tok
          }
        }
      }
      return parsed
    }
    // if there is not more than one token, there are no [ or ] characters,
    // so just return it unmodified
    return breakendString
  }
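
As a rough companion to the parser above, here is a minimal sketch that also splits the mate location into a chromosome and a numeric position. `describeBreakend` is a hypothetical helper written for illustration, not part of @gmod/vcf-js, and it assumes contig names contain no colon or square brackets.

```javascript
// Hypothetical helper (not from @gmod/vcf-js): parse a BND ALT string with a
// single regular expression. Assumes the contig name has no ':' '[' or ']'.
function describeBreakend(alt) {
  // Groups: bases before, opening bracket, contig, position,
  // closing bracket, bases after.
  const m = /^([^[\]]*)([[\]])([^:[\]]+):(\d+)([[\]])([^[\]]*)$/.exec(alt)
  if (!m) {
    return undefined // not a breakend string
  }
  const [, before, open, chrom, pos, close, after] = m
  // Both brackets must match, and exactly one side must carry bases.
  if (open !== close || Boolean(before) === Boolean(after)) {
    return undefined
  }
  return {
    mateChrom: chrom,
    matePos: Number(pos),
    // Bases first means the mate is joined on the right of them.
    join: before ? 'right' : 'left',
    // '[' points right, ']' points left.
    mateDirection: open === '[' ? 'right' : 'left',
    replacement: before || after,
  }
}

console.log(describeBreakend('G]2:421681]'))
// { mateChrom: '2', matePos: 421681, join: 'right',
//   mateDirection: 'left', replacement: 'G' }
```

Returning `undefined` for anything that is not a well-formed breakend is a design choice of this sketch; the library function above returns the input string unmodified instead.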

Hi, cmdcolin. Thanks for your helpful answer. I think your understanding is correct (sorry for my poor English). I am reading your GitHub repository @gmod/vcf-js. It seems to fit the problem I encountered, but I need more time to understand it because I do not know JavaScript. Fortunately, it looks very similar to Python + C. I will ask more questions if needed.

Thanks again for your kind help.

Best.

Zhang.


Hi, I have shown my derivation for one of the problems below. Could you please help me evaluate whether I understand the solution accurately?

enter image description here


I'm not sure I understand fully, but showing directionality with arrows may be somewhat misleading. For example, the "arrow" that points to the right in your diagram is not actually the way the inverted sequence would appear in a patient's genome. For the inversion, that DNA segment would be read in the right-to-left direction. However (rant ahead), the breakend spec is VERY DIFFICULT to decode for this reason. It basically turns your linear genome into a graph genome and splits one conceptual event (this inversion, for example) into four records. When you want to determine "what is the actual result on the patient's genome for a set of breakend events", it is just a challenge. I modified your picture to show where I think the arrows should point. You can see I also have my "directional feet", but the "directional feet" do not necessarily imply the "directional arrows"; they just show the side of the breakpoint that the breakend connects to.

enter image description here
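
To make the "feet" idea concrete, here is a minimal sketch that reads the side of each breakpoint from a parsed breakend and reports whether the join implies that one piece is reverse-complemented. `joinOrientation` is a hypothetical helper; it only assumes an object with the `Join` and `MateDirection` fields that parseBreakend returns.

```javascript
// Hypothetical helper: takes an object shaped like parseBreakend's result.
function joinOrientation({ Join, MateDirection }) {
  // If the mate is written after the local bases (Join "right"), the local
  // sequence that is kept lies to the left of the breakpoint, and vice versa.
  const localKeptSide = Join === 'right' ? 'left' : 'right'
  // '[' (MateDirection "right") means the mate piece extends to the right of
  // the mate position; ']' ("left") means it extends to the left.
  const mateKeptSide = MateDirection
  return {
    localKeptSide,
    mateKeptSide,
    // Keeping the same side on both ends is only possible if one of the two
    // pieces is reverse-complemented, as in an inversion.
    reverseComplemented: localKeptSide === mateKeptSide,
  }
}

// G]2:421681] parses to Join "right", MateDirection "left"
console.log(joinOrientation({ Join: 'right', MateDirection: 'left' }))
// { localKeptSide: 'left', mateKeptSide: 'left', reverseComplemented: true }
```

Both inversion breakends discussed in this thread come out as reverse-complemented joins, while a breakend of the form t[p[ (Join "right", MateDirection "right") does not.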


Thanks, cmdcolin.

Your understanding of the problem is accurate. What surprised me was that you intuitively knew I was trying to do something difficult (to determine "what is the actual result on the patient's genome for a set of breakend events"). I have tried to turn the linear genome into a graph genome using the breakpoint links shown in the VCF. (I don't know whether my understanding of a graph genome is correct: I used a graph structure in Python to store genome positions and their links, then applied a graph search to find the events I am interested in.) But when I tried to take direction into consideration, things got tricky. As you said, the breakend spec is VERY DIFFICULT to decode. Fortunately, your experienced post cleared up some of my doubts. Thanks for your help.
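
The graph idea described above can be sketched roughly as follows, in JavaScript to match the parser earlier in the thread. This is only an illustration under assumptions: `buildBreakendGraph` is a hypothetical function, each record is assumed to carry `chrom` and `pos` alongside the parsed `Join`, `MatePosition` and `MateDirection` fields, and nodes are keyed by position plus the side of the breakpoint that is kept.

```javascript
// Hypothetical sketch: build an undirected adjacency graph from BND records.
// Each node is "chrom:pos:side", where side is the kept side of the breakpoint.
function buildBreakendGraph(records) {
  const graph = new Map()
  const addEdge = (from, to) => {
    if (!graph.has(from)) {
      graph.set(from, new Set())
    }
    graph.get(from).add(to)
  }
  for (const { chrom, pos, Join, MatePosition, MateDirection } of records) {
    // Join "right" means the local bases come first, so the kept local
    // sequence lies to the left of the breakpoint.
    const localSide = Join === 'right' ? 'left' : 'right'
    const local = `${chrom}:${pos}:${localSide}`
    const mate = `${MatePosition}:${MateDirection}`
    // Store both directions; a Set makes the reciprocal record a no-op.
    addEdge(local, mate)
    addEdge(mate, local)
  }
  return graph
}

// The two inversion breakends from the answer above.
const graph = buildBreakendGraph([
  { chrom: '2', pos: 321681, Join: 'right', MatePosition: '2:421681', MateDirection: 'left' },
  { chrom: '2', pos: 321682, Join: 'left', MatePosition: '2:421682', MateDirection: 'right' },
])
console.log(graph.get('2:321681:left')) // Set(1) { '2:421681:left' }
```

Because the edges are stored in a Set, adding the reciprocal mate records does not create duplicates, so the four records of one inversion collapse into two adjacencies. Walking the rearranged genome would additionally need edges for the reference segments between breakpoints, which this sketch leaves out.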


Happy to help. Breakends are complex, but the biology is very complex too, so we can't necessarily expect VCF itself to be simple. However, it seems to me to imply that we need a meta-VCF format, something that can be interpreted more easily. We also try to make tools with JBrowse for visual comprehension of breakends. Here is an example screenshot showing a breakend (green) with read support (curly lines): https://jbrowse.org/jb2/assets/images/breakpoint_split_view-fcc0006767af5061bedd51b05f95634f.png

If interested, I can help with JBrowse for your data too :) (shameless plug)


Hi, cmdcolin. I am very happy to see the author of the JBrowse genome browser here. In fact, I saw JBrowse covered in a review article more than a year ago and was impressed by its beautiful UI. It really has some powerful features. However, I ran into some problems when running it under the Windows Subsystem for Linux (WSL), possibly because of my complicated environment. I really want to try it again, as it looks very suitable for my current data. If I have any questions, I hope I can contact you again.

Best.

Zhang.
