Looking for clarification of .vcf output from variant calling
5 months ago
Jess ▴ 40

Hi,

I am using Freebayes for some variant calling analysis, ultimately to detect drug resistance mutations. I have scoured the internet and perhaps I am not phrasing things in the correct way to get an answer but there is one thing I still don't understand about the output.

If I have the following output as an example:

  • Position 4 - ref = A - alt = G
  • Position 6 - ref = G - alt = C

And let's say that the WT codon covering these positions (4-6) is ATG.

Can I assume that the variant at position 4 is flanked by WT bases on each side, and likewise for the variant at position 6? In other words, can I assume I have a mix of GTG and ATC (and possibly WT, depending on frequencies) but NOT GTC?

From my BAM I can see for some examples this is the case but I want an absolute guarantee that I won't misinterpret these results if I assume at least 2 nucleotides upstream and downstream from a single nucleotide variant are WT.

I am also confused as to why my variants are sometimes reported as longer sequences and sometimes split into single nucleotides, as in the example above. If anyone knows why this happens, I would like to understand that as well.

I hope this makes sense and thank you in advance to anyone who responds!

freebayes vcf variant-calling • 726 views

Are we talking about a diploid or haploid genome?


I suppose haploid? It is a mix of viral quasispecies.

5 months ago

Freebayes leverages read-level phasing to infer haplotypes and multi-nucleotide variants (MNVs).

I think if you search for those terms in the docs and on this site, you'll be able to determine whether it is behaving the way you expect.

freebayes haplotype calling (phased variants)

Reason for Freebayes Calling Multi-base Variants
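
To make the terminology concrete: a multi-base record reports one ALT haplotype that was observed as a unit, whereas two separate single-base records do not by themselves tell you whether the changes sit on the same molecule. As a rough illustration (the function name `split_mnp` and the example record are made up here, not part of freebayes), this Python sketch breaks an equal-length REF/ALT pair into its per-position substitutions so you can compare it with adjacent single-base records:

```python
def split_mnp(pos, ref, alt):
    """Break an equal-length REF/ALT pair into per-position substitutions.

    pos is the 1-based position of the first REF base, as in a VCF record.
    Only handles substitutions; indels and complex alleles are out of scope.
    """
    if len(ref) != len(alt):
        raise ValueError("only equal-length substitutions are handled")
    # Keep only the positions where the alternate base differs from the reference.
    return [(pos + i, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


# Hypothetical record: POS=4, REF=ATG, ALT=GTC
print(split_mnp(4, "ATG", "GTC"))
```

The reverse direction is not safe: merging two single-base records into GTC assumes linkage that the VCF does not state. If I remember the freebayes help text correctly, the grouping of nearby changes into one haplotype call is governed by `--haplotype-length` (also `-E` / `--max-complex-gap`), so that option is worth checking in `freebayes --help`.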


Thank you for your response! I can find questions about why you might get a multi-base variant, but what I still don't understand is what decides whether the output is 2 single variants or 1 multi-base variant. Looking at my .vcf files it seems random, which obviously it can't be.


Because they came from the same read.


What do you mean by this? The 2 adjacent SNPs and the MNP examples both have many reads where those positions are within the same read.


I would expect that if two SNPs in a diploid organism are consistently inherited together, they would be classified as an MNP, but perhaps in your viral case there is no consistent pattern. In that situation the phasing will usually judge them as independent events.
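
If you want to check linkage yourself rather than infer it from the VCF, one option is to count which codon each read actually carries. Below is a minimal Python sketch of that idea. It assumes ungapped alignments given as (1-based start, sequence) pairs, which is a simplification: real BAM records have indels and soft clipping that you would need to resolve first. The function name and the toy reads are invented for illustration.

```python
from collections import Counter


def codon_haplotypes(reads, codon_start, codon_len=3):
    """Count the bases each read carries across one codon.

    reads: iterable of (start, sequence) pairs, where start is the 1-based
    reference position of the first aligned base. Assumes no indels or clipping.
    """
    counts = Counter()
    for start, seq in reads:
        offset = codon_start - start
        # Skip reads that do not span the whole codon.
        if offset < 0 or offset + codon_len > len(seq):
            continue
        counts[seq[offset:offset + codon_len]] += 1
    return counts


# Toy reads (made up); the reference codon at positions 4-6 is ATG.
reads = [
    (1, "CCCGTGAAA"),  # A4G only
    (1, "CCCATCAAA"),  # G6C only
    (2, "CCGTCAA"),    # both changes on the same read
    (1, "CCCATGAAA"),  # wild type
    (5, "TGAAA"),      # does not cover position 4, so it is ignored
]
haplotypes = codon_haplotypes(reads, codon_start=4)
print(haplotypes)
```

If GTC shows up at a meaningful count, the two SNPs are linked in at least some genomes, and treating the flanking bases as WT would give the wrong amino acid for those reads.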
