What does it mean to say "b-allele frequency"?
8.2 years ago
novice ★ 1.1k

I'm trying to use Canvas to find copy number variants in human data. I would appreciate it if someone could clarify what this input is supposed to be:

   --b-allele-vcf=VALUE   vcf containing SNV b-allele sites (only sites 
                               with PASS in the filter column will be used) 
                               (required)

I have called and filtered SNPs for my samples. Is this asking me to provide the set of SNPs (or SNP sites) that are flagged as having the alternate allele in the VCF file? If so, couldn't I just grep AB=1 and be good?

snp canvas • 8.1k views

See this manual for details:

http://biorxiv.org/content/biorxiv/suppl/2016/01/13/036194.DC2/036194-2.pdf

"Canvas supports a number of different workflows depending on the input sequencing data. The available modes are: 

Germline-WGS: CNV calling of a diploid germline sample from whole genome sequencing data

Somatic-Enrichment: CNV calling of a somatic sample from targeted sequencing data

Somatic-WGS: CNV calling of a somatic sample from whole genome sequencing data

Tumor-normal-enrichment: CNV calling of a tumor/normal pair from targeted sequencing data"


Thank you, Natasha! Why didn't they just say "heterozygous sites" from the beginning?? I guess what I have to do then, for the purpose of this input file, is to grep for 0/1 or 1/0 SNPs.

7.9 years ago
jpflorido • 0

Hi! Did you find out the meaning of --b-allele-vcf? Is it related to the sample or to the normal/control? Thanks!


In this context, the b allele is the non-reference allele observed in a germline heterozygous SNP, i.e. in the normal/control sample. Since the tumor cells' DNA originally derived from normal cells' DNA, most of these SNPs will also be present in the tumor sample. But due to allele-specific copy number alterations, loss of heterozygosity or allelic imbalance, the allelic frequency of these SNPs may be different in the tumor, and that's evidence that one (or both) of the germline copies was gained or lost during tumor evolution.

So, filter for heterozygous genotypes in the normal sample, but keep the tumor sample in the VCF.
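As a rough illustration of that filtering step, here is a minimal Python sketch. It assumes an uncompressed, tab-delimited multi-sample VCF with a `#CHROM` header line and a `GT` entry in the FORMAT column. The function name `filter_het_in_normal` and the sample names `NORMAL` and `TUMOR` are hypothetical, not anything Canvas defines. It also keeps only `PASS` records, since the option description says Canvas ignores the rest.

```python
HET_GENOTYPES = {"0/1", "1/0", "0|1", "1|0"}

def filter_het_in_normal(vcf_lines, normal_sample):
    """Yield header lines plus PASS records that are heterozygous in the normal.

    All sample columns (including the tumor) are kept unchanged in the output.
    """
    normal_col = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line
            continue
        if line.startswith("#CHROM"):
            # Locate the column that holds the normal sample's genotypes.
            normal_col = line.split("\t").index(normal_sample)
            yield line
            continue
        if normal_col is None:
            raise ValueError("no #CHROM header line seen before the first record")
        fields = line.split("\t")
        if fields[6] != "PASS":
            continue
        format_keys = fields[8].split(":")
        genotype = fields[normal_col].split(":")[format_keys.index("GT")]
        if genotype in HET_GENOTYPES:
            yield line
```

Something like `filter_het_in_normal(open("pair.vcf"), "NORMAL")` (file name hypothetical) would then give you the lines to write out as the b-allele VCF. For real data, a dedicated VCF tool is probably a safer choice than hand-rolled parsing.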


Hi Eric, I am interested in calculating the BAF (B-allele frequency) of tumor samples that have no matched normal sequenced. As you said, the "B-allele is the non-ref allele observed in a germline heterozygous SNP". How can I find those germline heterozygous sites in my VCF and then calculate their BAF? The VCF was generated with UnifiedGenotyper from GATK. I have attached a few records from the VCF file below. I am quite new to this and would appreciate your help.

CHROM  POS    ID           REF  ALT  SCP2                          SCP3                          SCP43
chr1   14522  .            G    A    0/1:107,12:119:15:15,0,439    0/1:101,12:114:99:111,0,712   0/1:76,9:86:28:28,0,365
chr1   14542  .            A    G    0/1:115,11:126:16:16,0,535    0/1:110,13:123:94:94,0,722    0/1:71,11:82:37:37,0,302
chr1   14574  .            A    G    0/0:122,8:130:46:0,46,888     0/1:93,10:103:57:57,0,731     0/1:72,12:84:30:30,0,521
chr1   14653  rs375086259  C    T    0/1:131,30:162:99:372,0,1365  0/1:100,25:125:99:378,0,1436  0/1:81,23:104:99:227,0,1238
chr1   14976  rs71252251   G    A    0/1:204,44:250:99:218,0,3516  0/1:223,27:250:5:5,0,3978     0/1:170,45:217:99:459,0,2776
chr1   15688  .            C    T    0/1:35,16:52:66:66,0,166      0/1:12,8:20:25:59,0,25        0/1:18,9:27:3:3,0,189

Some details are in the CNVkit documentation:

- https://cnvkit.readthedocs.io/en/stable/baf.html
- https://cnvkit.readthedocs.io/en/stable/fileformats.html#vcf

If you don't have a matched normal to help distinguish germline variants from somatic, you can use dbSNP or 1000 Genomes to identify which of your variants are common SNPs. Then filter your VCF to retain only those, and use the output for BAF calculations in Canvas or CNVkit.
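To make the calculation concrete, here is a small Python sketch of the idea, not what Canvas or CNVkit do internally. It assumes the per-sample fields in your records are laid out as `GT:AD:DP:GQ:PL`, so the second entry holds the ref,alt read depths, and it takes BAF as alt / (ref + alt). The function names are made up for illustration. The `known_sites` set stands in for a lookup against dbSNP or 1000 Genomes, and here it simply contains the two positions in your excerpt that carry rs IDs.

```python
HET_GENOTYPES = {"0/1", "1/0", "0|1", "1|0"}

def baf_from_sample_field(sample_field, ad_index=1):
    """Return alt / (ref + alt) from a 'GT:AD:...' field, or None if depth is zero."""
    allele_depths = sample_field.split(":")[ad_index].split(",")
    ref_depth, alt_depth = int(allele_depths[0]), int(allele_depths[1])
    total = ref_depth + alt_depth
    return alt_depth / total if total else None

def baf_at_known_snps(records, known_sites):
    """Map (chrom, pos) to BAF for heterozygous calls at known SNP positions.

    records: iterable of (chrom, pos, sample_field) tuples for one sample.
    known_sites: set of (chrom, pos) tuples treated as common germline SNPs.
    """
    result = {}
    for chrom, pos, sample_field in records:
        if (chrom, pos) not in known_sites:
            continue
        if sample_field.split(":")[0] not in HET_GENOTYPES:
            continue
        baf = baf_from_sample_field(sample_field)
        if baf is not None:
            result[(chrom, pos)] = baf
    return result

# SCP2 column from the records posted above.
scp2_records = [
    ("chr1", 14522, "0/1:107,12:119:15:15,0,439"),
    ("chr1", 14574, "0/0:122,8:130:46:0,46,888"),
    ("chr1", 14653, "0/1:131,30:162:99:372,0,1365"),
    ("chr1", 14976, "0/1:204,44:250:99:218,0,3516"),
]
# Stand-in for a dbSNP / 1000 Genomes lookup (assumption for this example).
known_sites = {("chr1", 14653), ("chr1", 14976)}

for site, baf in sorted(baf_at_known_snps(scp2_records, known_sites).items()):
    print(site, round(baf, 3))
```

In practice you would probably also want a minimum depth cutoff before trusting any single site's BAF.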
