Obtain information from a filtered VCF file

0


21 months ago

Dardo • 0

Hello, I urgently need to select 30 chickpea lines that maximize diversity in genes related to symbiosis with rhizobia. I have 531 lines and a filtered VCF file containing the variants found at each position on each chromosome, relative to the reference genome. Given the chromosomes and positions of those genes, how can I use the VCF to pick 30 of the 531 lines that show variability in them?

vcf variants • 610 views

updated 21 months ago by LChart 4.6k • written 21 months ago by Dardo • 0

1


First, annotate your VCF file with Ensembl VEP (Variant Effect Predictor) or SnpEff, using the GTF for the chickpea reference; this places gene annotations and predicted consequences on each variant. Second, subset the VCF to variants in your genes of interest (rhizobia symbiosis genes, which I assume you can pull from the literature) that modify the protein sequence. Finally, convert the genotypes into non-reference allele counts (0, 1, 2, assuming diploid), normalize the counts by (x - 2*u)/sqrt(2*u*(1-u)), where u is the non-reference allele frequency of the variant (so 2*u is the mean count), and use PCA to visualize the variation between lines.
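A minimal sketch of the last step (normalization and PCA), assuming the genotypes for the subsetted variants have already been extracted into a lines-by-variants matrix of non-reference allele counts. The function names `normalize_genotypes` and `pca_scores` and the toy matrix are hypothetical, not part of any tool mentioned above. The sketch centers each variant on 2*u, the mean of a 0/1/2 count, drops monomorphic variants (where the denominator would be zero), and treats missing genotypes as the variant mean; those are choices made here for illustration.

```python
import numpy as np

def normalize_genotypes(G):
    """Normalize a (n_lines, n_variants) matrix of allele counts (0, 1, 2).

    Missing genotypes may be given as np.nan. Returns the normalized matrix
    and a boolean mask of the variants that were kept.
    """
    G = np.asarray(G, dtype=float)
    # Non-reference allele frequency per variant (diploid: mean count / 2).
    u = np.nanmean(G, axis=0) / 2.0
    # Monomorphic variants carry no information and would divide by zero.
    keep = (u > 0) & (u < 1)
    G, u = G[:, keep], u[keep]
    Z = (G - 2 * u) / np.sqrt(2 * u * (1 - u))
    # Assumption: a missing genotype is set to the variant mean (0 after centering).
    Z = np.nan_to_num(Z, nan=0.0)
    return Z, keep

def pca_scores(Z, n_components=2):
    """PCA via SVD of the normalized matrix; returns per-line scores and
    the fraction of variance explained by each returned component."""
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]

# Toy data: 4 lines x 3 variants; the third variant is monomorphic.
G = np.array([
    [0, 0, 2],
    [0, 2, 2],
    [2, 0, 2],
    [2, 2, 2],
])
Z, keep = normalize_genotypes(G)
scores, explained = pca_scores(Z, n_components=2)
print(scores)
```

Each row of `scores` is one line's position on the first principal components, which can be plotted to see how the lines spread out across the symbiosis-gene variants.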

21 months ago by LChart 4.6k
