Obtain information from a filtered VCF file
21 months ago
Dardo • 0

Hello, I urgently need to select 30 chickpea lines that maximize diversity in genes related to symbiosis with rhizobia. I have 531 lines and a filtered VCF file containing the variants at each position on each chromosome, relative to the reference genome. Given the chromosomes and positions of those genes, how can I use the VCF to pick 30 lines that capture the variability in them?

vcf variants • 610 views

First, annotate your VCF file using Ensembl Variant Effect Predictor (VEP) or SnpEff with the GTF for the chickpea reference, which will attach gene annotations and mutation consequences to each variant. Second, subset the VCF to variants in your genes of interest (rhizobia symbiosis genes, which I assume you can pull from the literature) that modify the protein sequence. Finally, convert the genotypes into non-reference allele counts (0, 1, 2, assuming diploid), normalize the counts by (x - 2*u)/sqrt(2*u*(1-u)), where u is the non-reference allele frequency of the variant (so 2*u is the mean count), and use PCA to visualize the variation between lines.
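The last step can be sketched in a few lines of Python with NumPy. This is only a minimal illustration under some assumptions: the genotypes have already been pulled out of the annotated, subsetted VCF as GT strings, missing calls are set to the variant mean, and the function names (`gt_to_count`, `normalize_counts`, `pca_scores`) and the toy genotypes are made up for the example.

```python
import numpy as np

def gt_to_count(gt):
    """Convert a VCF GT string ('0/1', '1|1', ...) to a non-reference
    allele count. Missing calls such as './.' become NaN."""
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return np.nan
    return float(sum(a != "0" for a in alleles))

def normalize_counts(counts):
    """counts: (n_variants, n_lines) array of 0/1/2, NaN for missing.
    Returns (x - 2u) / sqrt(2u(1-u)) per variant, dropping monomorphic sites."""
    u = np.nanmean(counts, axis=1) / 2.0      # non-reference allele frequency
    keep = (u > 0) & (u < 1)                  # monomorphic sites have zero variance
    counts, u = counts[keep], u[keep]
    z = (counts - 2 * u[:, None]) / np.sqrt(2 * u * (1 - u))[:, None]
    return np.nan_to_num(z, nan=0.0)          # assumption: missing call -> variant mean

def pca_scores(z, n_components=2):
    """PCA via SVD, treating each line (column of z) as one sample."""
    u_mat, s, _ = np.linalg.svd(z.T, full_matrices=False)
    return u_mat[:, :n_components] * s[:n_components]

# Toy data: 3 variants (rows) x 4 lines (columns); values are invented.
gts = [
    ["0/0", "0/1", "1/1", "0/0"],
    ["0/1", "0/1", "./.", "1/1"],
    ["0/0", "0/0", "0/0", "0/0"],   # monomorphic, will be dropped
]
counts = np.array([[gt_to_count(g) for g in row] for row in gts])
z = normalize_counts(counts)
scores = pca_scores(z)              # one row of PC coordinates per line
```

Each row of `scores` holds the coordinates of one line on the first principal components, which is what you would plot to see how the lines spread out over the symbiosis-gene variants.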
