Hello everyone, I am new to bioinformatics tools and would like to ask for your help. I did GBS (genotyping-by-sequencing) on the two plant varieties I work on. I have the VCF files and I want to identify the SNPs that are specific to each variety. Could someone point me to a program or procedure to follow? Thank you so much.
Thank you all. I will try and keep you informed!
Hello everyone, Thanks for your advice, but I am still having trouble, sorry. Could someone help, please? I tried the commands you suggested, vcf-contrast from vcftools and subtract from bedtools, but they do not work. Maybe I explained poorly what I want to do, so I will try to explain better. I have the GBS results of my two "varieties" of plants in a single VCF file, which contains the SNP positions on the contigs (the reference sequence is partial), the genotypes, and other sequencing information for all the samples. What I want to know is whether some SNPs are specific to either of my two "varieties". If so, I want to identify these SNPs and extract them from the VCF file. Thanks a lot for the help.
If you are able to program in Python, there is a package called PyVCF that can parse the VCF for you and give you easy access to the genotypes of each sample. Then it is simply a matter of filtering out the positions where the genotypes are the same across all samples and keeping those that differ, and doing whatever downstream analysis you want on those.
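To make the filtering idea concrete, here is a minimal sketch. It does not use PyVCF; it parses the VCF text directly with the standard library so it runs without installing anything. The function names, the sample names (A1, A2, B1, B2) and the tiny in-memory VCF are all made up for illustration. It also assumes one possible definition of "specific": an allele counts as specific to a variety if it is seen in at least one sample of that variety and in no sample of the other. Sites where one variety has no called genotype are skipped. You may well want a stricter rule (for example, fixed differences only).

```python
import re


def parse_gt_alleles(sample_field, gt_index):
    """Return the set of called allele indices (as strings) for one sample column."""
    parts = sample_field.split(":")
    if gt_index >= len(parts):
        return set()
    # Genotypes look like 0/1 or 0|1; "." marks a missing call.
    return {a for a in re.split(r"[/|]", parts[gt_index]) if a != "."}


def variety_specific_snps(vcf_lines, variety_a, variety_b):
    """Yield (chrom, pos, private_to_a, private_to_b) for sites with private alleles.

    variety_a and variety_b are sets of sample names (hypothetical names here);
    samples that belong to neither set are ignored.
    """
    samples = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        if samples is None:
            raise ValueError("no #CHROM header line before the first record")
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_index = fmt.index("GT")
        alleles_a, alleles_b = set(), set()
        for name, column in zip(samples, fields[9:]):
            called = parse_gt_alleles(column, gt_index)
            if name in variety_a:
                alleles_a |= called
            elif name in variety_b:
                alleles_b |= called
        # Assumption: skip sites where one variety has no called genotype.
        if not alleles_a or not alleles_b:
            continue
        private_a = alleles_a - alleles_b
        private_b = alleles_b - alleles_a
        if private_a or private_b:
            yield fields[0], int(fields[1]), sorted(private_a), sorted(private_b)


# Made-up example data: two samples per variety, three SNPs.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tA1\tA2\tB1\tB2",
    "contig1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/0:10\t0/0:8\t1/1:9\t1/1:7",
    "contig1\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t0/1:10\t0/0:8\t0/1:9\t0/0:7",
    "contig2\t50\t.\tG\tA\t50\tPASS\t.\tGT:DP\t0/0:10\t./.:0\t0/1:9\t0/0:7",
]
hits = list(variety_specific_snps(example, {"A1", "A2"}, {"B1", "B2"}))
for hit in hits:
    print(hit)
```

For a real file you would pass an open file handle instead of the list, for example `variety_specific_snps(open("my.vcf"), a_names, b_names)`, where the path and the two sets of sample names are yours to fill in. Allele "0" is the REF allele and "1", "2", ... are the ALT alleles in the order they are listed.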