Does anyone Help me out: How one could identify the monomorphic snps position from a vcf file? I used Parsnp alignment tool to generate VCF file ? Thanks advance.
Does anyone Help me out: How one could identify the monomorphic snps position from a vcf file? I used Parsnp alignment tool to generate VCF file ? Thanks advance.
if you have a single multisample vcf file, then you can use bcftools
to create a vcf file with 1-allele-only variants:
bcftools view -C1 file.vcf
you can also loop through all samples' genotypes and check if they're all the same. you could then print, for instance, the list of chromosome positions of those monomorphic variants:
grep -v ^# file.vcf \
| perl -lane '%gts = ();
foreach (@F[9..$#F]) { s/:.+//; $gts{$_} = 1 }
print join "\t", @F[0,1]" if scalar(keys %gts) == 1'
I tried with your 1st suggestion but got some error about my vcf file format : Do you have any idea what could be the problem?
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
##FORMAT=<ID=DV,Number=1,Type=Integer,Description="# high-quality non-reference bases">
##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
[bcf_sync] incorrect number of fields (0 != 5) at 0:0
Here is the header of my vcf file
##FILTER=<ID=IND,Description="Column contains indel">
##FILTER=<ID=N,Description="Column contains N">
##FILTER=<ID=LCB,Description="LCB smaller than 200bp">
##FILTER=<ID=CID,Description="SNP in aligned 100bp window with < 50% column % ID">
##FILTER=<ID=ALN,Description="SNP in aligned 100b window with > 20 indels">
##FILTER=<ID=REC,Description="PhiPack">
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
what exactly do you mean with "monomorphic snps" positions? are you working with a multisample vcf file, and in that case do you want to look for positions that have the same alleles in all samples? or are you working with single sample vcf file, and you mean positions where the reference allele is the only one found? would you probably be looking for homozygous positions?
From my understanding Monomorphic SNPs : "A SNP for which a single form or allele can be identified in the population of interest". I aligned 100 bacterial genome (from different host ) to a reference by using Parsnp and called the snp . Now i would like to know how many monomorphic positions are shared across the host.
ok, so you need to know which snps have the same allele in all samples. first step clear.
now we would need to know wether you have a single multisample vcf, or multiple single sample vcfs.
single multisample vcf
then see my answer below