16 months ago
ymberzal
Hello
I have ddRAD data from 36 cattle, from which I built a VCF file using samtools sort, followed by bcftools mpileup and bcftools call. The number of SNPs I am getting is very high (in the millions). I looked at other articles where ddRAD data was analysed with a similar pipeline, but they all reported SNP counts in the thousands.
I have checked every step of my pipeline and it looks right. So why is my SNP count three times higher than what everyone else gets with a similar pipeline?
Please help.
Did they really all use the identical pipeline, or did they apply some variant filtration afterwards? Could they have used GATK or another variant caller instead?
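To illustrate how much a filtration step alone can change the count, here is a minimal Python sketch that counts biallelic SNP records before and after a simple quality and depth filter. The function name `count_snps`, the toy records in `EXAMPLE`, and the thresholds (QUAL 30, DP 10) are assumptions made for illustration only; they are not taken from your pipeline or from any of the papers.

```python
BASES = {"A", "C", "G", "T"}

def count_snps(vcf_lines, min_qual=0.0, min_depth=0):
    """Count biallelic SNP records passing simple QUAL and INFO/DP thresholds."""
    kept = 0
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alt, qual, info = fields[3], fields[4], fields[5], fields[7]
        # Keep only records with a single-base REF and a single-base ALT.
        if ref.upper() not in BASES or alt.upper() not in BASES:
            continue
        if qual == "." or float(qual) < min_qual:
            continue
        # Read total depth from the INFO column; treat a missing DP as 0.
        depth = 0
        for entry in info.split(";"):
            if entry.startswith("DP="):
                depth = int(entry[3:])
        if depth < min_depth:
            continue
        kept += 1
    return kept

# Toy records (hypothetical values, not real data).
EXAMPLE = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t50\t.\tDP=40",
    "1\t200\t.\tC\tT\t8\t.\tDP=3",
    "1\t300\t.\tG\tGA\t60\t.\tDP=30;INDEL",
    "1\t400\t.\tT\tC\t35\t.\tDP=12",
]

unfiltered = count_snps(EXAMPLE)
filtered = count_snps(EXAMPLE, min_qual=30, min_depth=10)
print(unfiltered, filtered)
```

Running the same comparison on your own VCF would show whether the gap to the published numbers could come from filtering rather than from the calling step itself.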
Thanks for your reply. It turns out the difference was due to the different sets of restriction enzymes used in the other studies.
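For anyone who lands here with the same question, the enzyme effect can be sketched with a rough in silico double digest in Python. This is a simplified illustration, not the method used in any of the studies: it treats the start of each recognition motif as the cut position, counts only fragments flanked by one site of each enzyme, and uses a made-up toy sequence. The enzyme pair shown (EcoRI, GAATTC, and MspI, CCGG) and the alternative second enzyme (HaeIII, GGCC) are assumptions; the thread does not say which enzymes were involved.

```python
import re

def cut_sites(sequence, motif, label):
    """Return (position, label) for every occurrence of a recognition motif."""
    # The lookahead lets overlapping occurrences be found as well.
    return [(m.start(), label) for m in re.finditer(f"(?={motif})", sequence)]

def count_ddrad_fragments(sequence, motif_a, motif_b, min_len, max_len):
    """Count fragments between adjacent sites of different enzymes within a size window."""
    sites = sorted(cut_sites(sequence, motif_a, "A") + cut_sites(sequence, motif_b, "B"))
    count = 0
    for (pos1, enz1), (pos2, enz2) in zip(sites, sites[1:]):
        # A ddRAD-style fragment has one end from each enzyme.
        if enz1 != enz2 and min_len <= pos2 - pos1 <= max_len:
            count += 1
    return count

# Toy sequence (hypothetical): three GAATTC sites and two CCGG sites.
TOY = ("GAATTC" + "A" * 20 + "CCGG" + "A" * 50 + "GAATTC"
       + "A" * 10 + "GAATTC" + "A" * 30 + "CCGG")

pair_one = count_ddrad_fragments(TOY, "GAATTC", "CCGG", 20, 60)
pair_two = count_ddrad_fragments(TOY, "GAATTC", "GGCC", 20, 60)
print(pair_one, pair_two)
```

On the toy sequence the first enzyme pair yields three fragments in the size window and the second yields none, which is the same kind of effect, at a much larger scale, that makes SNP counts from different ddRAD studies hard to compare directly.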