Hello,
I have used samtools mpileup, and bcftools call/filter (v1.5) to call SNPs between 2 samples and a de novo reference transcriptome.
Here is my pipeline:
bcftools mpileup -Ou -f ref.fa aln.bam1 aln.bam2 | \
bcftools call -Ou -mv | \
bcftools filter -s LowQual -e '%QUAL<20 || DP>100' > var.flt.vcf
I'm interested in extracting SNPs that occur between individual samples (aln.bam1 and aln.bam2) , and not between the samples and the reference (aln.bam 1 and ref.fa, aln.bam2 and ref.fa) . I would like to see how different each sample is different, or similar, from one another. Is this possible? I haven't been able to find much documentation on how one would approach this.
Any info would be greatly appreciated.
Cheers.
Hello michbrown!
It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=76923
This is typically not recommended as it runs the risk of annoying people in both communities.
Thanks for your reply, Pierre. Currently running vcffilterjs now. Would the command be the same as the examples posted above, if there were more than 2 samples? For example, if there were 4 samples?
Thanks again.
of course not. But it's not clear from your question how you would filter out a VCF with N-Samples. However vcffilterjs can run some loops over the genotypes. see the example '''Sample having a unique genotype:'''
My mistake, I'm quite new to bioinformatics. Thanks for your help so far. I've been running
for ~24 hours now on a filtered .vcf file. What run times should I expect with the vcfffiltersjs program?
Thanks again
if the vcf input is huge, the processing is slow. you should see an estimation of the remaining time: something like:
if not something is wrong.
Check your have really provided a input vcf, because vcfilterjs might try to wait for a VCF provided on standard input.
A faster alternative is my new tool: http://lindenb.github.io/jvarkit/VcfFilterJdk.html
Thanks for getting back to me, this is a big help. The tool that you've copied above will return sample [0] not same genotype as sample [1]?
no the SAME genotype
for a different , that would be
note the
'!'
Thank you so much, Pierre! I appreciate the help and insight.
Again, forgive me for my inexperience. I would like to output the results to a .txt file. Is this as simple as adding '-o somefile.txt' argument to the command? I tried to add an output argument but received an error:
java -jar /scratch/$USER/jvarkit/dist/vcffilterjdk.jar -e 'return !variant.getGenotype(0).sameGenotype(variant.getGenotype(1));' /scratch/$USER/SNPs/NFS_SFS_alignment.var.flt.vcf -o vcffilterjdk_different_genotype_NFS_SFS.txt
[SEVERE][Launcher]Must specify file or stream output type. java.lang.IllegalArgumentException: Must specify file or stream output type. at htsjdk.variant.variantcontext.writer.VariantContextWriterBuilder.build(VariantContextWriterBuilder.java:423) at com.github.lindenb.jvarkit.util.vcf.VCFUtils.createVariantContextWriter(VCFUtils.java:391) at com.github.lindenb.jvarkit.util.jcommander.Launcher.openVariantContextWriter(Launcher.java:946) at com.github.lindenb.jvarkit.util.jcommander.Launcher.doVcfToVcf(Launcher.java:981) at com.github.lindenb.jvarkit.util.jcommander.Launcher.doVcfToVcf(Launcher.java:1002) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.doWork(VcfFilterJdk.java:322) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:1155) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:1341) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.main(VcfFilterJdk.java:336) [INFO][Launcher]vcffilterjdk Exited with failure (-1)
Thanks again for guiding me through this.