Hi all, how can I counting SNPs and INDELs per each chromosome in a vcf file?
Hi all, how can I counting SNPs and INDELs per each chromosome in a vcf file?
Using BEDOPS vcf2bed
and standard Unix tools like wc
to count lines:
$ vcf2bed --snvs < variants.vcf | wc -l
$ vcf2bed --insertions < variants.vcf | wc -l
$ vcf2bed --deletions < variants.vcf | wc -l
Cf. http://bedops.readthedocs.io/en/latest/content/reference/file-management/conversion/vcf2bed.html
The above does counts over all chromosomes. You need to do an extra step to count per chromosome.
To perform this count exercise quickly per-chromosome, per-variant class:
$ vcf2bed --snvs < variants.vcf > variants.snvs.bed
$ for chr in `bedextract --list-chr variants.snvs.bed`; do echo $chr; bedextract $chr variants.snvs.bed | wc -l; done
The bedextract
tool is part of BEDOPS. If you install BEDOPS to get vcf2bed
, you get bedextract
, as well.
Repeat this procedure for --insertions
and --deletions
, replacing snvs
accordingly, to count those classes.
using bioalcidaejdk: http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html
$ java -jar dist/bioalcidaejdk.jar -e 'stream().map(V->V.getType().name()+" "+V.getContig()).collect(Collectors.groupingBy(Function.identity(), Collectors.counting())).forEach((K,V)->{println(K+" : "+V);});' input.vcf
SNP 1 : 44
INDEL 1 : 6
there is no INDEL in your VCF, in the htsjdk context : https://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/variant/variantcontext/VariantContext.html#getType-- ; https://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/variant/variantcontext/VariantContext.Type.html ;
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
it is correct. I humbly thank you,
I run these commands and I have number of SNP per chromosomes furthermore I have many 'Un' (I think this is unknown chromosomes SNP). how can I split or cut mt,x and unknown chromosomes from vcf?
Use
grep
on the output ofbedextract --list-chr
. You can passgrep
a file containing chromosome names you want to include (or, conversely, exclude) viagrep -f
.Run
man grep
to see a full list of options.many thanks but I want cut mt, unknown and X chromosomes SNPs from a vcf file.
I means, my prior vcf have many SNPs that these SNPs are in mitochondrial DNA, X chromosome and unknown chromosomes. I want to have a vcf file without mitochondrial , X chromosome and unknown chromosomes SNPs information.
You could use
grep
in the same way. It may help to review some examples of how it works so you can understand Unix streams and howgrep
can be used here.Hi, I also tried using this script but gave me only the number of snvs for each chromosome and did not count the insertions and deletions??
In fact, the outputs related to insertions and deletions are empty and there is no data in it?
Are you sure you have insertions/deletions?
yes, Because when I use this command, i find 182456 deletions and 0 insertions.
insertions:
insertions: