Hello everybody, as in the title: what is the right way to merge VCF files containing CNVs?
I use vcf-merge (https://vcftools.github.io/perl_module.html#vcf-merge), a VCFtools script, together with bgzip and tabix (http://www.htslib.org/doc/tabix.html), from the SAMtools/HTSlib project, to compress and index the variant files, but I don't know if this is the right way. Thanks.
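For reference, the workflow described above can be sketched roughly as follows in Python. This is only a minimal sketch: it assumes bgzip, tabix and vcf-merge are on the PATH, that the inputs are uncompressed VCFs, and the function names (prepare_commands, merge_command, run_workflow) are hypothetical helpers made up for illustration. It does not check whether vcf-merge handles CNV records the way you want.

```python
import subprocess


def prepare_commands(vcf_path):
    """Return the bgzip and tabix commands for one uncompressed VCF."""
    gz_path = f"{vcf_path}.gz"
    return [
        ["bgzip", vcf_path],              # replaces the input with <vcf_path>.gz
        ["tabix", "-p", "vcf", gz_path],  # writes the index <vcf_path>.gz.tbi
    ]


def merge_command(gz_paths):
    """Return the vcf-merge command; it writes the merged VCF to stdout."""
    return ["vcf-merge", *gz_paths]


def run_workflow(vcf_paths, merged_path):
    """Compress and index each input, merge them, then index the result."""
    gz_paths = []
    for vcf_path in vcf_paths:
        for cmd in prepare_commands(vcf_path):
            subprocess.run(cmd, check=True)
        gz_paths.append(f"{vcf_path}.gz")

    # Equivalent to: vcf-merge a.vcf.gz b.vcf.gz | bgzip -c > merged.vcf.gz
    with open(merged_path, "wb") as out:
        merge = subprocess.Popen(merge_command(gz_paths), stdout=subprocess.PIPE)
        subprocess.run(["bgzip", "-c"], stdin=merge.stdout, stdout=out, check=True)
        merge.stdout.close()
        if merge.wait() != 0:
            raise subprocess.CalledProcessError(merge.returncode, merge.args)

    subprocess.run(["tabix", "-p", "vcf", merged_path], check=True)
```

Calling run_workflow(["a.vcf", "b.vcf"], "merged.vcf.gz") would run the whole chain; the file names here are placeholders.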
Why would a VCF contain CNVs?
A copy number variant is generally defined as a region > 1 kilobase. VCF files should only contain single nucleotide variants and short insertions/deletions (InDels). If you have a customised VCF format, then the standard tools used to manipulate VCF files may not understand how to interpret your custom format. In that case, you could use Python scripts to manipulate your VCF files. I have done this recently to include copy number variants in my custom VCF format.
Kevin
Structural variants, including CNVs, have been allowed in VCFs by the specification for quite some time. Callers may also add custom tags for ease of use (SVEND, for example, is often used to give easy reference to stop coordinates). While a caller may well be adding custom fields to the INFO field, it is incorrect to say that only short indels and SNVs should be in a VCF file.
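As a rough illustration of what such a record can look like, here is a minimal Python sketch that reads one structural-variant line and pulls out its type and coordinates. The record itself is made up for the example, and the function names (parse_info, describe_sv) are hypothetical. SVTYPE, END and SVLEN are INFO keys described in the VCF specification, and <DUP> is one of its symbolic ALT alleles; a given caller may use different or additional tags.

```python
def parse_info(info_field):
    """Split a VCF INFO string into a dict; flag-style keys map to True."""
    info = {}
    if info_field == ".":
        return info
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def describe_sv(vcf_line):
    """Summarise one tab-separated VCF record that carries SV annotations."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, _id, _ref, alt = fields[:5]
    info = parse_info(fields[7])
    start = int(pos)
    end = int(info["END"]) if "END" in info else None
    return {
        "chrom": chrom,
        "start": start,
        "end": end,
        "alt": alt,                   # symbolic allele such as <DEL> or <DUP>
        "svtype": info.get("SVTYPE"),
        # END minus POS; None when the record has no END tag
        "span": end - start if end is not None else None,
    }


# Made-up example record: a 5 kb duplication with a symbolic ALT allele.
example = "chr1\t10000\t.\tN\t<DUP>\t.\tPASS\tSVTYPE=DUP;END=15000;SVLEN=5000"
summary = describe_sv(example)
print(summary)
```

Standard tools that follow the specification should accept records of this shape; caller-specific INFO tags are where tool behaviour is more likely to differ.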