6.3 years ago
hyanwong
I'm creating a set of VCFs, one for each human chromosome, by taking a single VCF from e.g. ftp://ftp.ensembl.org/pub/release-91/variation/vcf/homo_sapiens/homo_sapiens.vcf.gz and running
bcftools view homo_sapiens.vcf.gz --regions 1 -Oz -o homo_sapiens_chr1.vcf.gz
I would like to record in each new VCF file that this is the command I ran, and also that the original homo_sapiens.vcf.gz file was downloaded from that Ensembl URL on a given date. I assume I should store this information in the ##source line of the VCF, but is there any convention for how it should be stored? For example, is there a structured schema (e.g. JSON) for saving this sort of provenance information?
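To make the question concrete, here is a rough sketch of one way the download provenance could be attached. It is not an established convention. It assumes that free-form `##key=value` meta-information lines are acceptable in the header, and that `bcftools annotate -h` is used to append them. The key names `sourceURL` and `sourceDownloadDate`, the date value, and the output file names are all made up for illustration. As far as I know, bcftools already writes its own `##bcftools_viewCommand=` line into the output header unless `--no-version` is given, so the command itself may not need to be added by hand.

```shell
#!/bin/sh
# Hypothetical values: adjust the URL, date and file names to your own run.
src_url="ftp://ftp.ensembl.org/pub/release-91/variation/vcf/homo_sapiens/homo_sapiens.vcf.gz"
download_date="2018-01-15"   # placeholder date, not taken from the post
header_file="provenance_header.txt"

# Write one meta-information line per fact. Each line must start with "##".
# The keys "sourceURL" and "sourceDownloadDate" are invented for this sketch,
# not part of any standard.
{
  printf '##sourceURL=%s\n' "$src_url"
  printf '##sourceDownloadDate=%s\n' "$download_date"
} > "$header_file"

# Append the lines to the per-chromosome VCF header, but only if bcftools
# and the input file are actually present on this machine.
if command -v bcftools >/dev/null 2>&1 && [ -f homo_sapiens_chr1.vcf.gz ]; then
  bcftools annotate -h "$header_file" -Oz \
    -o homo_sapiens_chr1.provenance.vcf.gz homo_sapiens_chr1.vcf.gz
fi
```

Running `bcftools view -h homo_sapiens_chr1.provenance.vcf.gz` afterwards should show the two added lines alongside whatever bcftools recorded itself.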