I have just run a series of imputation scripts on a server, and it has given me 23 chromosome files, and each individual has its own VCF file. According to Sanger's rules:
For VCF, we require:
- Valid VCF
- All alleles on the forward strand
- Coordinates are on GRCh37
- REF allele matches GRCh37. See the resources for help checking and fixing the REF allele.
- A single VCF file, not one file per chromosome
- Records are sorted by genomic position (chromosomal order is not important)
- Chromosome names should be 1, 2, 3, etc., not chr1, chr2, chr3, etc. They should match the names in this reference index file. Some programs will represent X as 23, Y as 24, etc. Please remove or replace these names. See the resources for help renaming chromosomes in a VCF.
- If not requesting pre-phasing, then all sites and samples should be phased with no missing data.
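"Coordinates are on GRCh37" means each POS value refers to a position on the GRCh37 human reference assembly (essentially UCSC's hg19); data called against another build, such as GRCh38, would first need lifting over. The REF and forward-strand rules then say the REF allele must be the GRCh37 base at that position. A minimal sketch of that check for biallelic SNPs, assuming the reference sequence is already in memory as a dict of strings (a hypothetical stand-in for reading a GRCh37 FASTA), with `classify_snp` being a made-up name:

```python
# Complement table for A/C/G/T, used to detect reverse-strand alleles.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def classify_snp(ref_seq, chrom, pos, ref, alt):
    """Compare a biallelic SNP with the reference base at a 1-based VCF POS.

    ref_seq maps a chromosome name to its sequence (a plain string here; a
    real pipeline would read it from a GRCh37 FASTA file).
    Returns "ok", "swap", "flip", "flip+swap", "ambiguous" or "mismatch".
    """
    base = ref_seq[chrom][pos - 1].upper()  # VCF POS is 1-based, str is 0-based
    ref, alt = ref.upper(), alt.upper()
    if {ref, alt} in ({"A", "T"}, {"C", "G"}):
        # A/T and C/G SNPs look the same on both strands, so a strand flip
        # cannot be told apart from a REF/ALT swap using the reference alone.
        return "ambiguous" if base in (ref, alt) else "mismatch"
    if ref == base:
        return "ok"         # REF already matches the reference
    if alt == base:
        return "swap"       # REF and ALT are the wrong way round
    if ref.translate(COMPLEMENT) == base:
        return "flip"       # alleles were reported on the reverse strand
    if alt.translate(COMPLEMENT) == base:
        return "flip+swap"  # reverse strand and swapped
    return "mismatch"       # e.g. coordinates from a different build
```

On real files, `bcftools norm -f <GRCh37 FASTA> --check-ref` performs a similar REF check. Records classed as "swap" or "flip" would need fixing, and many "mismatch" results would suggest the data are not on GRCh37 at all.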
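As a rough illustration of the chromosome-naming and sorting rules, here is a minimal pure-Python sketch that works on the lines of an uncompressed VCF. It assumes PLINK-style numeric codes (23 = X, 24 = Y, 26 = MT) and that the target names are those of the GRCh37 reference (1 to 22, X, Y, MT); `rename_chrom` and `rename_and_sort` are hypothetical names, not part of any tool.

```python
# Assumption: PLINK-style numeric codes 23 = X, 24 = Y, 26 = MT. PLINK's code
# 25 (XY, pseudo-autosomal) has no contig of its own in GRCh37, so it is left
# untouched here and would need separate handling.
NUMERIC_TO_NAME = {"23": "X", "24": "Y", "26": "MT"}

# Rank of each GRCh37 contig name: 1-22, then X, Y, MT.
CONTIG_RANK = {name: i for i, name in
               enumerate([str(n) for n in range(1, 23)] + ["X", "Y", "MT"])}

CONTIG_PREFIX = "##contig=<ID="


def rename_chrom(chrom):
    """Map 'chr1' -> '1', 'chrX' -> 'X', 'chrM' -> 'MT', '23' -> 'X', etc."""
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    if chrom == "M":
        chrom = "MT"
    return NUMERIC_TO_NAME.get(chrom, chrom)


def rename_and_sort(lines):
    """Rename CHROM in the header and records, then sort records by position.

    Header lines keep their order; records are sorted by (contig rank, POS),
    so each chromosome's records are contiguous and in position order.
    """
    header, records = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(CONTIG_PREFIX):
            rest = line[len(CONTIG_PREFIX):]
            # The contig ID ends at the first ',' or '>'.
            end = min((i for i in (rest.find(","), rest.find(">")) if i != -1),
                      default=len(rest))
            header.append(CONTIG_PREFIX + rename_chrom(rest[:end]) + rest[end:])
        elif line.startswith("#"):
            header.append(line)
        elif line:
            fields = line.split("\t")
            fields[0] = rename_chrom(fields[0])
            records.append(fields)
    # Unknown contig names sort after MT, then alphabetically; POS is numeric.
    records.sort(key=lambda f: (CONTIG_RANK.get(f[0], len(CONTIG_RANK)),
                                f[0], int(f[1])))
    return header + ["\t".join(f) for f in records]
```

On real, compressed files the same job is usually done with `bcftools annotate --rename-chrs` (given a two-column file of old and new names) followed by `bcftools sort`.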
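The last rule (everything phased, no missing data, unless pre-phasing is requested) can be checked one record at a time. This hypothetical helper takes a single tab-separated VCF data line and reports which sample columns fail, assuming the genotype is stored in the usual GT FORMAT field:

```python
def unphased_or_missing(record_line):
    """Return 0-based indices of samples whose GT is unphased or missing.

    Phased diploid calls use '|' (e.g. 0|1); '/' marks an unphased call and
    '.' a missing allele. Haploid calls such as '1' are accepted as-is.
    """
    fields = record_line.rstrip("\n").split("\t")
    fmt = fields[8].split(":")   # the 9th column is FORMAT
    samples = fields[9:]         # sample columns follow FORMAT
    if "GT" not in fmt:
        return list(range(len(samples)))  # no genotypes at all
    gt_idx = fmt.index("GT")
    bad = []
    for i, sample in enumerate(samples):
        values = sample.split(":")
        # Trailing sample fields may be omitted, so an absent GT counts as missing.
        gt = values[gt_idx] if gt_idx < len(values) else "."
        if "/" in gt or any(a in ("", ".") for a in gt.split("|")):
            bad.append(i)
    return bad
```

Running it over every data line and collecting non-empty results gives a quick picture of whether a file could be submitted without pre-phasing.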
How would I go about this? All of these VCF files are per-individual, and I'm not sure what "coordinates are on GRCh37" means. I'm really new to this process; this is my first time. Many thanks.
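To turn per-individual files into the single VCF the service asks for, the usual tools are `bcftools concat` (to join an individual's per-chromosome files) and `bcftools merge` (to combine single-sample files into one multi-sample file). To show what that merge does, here is a small pure-Python sketch for uncompressed files. It is an illustration under simplifying assumptions, not a replacement for bcftools: it keeps only the GT field, matches sites on CHROM, POS, REF and ALT, and `merge_single_sample_vcfs` is a hypothetical name.

```python
from collections import defaultdict


def merge_single_sample_vcfs(per_individual):
    """Merge single-sample VCFs into one multi-sample VCF (GT only).

    per_individual: one list of VCF lines per individual; an individual's
    per-chromosome files can simply be concatenated into that one list.
    """
    samples, contigs = [], []
    ids = {}                       # site -> first ID seen for it
    genotypes = defaultdict(dict)  # (CHROM, POS, REF, ALT) -> {sample: GT}
    for lines in per_individual:
        sample = None
        for line in lines:
            line = line.rstrip("\n")
            if line.startswith("##contig=") and line not in contigs:
                contigs.append(line)
            elif line.startswith("#CHROM"):
                sample = line.split("\t")[9]  # the single sample column
                if sample not in samples:
                    samples.append(sample)
            elif line and not line.startswith("#"):
                f = line.split("\t")
                site = (f[0], int(f[1]), f[3], f[4])
                ids.setdefault(site, f[2])
                gt_idx = f[8].split(":").index("GT")
                genotypes[site][sample] = f[9].split(":")[gt_idx]
    header = ["##fileformat=VCFv4.2", *contigs,
              '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
              "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL",
                         "FILTER", "INFO", "FORMAT", *samples])]
    body = []
    # Sorting the (CHROM, POS, REF, ALT) keys keeps each chromosome's records
    # together and in position order.
    for site in sorted(genotypes):
        chrom, pos, ref, alt = site
        row = [chrom, str(pos), ids[site], ref, alt, ".", ".", ".", "GT"]
        row += [genotypes[site].get(s, ".") for s in samples]  # '.' = missing
        body.append("\t".join(row))
    return header + body
```

A site absent from an individual's file comes out as ".", i.e. missing data, which the requirements only allow when pre-phasing is requested. Sorting CHROM as a string puts 10 before 2; that should be acceptable here, since only position order within each chromosome matters.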
Hi, did you manage to figure this out? I am stuck with the same problem.