I'm currently using SnpSift via Git Bash to extract columns from a GWAS dataset.
I've successfully done so using the following command line:
java -jar SnpSift.jar extractfields inputfile.vcf CHROM POS ID REF ALT "Sample 1" "Sample 2"...."Sample n" > final_output.vcf
Sample 1, Sample 2, etc. represent the sample IDs for each patient in the study. So, each column represents a patient, and each row represents a SNP.
In the original file, the genotypes appear as 0/0, 0/1 or 1/1. After extraction, however, they appear as ascending numerical values instead, such as 2, 3, 12, etc.
I've had a look at the extraction section of SnpSift's manual, which uses a genotype example of 0|0. I tried to make the same change to my dataset with a simple find and replace (I am using 010 Editor, as it's a 3 GB file). However, because of the file's size, the program crashed after a while.
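For context, the editor-free version of that find and replace would look something like the sketch below, which streams the file line by line in Git Bash instead of loading all 3 GB at once. It is only a sketch under assumptions: `slash_to_pipe` is a helper name I made up, it assumes a tab-separated VCF where the sample columns start at column 10 and GT is the first colon-separated subfield, and the output file name is a placeholder. Note that in VCF terms `/` means unphased and `|` means phased, so this relabels the genotypes as phased, and I have not confirmed that it changes what the extraction outputs.

```shell
# Hypothetical helper (not part of SnpSift): rewrite "/" as "|" in the GT
# subfield of every sample column, reading stdin and writing stdout.
# Assumes GT is the first colon-separated subfield of each sample column.
slash_to_pipe() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { print; next }   # header lines pass through unchanged
       {
         # Columns 1-9 are the fixed VCF fields; samples start at column 10.
         for (i = 10; i <= NF; i++) {
           colon = index($i, ":")
           if (colon > 0) {
             gt = substr($i, 1, colon - 1)   # genotype part, e.g. 0/1
             rest = substr($i, colon)        # remaining subfields, untouched
           } else {
             gt = $i
             rest = ""
           }
           gsub("/", "|", gt)
           $i = gt rest
         }
         print
       }'
}

# Usage (file names are placeholders):
#   slash_to_pipe < inputfile.vcf > inputfile.piped.vcf
```

Because the substitution is limited to the sample columns, slashes elsewhere (for example inside INFO values) are left alone, which a plain global find and replace would not guarantee.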
Does anyone have any suggestions they can throw my way? Appreciate any help.