Hello,
I just started my new job as a data programmer based in Seattle, WA, with the epidemiology group. I am primarily a SAS programmer and have no experience with genomic data. Just a month into the job, I have been asked to convert a genomic text file to VCF. My coworkers bounced around many ideas, and I have been asked to explore R/Bioconductor, PLINK, and possibly C++ libraries. I am not versed in any of these :( I searched many sites and found a lot of information on how to read VCF files, but nothing about creating them.
Our genomic data are huge. I would like to read our internal genomic text file with SAS and write its contents in the column order specified by VCF: chrom, pos, id, ref, alt, qual, filter, info, format. However, the data I received have the following columns: sample id, snp name, chr, position, allele1-top, allele2-top, x, y, r and b allele freq.
I was able to map the columns that are straightforward (snp name, chr, position), but I have no idea where allele1-top, allele2-top, x, y, r and b allele freq map to. I am not a genomics expert, and I am thinking of proposing to my supervisor that someone with genomics experience help me with the mapping.
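To make the mapping question concrete, here is a minimal Python sketch of the target layout, not a definitive converter. It rests on several assumptions that are not confirmed anywhere in this thread: the two allele columns have already been converted from Illumina TOP strand to the reference forward strand, a lookup `ref_alleles` (hypothetical, for example built from the array manifest or a reference genome) gives the reference base for each SNP, and no-calls are written as "-". The function name `rows_to_vcf` and the dictionary keys are made up for illustration. The x, y, r and b allele freq columns are per-sample signal values with no standard VCF column; they are left out here and would need custom FORMAT fields if they must be kept.

```python
from collections import defaultdict

def rows_to_vcf(rows, ref_alleles):
    """Pivot long-format rows (one row per sample per SNP) into VCF lines.

    rows: iterable of dicts with the hypothetical keys sample_id, snp_name,
          chr, position, allele1, allele2 (alleles assumed forward-strand).
    ref_alleles: hypothetical dict mapping snp_name -> reference base.
    """
    samples = {}               # dict used as an ordered set of sample ids
    sites = {}                 # snp_name -> (chrom, pos)
    calls = defaultdict(dict)  # snp_name -> {sample_id: (allele1, allele2)}
    for r in rows:
        samples[r["sample_id"]] = None
        sites[r["snp_name"]] = (r["chr"], int(r["position"]))
        calls[r["snp_name"]][r["sample_id"]] = (r["allele1"], r["allele2"])
    sample_ids = list(samples)

    lines = [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t"
        + "\t".join(sample_ids),
    ]
    # Sorts chromosomes as plain strings ("10" before "2"); a real file
    # should follow the contig order of the reference genome instead.
    for snp, (chrom, pos) in sorted(sites.items(), key=lambda kv: kv[1]):
        ref = ref_alleles[snp]
        # ALT = every observed allele that is neither REF nor a no-call.
        alts = sorted({a for pair in calls[snp].values()
                       for a in pair if a not in (ref, "-")})
        index = {ref: "0"}
        index.update({a: str(i + 1) for i, a in enumerate(alts)})
        genotypes = []
        for s in sample_ids:
            a1, a2 = calls[snp].get(s, ("-", "-"))
            # Unknown alleles and no-calls become "." (missing) in GT.
            genotypes.append("/".join(index.get(a, ".") for a in (a1, a2)))
        fields = [chrom, str(pos), snp, ref, ",".join(alts) or ".",
                  ".", ".", ".", "GT"]
        lines.append("\t".join(fields + genotypes))
    return lines
```

The point of the sketch is that allele1/allele2 do not map to REF and ALT directly: REF has to come from an outside source, ALT is whatever non-reference allele is observed, and the two allele columns become each sample's GT value. It also holds everything in memory, so a very large file would need to be processed per chromosome or pre-sorted by SNP.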
Another alternative might be to determine if it is even feasible to generate the VCF from our internal data.
Do you have any idea how I might proceed?
Regards,
Jenny
Hi Jenny,
This thread would benefit from a more descriptive title, which would more easily attract people who can help you with what you need. "Variant Call Format (VCF)" is overly generic, so I have changed your title to "Converting SNP array variant data to VCF", but you may wish to refine it further. Just try to be specific!
Cheers,
Wouter
Sounds good. Whatever it takes to get the right answer. Thanks much!