Converting SNP array variant data to VCF
7.8 years ago
jtftyl ▴ 20

Hello,

I just started a new job as a data programmer with an epidemiology group in Seattle, WA. I am primarily a SAS programmer and have no experience with genomic data. Just a month into the job, I have been asked to convert a genomic text file to VCF. My coworkers bounced around many ideas, and I have been asked to explore using R bioinformatics packages, Plink, and maybe C++ libraries. I am not versed in any of these languages :( I searched many sites and found a lot of information on how to read VCF files, but nothing about creating them.

Our genomic data are huge. I would like to convert our internal genomic text file using SAS and write its contents in the format specified by VCF, in this order: chrom, pos, id, ref, alt, qual, filter, info, format. However, the data I received have the following columns: sample id, snp name, chr, position, allele1-top, allele2-top, x, y, r, and b allele freq. I was able to map the columns that are pretty straightforward, but I have no idea where allele1-top, allele2-top, x, y, r, and b allele freq map to. I am not a genomics expert, and I am thinking of proposing to my supervisor that someone with genomics experience help me do the mapping.

Another alternative might be to determine if it is even feasible to generate the VCF from our internal data.

Do you have any idea how I might proceed?

Regards Jenny

gene SNP R genome rna-seq • 5.1k views

Hi Jenny,

This thread would benefit from a more descriptive title, which would more easily attract people who can help you with what you need. "Variant Call Format (VCF)" is overly generic. I have changed your title to "Converting SNP array variant data to VCF", but you may wish to modify it further. Just try to be specific!

Cheers,
Wouter


Sounds good. Whatever it takes to get the right answer. Thanks very much!

7.6 years ago
aldoc ▴ 10

Explanations and tools to work with Illumina's TOP/BOT coding system can be found at:

https://www.illumina.com/documents/products/technotes/technote_topbot.pdf

gengen.openbioinformatics.org/en/latest/tutorial/coding/

Probably you can use R's crlmm package to read the final report files you have and convert X and Y intensities to genotype calls, or something like that. From there, you might move to VCF.
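
To make the last step concrete, here is a minimal Python sketch of how the columns in the question could map to a single-sample VCF once genotype calls exist. It is only an illustration under stated assumptions, not a tested converter: the `annot` lookup (REF and ALT on the forward strand, plus a flag saying whether TOP is the reverse complement of the forward strand) is hypothetical and would have to come from the array's manifest or annotation, the SNP names and positions are made up, and "-" is assumed to mark a no-call. The x, y, r and b allele freq columns are intensity-derived values with no reserved VCF field, so they are left out here.

```python
# Complement table used to flip TOP-strand alleles onto the forward strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def to_forward(allele, top_is_reverse):
    """Flip an allele to the forward strand when TOP is the reverse strand.

    Unknown symbols (e.g. '-' for a no-call) are passed through unchanged.
    """
    return COMPLEMENT.get(allele, allele) if top_is_reverse else allele


def genotype(a1, a2, ref, alt):
    """Return a VCF GT string; './.' for no-calls or unexpected alleles."""
    codes = {ref: "0", alt: "1"}
    if a1 not in codes or a2 not in codes:
        return "./."
    return "/".join(sorted((codes[a1], codes[a2])))


def rows_to_vcf(rows, annot, sample_id):
    """Build VCF lines for one sample from report rows (already sorted)."""
    lines = [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "\t".join(["#CHROM", "POS", "ID", "REF", "ALT",
                   "QUAL", "FILTER", "INFO", "FORMAT", sample_id]),
    ]
    for row in rows:
        # annot is a hypothetical lookup: snp name -> (REF, ALT, top_is_reverse)
        ref, alt, flip = annot[row["snp_name"]]
        a1 = to_forward(row["allele1_top"], flip)
        a2 = to_forward(row["allele2_top"], flip)
        lines.append("\t".join([
            row["chr"], str(row["position"]), row["snp_name"],
            ref, alt, ".", ".", ".", "GT", genotype(a1, a2, ref, alt),
        ]))
    return lines


# Made-up example data, for illustration only.
annot = {
    "rs0001": ("A", "G", False),
    "rs0002": ("C", "T", True),
    "rs0003": ("G", "A", False),
}
rows = [
    {"snp_name": "rs0001", "chr": "1", "position": 1000,
     "allele1_top": "A", "allele2_top": "G"},
    {"snp_name": "rs0002", "chr": "1", "position": 2000,
     "allele1_top": "A", "allele2_top": "A"},
    {"snp_name": "rs0003", "chr": "1", "position": 3000,
     "allele1_top": "-", "allele2_top": "-"},
]
vcf_lines = rows_to_vcf(rows, annot, "SAMPLE1")
print("\n".join(vcf_lines))
```

Note that the report is in long format (one row per sample per SNP), while a multi-sample VCF has one row per SNP and one column per sample, so a real conversion would also need to pivot the data. The intensity values could be carried as custom FORMAT fields if they are declared in the header.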

7.8 years ago
bharata1803 ▴ 560

Have you checked samtools? http://www.htslib.org/workflow/ It can give you VCF data from a BAM file. If your genomic data were obtained from FASTQ data, you can use these tools.


allele1-top and allele2-top sound like SNP array data to me, so probably no BAM or FASTQ data.


I believe one of the genomic files is in BAM format. Is samtools free to download and easy to use?


Yes, it is free, and you can read the manual on the website. There is a practical example too.


Hi @jtftyl, I have a problem similar to the one you tried to solve here months ago. I have a genotype report from Affymetrix that I would like to convert to VCF format. Were you able to generate the VCF files? If so, can you share how you did it? Thanks
