Hello everyone,
I have a single microarray dataset consisting of one .CEL file and one intensity .csv file. I'm looking for a way to convert either of the two files into VCF format. After going through the Biostars and Bioconductor forums, I found pd.genomewidesnp.6@getdb, crlmm and a few others. crlmm doesn't work properly because it needs a bigger sample size, and pd.genomewidesnp.6@getdb doesn't have proper documentation or any scripts. Is there any way I can convert the microarray data to VCF format?
Thanks,
Susmita
Yes, I tried affy2vcf and APT, but those didn't work as I don't have the annotation file. However, apart from the .CEL file, I do have a csv file of genotype calls for that .CEL file, generated through BRLMM-P-Plus. So right now I want to convert that genotype call file into VCF format.
Which specific annotation file? APT should download it automatically, no?
I don't know. When I run apt-format-result, it shows an error that the annotation file is missing. It requires the annotation file for the SNP 6.0 array, I guess.
You likely need one of these: https://www.thermofisher.com/order/catalog/product/901153?SID=srch-srp-901153
Please try to search a bit in the APT options where you can specify the annotation file. It has been a good few years since I last used APT.
So I won't have that file, because I am analysing data available in GEO from another paper.
So I created an account and downloaded the annotation file. The command also ran, but it showed 'Missing value in identifier column, SNP will be excluded for text export by calls file'. Any idea why that could be?
Okay, getting somewhere... Could you try to use the snp-identifier-column parameter?
Hello Kevin, I know it's been a long time. I got fed up with this and shifted to another project. Now I have come back to the same problem. I'm trying to do as you said, but I'm still getting errors.
Any ideas how to proceed?
So I downloaded the sqlite db and tried again. This time the vcf file is being created, but with lots of warnings.
Okay, getting somewhere again... keep trying!
Yes, I understand the frustration, and I do not know what those error messages relate to. It seems that at least one indicates that you don't have the 'snp-identifier-column' in the annotation file, /home2/Project_2/Affy/GenomeWideSNP_6-na35-annot-csv/GenomeWideSNP_6.na35.annot.csv. This should be a column that identifies the genotype of each SNP, I imagine, like A, T, G, C.
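One quick way to check which columns the annotation CSV actually contains is to print its header row. Below is a minimal Python sketch, assuming the file starts with metadata lines prefixed by '#' followed by a normal CSV header. The function name annotation_columns and the in-memory sample (including its column names) are made up for illustration and may not match the real file.

```python
import csv
import io


def annotation_columns(handle):
    """Return the column names from the first non-comment line of a CSV."""
    # Assumption: metadata lines start with '#'; the header row follows them.
    data_lines = (line for line in handle if not line.startswith("#"))
    reader = csv.reader(data_lines)
    return next(reader, [])


# Tiny in-memory stand-in for the real annotation file (made-up content).
sample = io.StringIO(
    "#%example-metadata=made up for this sketch\n"
    '"Probe Set ID","dbSNP RS ID","Chromosome","Strand"\n'
    '"SNP_X-0000001","rs_example_1","1","+"\n'
)
columns = annotation_columns(sample)
print(columns)
```

With the real file, open it with open(path, newline="") and pass the handle in; the printed names are the candidates for the identifier column.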
Another option: export the data using Affymetrix Genotyping Console (or Power Tools) in A, T, G, C format and then manually convert to VCF or PLINK (followed by exporting from PLINK to VCF).
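To give an idea of what the manual conversion could look like, here is a rough Python sketch. It assumes you have already joined the A/T/G/C calls with the annotation, so that each record carries chromosome, position, ID, REF, ALT and a two-base call on the same strand as REF/ALT. The record keys, the function names and the example SNP are all hypothetical, and the records are assumed to be sorted by chromosome and position already.

```python
def call_to_gt(call, ref, alt):
    """Map a two-base call such as 'AG' to a VCF GT string."""
    index = {ref: "0", alt: "1"}
    if len(call) != 2 or any(base not in index for base in call):
        return "./."  # no-call, or an allele that matches neither REF nor ALT
    return "/".join(sorted(index[base] for base in call))


def to_vcf(records, sample_name):
    """Build minimal single-sample VCF text from pre-joined records."""
    lines = [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t" + sample_name,
    ]
    for r in records:
        gt = call_to_gt(r["call"], r["ref"], r["alt"])
        fields = [r["chrom"], str(r["pos"]), r["id"], r["ref"], r["alt"],
                  ".", ".", ".", "GT", gt]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


# Made-up record purely for illustration.
records = [
    {"chrom": "1", "pos": 1000, "id": "rs_example_1",
     "ref": "A", "alt": "G", "call": "AG"},
]
vcf_text = to_vcf(records, "SAMPLE1")
print(vcf_text)
```

QUAL, FILTER and INFO are left as '.' because the array calls do not provide them directly; calls that match neither REF nor ALT are written as missing rather than guessed.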
A further update: I did genotype calling on the CEL file using apt-probeset-genotype and used those calls to create a vcf file with apt-format-result. But the vcf file that I'm getting is somewhat useless. Apparently I'm getting warnings because SNPs are detected on the reverse strand and they should be on the forward strand. Moreover, the vcf file is created without any REF/ALT, QUAL or INFO.
You may have to include only those on the forward strand. For importing microarray data to PLINK, for example, we filter out 1000s of SNPs because they were called on the reverse strand. I also know that working with Affymetrix data is cumbersome...
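As a sketch of the strand handling, the Python below keeps only forward-strand SNPs and, as an alternative that was not part of the suggestion above, complements the alleles of reverse-strand SNPs instead of dropping them. It assumes each SNP is a dict with a 'strand' key ('+' or '-') and single-base 'allele_a' / 'allele_b' keys; these key names and the example SNPs are made up.

```python
# Translation table for complementing single bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def forward_only(snps):
    """Keep only SNPs annotated on the forward strand."""
    return [s for s in snps if s["strand"] == "+"]


def to_forward(snp):
    """Return a copy of snp with single-base alleles on the forward strand."""
    flipped = dict(snp)
    if snp["strand"] == "-":
        flipped["allele_a"] = snp["allele_a"].translate(COMPLEMENT)
        flipped["allele_b"] = snp["allele_b"].translate(COMPLEMENT)
        flipped["strand"] = "+"
    return flipped


# Made-up SNPs purely for illustration.
snps = [
    {"id": "snp_example_1", "strand": "+", "allele_a": "A", "allele_b": "G"},
    {"id": "snp_example_2", "strand": "-", "allele_a": "C", "allele_b": "T"},
]
kept = forward_only(snps)
flipped = [to_forward(s) for s in snps]
print(len(kept), flipped[1]["allele_a"], flipped[1]["allele_b"])
```

Filtering is the safer of the two if the strand annotation is in any doubt; flipping keeps more SNPs but relies entirely on the strand column being correct.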
The system ignores this parameter. I tried it that way and I get the error 'Database Error: no such column: Affy_SNP_ID'. I really don't know where the software gets this column name from, because I already changed the snp-identifier-column parameter.
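For a 'no such column' error like this, it may help to look at what the sqlite annotation database really contains. Here is a small Python sketch using the standard sqlite3 module to list every table and its columns. The in-memory database, its table name and its column names are invented for the demo; with a real file you would connect to its path instead.

```python
import sqlite3


def list_columns(conn):
    """Return {table_name: [column names]} for every table in the database."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # PRAGMA table_info returns one row per column; index 1 is the name.
        info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [col[1] for col in info]
    return schema


# Made-up in-memory database standing in for the real annotation file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ExampleAnnotations (Example_ID TEXT, Chromosome TEXT)")
schema = list_columns(conn)
print(schema)
conn.close()
```

If Affy_SNP_ID is not among the listed columns, the names that are printed show which identifier columns the database can actually offer.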
Did you figure this out later?