4.5 years ago
kevinbruckner97
Hello Biostars, this is my first post here. Is there a way to convert a .csv file that I downloaded from TellmeGen to the .vcf file format? It looks pretty much the same, but the .csv is comma-separated and has no header lines. How can I make it look like a classic VCF file? Thank you for your answers! The file looks like this:
No. 1 10:43597877_CNV_RET_e3_20 Chromosome 10 Position 43597877 Genotype GG.
I'm guessing you opened the CSV file in Excel and copy-pasted the content here, because the content does not look like CSV: there are no commas.
The information content in that record is too low for it to be transformed into a well-formed VCF file. Why are you looking to get a VCF anyway?
Yes, that is not a CSV; it seems to be regular text. Please clarify.
To generate a VCF you will need to parse both the file and the genome sequence that was used for the variant calling or the array design, and then write a translator that follows the VCF format specification.
A simple python script would do the job.
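A rough sketch of what such a script could look like, under several assumptions: every record follows the layout of the single example line in the question, the genotype is two single-nucleotide alleles reported on the forward strand, and the reference base comes from a caller-supplied `ref_lookup(chrom, pos)` function. The names `parse_record`, `to_vcf_line`, `convert`, and `ref_lookup` are hypothetical, and the script does not provide the reference lookup itself; it has to come from the genome build the array was designed against.

```python
import re

# Pattern for the record layout shown in the question. This is an assumption:
# a real export may use a different layout.
RECORD_RE = re.compile(
    r"Chromosome\s+(?P<chrom>\S+)\s+Position\s+(?P<pos>\d+)"
    r"\s+Genotype\s+(?P<gt>[ACGT]{2})"
)


def parse_record(line):
    """Return (chrom, pos, genotype) or None if the line does not match."""
    m = RECORD_RE.search(line)
    if m is None:
        return None
    return m.group("chrom"), int(m.group("pos")), m.group("gt")


def to_vcf_line(chrom, pos, genotype, ref_base):
    """Build one VCF data line with a GT field for a single sample."""
    # ALT alleles are the genotype alleles that differ from the reference.
    alts = []
    for allele in genotype:
        if allele != ref_base and allele not in alts:
            alts.append(allele)
    alleles = [ref_base] + alts
    # GT uses allele indices: 0 = REF, 1.. = ALT alleles in order.
    gt = "/".join(str(i) for i in sorted(alleles.index(a) for a in genotype))
    alt_field = ",".join(alts) if alts else "."
    return "\t".join(
        [chrom, str(pos), ".", ref_base, alt_field, ".", ".", ".", "GT", gt]
    )


def convert(lines, ref_lookup, sample="SAMPLE"):
    """Convert input records to a list of VCF lines (header + data)."""
    out = [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t" + sample,
    ]
    for line in lines:
        rec = parse_record(line)
        if rec is None:
            continue  # skip lines that do not match the assumed layout
        chrom, pos, genotype = rec
        out.append(to_vcf_line(chrom, pos, genotype, ref_lookup(chrom, pos)))
    return out


if __name__ == "__main__":
    record = ("No. 1 10:43597877_CNV_RET_e3_20 Chromosome 10 "
              "Position 43597877 Genotype GG.")
    # Placeholder reference base for illustration only; it is NOT the real
    # base at this position and must be replaced by a genome lookup.
    toy_reference = {("10", 43597877): "A"}
    for vcf_line in convert([record], lambda c, p: toy_reference[(c, p)]):
        print(vcf_line)
```

Even this sketch cannot fill in what the input lacks: QUAL, FILTER, and INFO stay as `.`, and records that are not simple single-nucleotide genotypes would need separate handling.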
Can you elaborate? How can a "simple python script" add information that isn't there to get a VCF dataset from a CSV dataset without heavy customizations?
This is not a CSV. Of course you need to have all the information required for the format conversion, including the genome sequence.
So what's the reasoning behind your previous back-handed comment that ignores all the crucial aspects of the problem? A "simple" python script could do the job if the job were well defined, all data points required were available in an easy to parse format and OP knew how to code in Python and write VCF format output.
I mean, why even add a comment that adds no value to the post?