7 months ago
a.beggs
Hi all
I have a VCF file with the following lines:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
chr17 23197000 Spectre.DEL.7ROFFYQK N LOSS . . END=25683000;SVLEN=2486000;SVTYPE=LOSS;CN=0 GT:HO:GQ 1/1:0.0:60
chr18 19357000 Spectre.DEL.8B1N5YFJ N LOSS . . END=20560000;SVLEN=1203000;SVTYPE=LOSS;CN=0 GT:HO:GQ 1/1:0.0:60
chr1_KI270709v1_random 2000 Spectre.DUP.Y9R4QQKP N GAIN . . END=18000;SVLEN=16000;SVTYPE=GAIN;CN=42 GT:HO:GQ ./.:0.0:60
chr2_KI270715v1_random 143000 Spectre.DUP.7IRZ6XDF N GAIN . . END=160000;SVLEN=17000;SVTYPE=GAIN;CN=5 GT:HO:GQ ./.:0.0:60
chr9_KI270719v1_random 137000 Spectre.DUP.YC1FK3L0 N GAIN . . END=173000;SVLEN=36000;SVTYPE=GAIN;CN=4 GT:HO:GQ ./.:0.0:60
chr11_KI270721v1_random 5000 Spectre.DUP.YB0LB1EU N GAIN . . END=18000;SVLEN=13000;SVTYPE=GAIN;CN=4 GT:HO:GQ ./.:0.0:60
For various reasons the tertiary analysis pipeline I am feeding the VCF into is extremely fussy about its input. It wants:
- SVTYPE has to be CNV
- ALT allele needs to be <CNV>
- the ID field is used to determine if it is LOSS or GAIN so needs to include this text
- FORMAT/CN field is required for copy number
I have tried pyVCF, BCFtools and awk to convert the file into this form but can't get it to work. Does anyone have the VCF wizardry to give me some pointers, please? The main issues are moving CN from INFO to FORMAT and adding LOSS/GAIN to the ID field.
Do you mean like the following example diff:
You also need to modify the header.
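As a rough illustration of the header change, here is a minimal Python sketch that inserts declarations for the `<CNV>` ALT allele and the FORMAT/CN field just before the `#CHROM` line. The function name `patch_header` and the description strings are my own choices, and I am assuming these two declarations are the ones your pipeline needs; check its documentation for the exact wording it expects.

```python
# Meta lines to declare the symbolic ALT allele and the per-sample CN field.
# Assumption: the description text is illustrative, not mandated by the pipeline.
NEW_META = [
    '##ALT=<ID=CNV,Description="Copy number variable region">',
    '##FORMAT=<ID=CN,Number=1,Type=Integer,Description="Copy number">',
]


def patch_header(header_lines):
    """Return a copy of the header with missing meta lines added before #CHROM."""
    missing = []
    for meta in NEW_META:
        # '##FORMAT=<ID=CN,' identifies a declaration regardless of its description.
        prefix = meta.split(",")[0] + ","
        if not any(h.startswith(prefix) for h in header_lines):
            missing.append(meta)

    patched = []
    for line in header_lines:
        if line.startswith("#CHROM"):
            patched.extend(missing)
        patched.append(line)
    return patched
```

Running it twice on the same header adds nothing the second time, so it is safe to apply to files that already carry these declarations.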
What kind of pipeline is this?
Yeah that's what I'm looking for, awk can take care of the header for me... are you saying diff can do this?!
That should be easy to script in Perl. The diff was just my way of showing the changes to be made to your file.
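The same per-record rewrite is also short in plain Python with no VCF library. The sketch below is only one way to do it, under a few assumptions: the file is tab-delimited as the VCF spec requires, LOSS/GAIN is appended to the existing ID with a dot (the exact ID layout your pipeline wants may differ), and CN is copied to FORMAT while also being left in INFO. The name `convert_record` is hypothetical.

```python
def convert_record(line):
    """Rewrite one Spectre data line into the <CNV> style described above."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError("expected at least 10 tab-separated VCF columns")

    info_items = fields[7].split(";")
    # partition() also copes with flag-style INFO keys that have no '=value'.
    info = dict(item.partition("=")[::2] for item in info_items)

    direction = info.get("SVTYPE", fields[4])  # LOSS or GAIN in the input
    if direction not in ("LOSS", "GAIN"):
        return "\t".join(fields)  # leave anything unexpected untouched

    cn = info.get("CN", ".")  # '.' marks a missing value in VCF

    # Assumption: appending the direction to the ID satisfies the pipeline.
    fields[2] = fields[2] + "." + direction
    fields[4] = "<CNV>"
    fields[7] = ";".join(
        "SVTYPE=CNV" if item.startswith("SVTYPE=") else item
        for item in info_items
    )
    # Add CN as the last FORMAT key and the matching value to every sample.
    fields[8] = fields[8] + ":CN"
    fields[9:] = [sample + ":" + cn for sample in fields[9:]]
    return "\t".join(fields)
```

To use it, pass header lines (those starting with `#`) through unchanged and call `convert_record` on everything else. If the pipeline rejects CN in INFO, filter that item out when rebuilding `fields[7]`.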
So this is your real question. Show us the code.