7.9 years ago
Ron
Hi all,
I am using this software for RNAseq mutation analysis: https://github.com/davidliwei/rnaseqmut
My final output file is a VCF file and I want to convert it to a MAF file.
I have come across posts such as "Vcf To Maf (Mutation Annotation Format) Conversion?"; however, the VCF that rnaseqmut produces is formatted a bit differently. Here are two lines from my output:
chr1 877831 rs6672356;COSM4144217 T C 1.0 Sample_6385_11.DP4=0,0,1,2;Sample_G1.DP4=0,0,1,0;Sample_G2.DP4=0,0,0,0;Sample_G3.DP4=0,0,0,0;Sample_G4.DP4=0,0,0,0;Sample_G5.DP4=0,0,0,1;Sample_NY_D_TA23_PDX_T1.DP4=0,0,0,0;Sample_NY_D_TA23_PR.DP4=0,0,0,0;Sample_Pa_PDX.DP4=0,0,0,0;Sample_PA_primary.DP4=0,0,1,0;Sample_TE_PDX.DP4=0,0,0,0;Sample_TE_primary.DP4=0,0,0,0; ASP=true;GNO=true;HD=true;INT=true;KGPROD=true;KGPhase1=true;NSM=true;OTHERKG=true;REF=true;RS=6672356;RSPOS=877831;SAO=0;SLO=true;SSR=0;VC=SNV;VP=0x050100080a05000516000100;WGT=1;dbSNPBuildID=116;AA=p.W343R;CDS=c.1027T>C;CNT=23;GENE=SAMD11;SNP=true;STRAND=+;EFF=missense_variant(MODERATE|MISSENSE|Tgg/Cgg|p.Trp343Arg/c.1027T>C|681|SAMD11|protein_coding|CODING|ENST00000342066|10|1),missense_variant(MODERATE|MISSENSE|Tgg/Cgg|p.Trp250Arg/c.748T>C|588|SAMD11|protein_coding|CODING|ENST00000341065|8|1|WARNING_TRANSCRIPT_NO_START_CODON),missense_variant(MODERATE|MISSENSE|Tgg/Cgg|p.Trp169Arg/c.505T>C|540|SAMD11|protein_coding|CODING|ENST00000455979|4|1|WARNING_TRANSCRIPT_NO_START_CODON),downstream_gene_variant(MODIFIER||1753||749|NOC2L|protein_coding|CODING|ENST00000327044||1),downstream_gene_variant(MODIFIER||1753|||NOC2L|retained_intron|CODING|ENST00000483767||1),downstream_gene_variant(MODIFIER||1754|||NOC2L|retained_intron|CODING|ENST00000477976||1),downstream_gene_variant(MODIFIER||3160||178|SAMD11|protein_coding|CODING|ENST00000420190||1),downstream_gene_variant(MODIFIER||2868|||NOC2L|processed_transcript|CODING|ENST00000496938||1),downstream_gene_variant(MODIFIER||278|||SAMD11|processed_transcript|CODING|ENST00000478729||1),non_coding_exon_variant(MODIFIER|||n.286T>C||SAMD11|retained_intron|CODING|ENST00000464948|1|1),non_coding_exon_variant(MODIFIER|||n.389T>C||SAMD11|retained_intron|CODING|ENST0000474461|3|1),non_coding_exon_variant(MODIFIER|||n.191T>C||SAMD11|retained_intron|CODING|ENST00000466827|2|1)
chr1 878314 rs142558220;COSM426784 G C 1.0 Sample_6385_11.DP4=5,3,5,4;Sample_G1.DP4=0,0,0,0;Sample_G2.DP4=0,0,0,0;Sample_G3.DP4=1,0,0,0;Sample_G4.DP4=1,0,0,0;Sample_G5.DP4=2,1,0,0;Sample_NY_D_TA23_PDX_T1.DP4=2,3,0,0;Sample_NY_D_TA23_PR.DP4=0,1,0,0;Sample_Pa_PDX.DP4=2,1,0,0;Sample_PA_primary.DP4=1,0,0,0;Sample_TE_PDX.DP4=0,0,0,0;Sample_TE_primary.DP4=0,0,0,0;ASP=true;INT=true;KGPROD=true;KGPhase1=true;OTHERKG=true;REF=true;RS=142558220;RSPOS=878314;SAO=0;SSR=0;SYN=true;VC=SNV;VP=0x050000080305100016000100;WGT=1;dbSNPBuildID=134;AA=p.G480G;CDS=c.1440G>C;CNT=2;GENE=SAMD11;SNP=true;STRAND=+;EFF=synonymous_variant(LOW|SILENT|ggG/ggC|p.Gly480Gly/c.1440G>C|681|SAMD11|protein_coding|CODING|ENST00000342066|11|1),synonymous_variant(LOW|SILENT|ggG/ggC|p.Gly387Gly/c.1161G>C|588|SAMD11|protein_coding|CODING|ENST00000341065|9|1|WARNING_TRANSCRIPT_NO_START_CODON),synonymous_variant(LOW|SILENT|ggG/ggC|p.Gly306Gly/c.918G>C|540|SAMD11|protein_coding|CODING|ENST00000455979|5|1|WARNING_TRANSCRIPT_NO_START_CODON),downstream_gene_variant(MODIFIER||1270||749|NOC2L|protein_coding|CODING|ENST00000327044||1),downstream_gene_variant(MODIFIER||1270|||NOC2L|retained_intron|CODING|ENST00000483767||1),downstream_gene_variant(MODIFIER||1271|||NOC2L|retained_intron|CODING|ENST00000477976||1),downstream_gene_variant(MODIFIER||3643||178|SAMD11|protein_coding|CODING|ENST00000420190||1),downstream_gene_variant(MODIFIER||2385|||NOC2L|processed_transcript|CODING|ENST00000496938||1),downstream_gene_variant(MODIFIER||761|||SAMD11|processed_transcript|CODING|ENST00000478729||1),downstream_gene_variant(MODIFIER||132|||SAMD11|retained_intron|CODING|ENST00000466827||1),downstream_gene_variant(MODIFIER||42|||SAMD11|retained_intron|CODING|ENST00000464948||1),non_coding_exon_variant(MODIFIER|||n.802G>C||SAMD11|retained_intron|CODING|ENST00000474461|4|1)
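For example, the annotation column has four DP4 values per sample, such as Sample_6385_11.DP4=0,0,1,2: the reference-allele read counts followed by the alternate-allele read counts (in the samtools DP4 convention, forward and reverse strand for each).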
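As a minimal Python sketch (an illustration, not part of rnaseqmut), the per-sample DP4 values can be collapsed into MAF-style reference and alternate read counts. It assumes the columns are whitespace-separated as pasted above (CHROM, POS, ID, REF, ALT, QUAL, then the DP4 block and the annotation block, which are sometimes run together with only a ';') and that DP4 follows the samtools order ref-forward, ref-reverse, alt-forward, alt-reverse. `parse_dp4` is a hypothetical helper name.

```python
import re

# Matches e.g. "Sample_6385_11.DP4=0,0,1,2": group 1 is the sample name,
# groups 2-5 are the four DP4 counts.
DP4_RE = re.compile(r"^(.+)\.DP4=(\d+),(\d+),(\d+),(\d+)$")


def parse_dp4(vcf_line):
    """Return {sample: (ref_count, alt_count)} for one rnaseqmut output line.

    Assumes DP4 = ref_fwd, ref_rev, alt_fwd, alt_rev (samtools convention).
    """
    fields = vcf_line.split()
    # Rejoin everything after CHROM POS ID REF ALT QUAL, because the DP4 block
    # and the INFO block are separated by whitespace in some lines and only by
    # ';' in others; empty tokens produced by ';;' are simply ignored.
    tokens = ";".join(fields[6:]).split(";")
    counts = {}
    for tok in tokens:
        m = DP4_RE.match(tok)
        if m:
            ref_f, ref_r, alt_f, alt_r = (int(x) for x in m.groups()[1:])
            counts[m.group(1)] = (ref_f + ref_r, alt_f + alt_r)
    return counts
```

For the second example line, Sample_6385_11.DP4=5,3,5,4 would give 8 reference reads (t_ref_count) and 9 alternate reads (t_alt_count).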
Any suggestions on how to get this data into a format like this?
Chromosome Start_position End_position Strand Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2 dbSNP_RS Tumor_Sample_Barcode Matched_Norm_Sample_Barcode n_alt_count n_ref_count t_alt_count t_ref_count amino_acid_change_WU
X 47044502 47044502 + Nonsense_Mutation SNP G G T novel UTUC123_1 0 96 28 48 p.E667*
2 192701329 192701329 + Missense_Mutation SNP C C T novel UTUC123_1 0 81 18 49 p.V200M
5 112824048 112824048 + In_Frame_Ins INS - #NAME? #NAME? novel UTUC123_1 0 17 7 14 p.S22_nofs
11 62286810 62286810 + Missense_Mutation SNP T T C novel UTUC123_1 0 111 25 72 p.K5027E
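One possible way to build rows in the format above is sketched below in Python. Assumptions (none of this comes from rnaseqmut itself): only SNVs are handled, since indels would need the usual VCF-to-MAF coordinate and padding-base adjustments; the first EFF entry is taken as the representative annotation; `EFFECT_TO_MAF` is a hypothetical, partial mapping from SnpEff effect terms to MAF Variant_Classification values; the amino-acid change comes from the AA field when present, otherwise from the EFF protein change; and the normal-sample columns are left empty because this output has no matched normal. One row is written per sample with at least `min_alt` alternate reads (a hypothetical threshold).

```python
import csv
import io
import re

# Per-sample read counts, e.g. "Sample_6385_11.DP4=0,0,1,2".
DP4_RE = re.compile(r"^(.+)\.DP4=(\d+),(\d+),(\d+),(\d+)$")

# Hypothetical, partial mapping from SnpEff effect terms to MAF
# Variant_Classification values; unmapped terms are passed through unchanged.
EFFECT_TO_MAF = {
    "missense_variant": "Missense_Mutation",
    "synonymous_variant": "Silent",
    "stop_gained": "Nonsense_Mutation",
    "stop_lost": "Nonstop_Mutation",
    "splice_acceptor_variant": "Splice_Site",
    "splice_donor_variant": "Splice_Site",
    "5_prime_UTR_variant": "5'UTR",
    "3_prime_UTR_variant": "3'UTR",
    "intron_variant": "Intron",
    "upstream_gene_variant": "5'Flank",
    "downstream_gene_variant": "3'Flank",
    "non_coding_exon_variant": "RNA",
}

# Column order copied from the example table in the question.
MAF_COLUMNS = [
    "Chromosome", "Start_position", "End_position", "Strand",
    "Variant_Classification", "Variant_Type", "Reference_Allele",
    "Tumor_Seq_Allele1", "Tumor_Seq_Allele2", "dbSNP_RS",
    "Tumor_Sample_Barcode", "Matched_Norm_Sample_Barcode",
    "n_alt_count", "n_ref_count", "t_alt_count", "t_ref_count",
    "amino_acid_change_WU",
]


def line_to_maf_rows(vcf_line, min_alt=1):
    """Convert one rnaseqmut line into MAF-style dicts, one per supporting sample."""
    fields = vcf_line.split()
    chrom, pos, ids, ref, alt = fields[:5]
    if len(ref) != 1 or len(alt) != 1:
        return []  # indels/MNVs are not handled in this sketch
    dp4, info = {}, {}
    # DP4 block and INFO block may be separated by whitespace or only by ';'.
    for tok in ";".join(fields[6:]).split(";"):
        m = DP4_RE.match(tok)
        if m:
            rf, rr, af, ar = (int(x) for x in m.groups()[1:])
            dp4[m.group(1)] = (rf + rr, af + ar)
        elif "=" in tok:
            key, value = tok.split("=", 1)
            info[key] = value

    # Assumption: the first EFF entry is used as the representative annotation.
    first_eff = info.get("EFF", "").split(",")[0]
    effect = first_eff.split("(")[0]
    eff_fields = first_eff[first_eff.find("(") + 1:].rstrip(")").split("|")
    protein_change = info.get("AA") or (
        eff_fields[3].split("/")[0] if len(eff_fields) > 3 else "")
    rs_ids = [x for x in ids.split(";") if x.startswith("rs")]

    rows = []
    for sample, (ref_n, alt_n) in dp4.items():
        if alt_n < min_alt:
            continue  # this sample shows too little support for the variant
        rows.append({
            "Chromosome": chrom[3:] if chrom.startswith("chr") else chrom,
            "Start_position": pos,
            "End_position": pos,  # same as start for a single-base change
            "Strand": "+",
            "Variant_Classification": EFFECT_TO_MAF.get(effect, effect),
            "Variant_Type": "SNP",
            "Reference_Allele": ref,
            "Tumor_Seq_Allele1": ref,
            "Tumor_Seq_Allele2": alt,
            "dbSNP_RS": rs_ids[0] if rs_ids else "novel",
            "Tumor_Sample_Barcode": sample,
            "Matched_Norm_Sample_Barcode": "",  # no matched normal available
            "n_alt_count": "",
            "n_ref_count": "",
            "t_alt_count": alt_n,
            "t_ref_count": ref_n,
            "amino_acid_change_WU": protein_change,
        })
    return rows


def to_maf_tsv(lines, min_alt=1):
    """Return tab-separated text with a header row, skipping blank and '#' lines."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=MAF_COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for line in lines:
        if line.strip() and not line.startswith("#"):
            writer.writerows(line_to_maf_rows(line, min_alt))
    return buf.getvalue()
```

To convert a whole file, something like `with open("rnaseqmut_output.vcf") as fh, open("out.maf", "w") as out: out.write(to_maf_tsv(fh))` should work (both file names are placeholders). If indels or a fully standard MAF are needed, a dedicated converter such as vcf2maf may be a better fit, although it expects a standard multi-sample VCF rather than this layout.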
Thanks, Ron