Hi,
I am trying to use genome music to identify significantly mutated genes (SMGs) in normal-tumor paired WES data. The workflow is:
1 using VarScan to get variants in VCF format
2 using vcf2maf to convert the VCF output to MAF
3 bmr calc-covg
4 bmr calc-bmr
5 smg
Steps 1-4 seem to run properly, but there is an error when I try the last step:
*Result/Cat_Group/Music/APPF008/Result/6/roi_covgs/APPF008.covg generated and stored.
APPF008.covg generated and stored to*Result/Cat_Group/Music/APPF008/Result/6/roi_covgs.
STATUS: Running VEP and writing to: /home/suozhen/data/RNA-*Result/Cat_Group/Music/APPF008/Input/VCF/6.APPF008.trans.vep.vcf
2017-02-24 15:15:52 - Read existing cache info
2017-02-24 15:15:52 - Starting...
2017-02-24 15:15:54 - Read 5000 variants into buffer
2017-02-24 15:15:54 - Calculating consequences
2017-02-24 15:16:00 - Writing output
2017-02-24 15:16:00 - Processed 5000 total variants (625 vars/sec, 625 vars/sec total)
2017-02-24 15:16:03 - Read 5000 variants into buffer
2017-02-24 15:16:03 - Calculating consequences
2017-02-24 15:16:09 - Writing output
2017-02-24 15:16:09 - Processed 10000 total variants (556 vars/sec, 588 vars/sec total)
2017-02-24 15:16:11 - Read 5000 variants into buffer
2017-02-24 15:16:11 - Calculating consequences
2017-02-24 15:16:17 - Writing output
2017-02-24 15:16:17 - Processed 15000 total variants (625 vars/sec, 600 vars/sec total)
2017-02-24 15:16:19 - Read 5000 variants into buffer
2017-02-24 15:16:19 - Calculating consequences
2017-02-24 15:16:26 - Writing output
2017-02-24 15:16:26 - Processed 20000 total variants (556 vars/sec, 588 vars/sec total)
2017-02-24 15:16:27 - Read 2156 variants into buffer
2017-02-24 15:16:27 - Calculating consequences
2017-02-24 15:16:32 - Writing output
2017-02-24 15:16:32 - Processed 22156 total variants (359 vars/sec, 554 vars/sec total)
2017-02-24 15:16:32 - Finished!
Loading per-sample coverages stored in *Result/Cat_Group/Music/APPF008/Result/6/total_covgs
Loading per-gene coverage files stored under *Result/Cat_Group/Music/APPF008/Result/6/gene_covgs/
Running 'joinx ref-stats' to read reference FASTA and identify SNVs at AT, CG, CpG sites
Parsing MAF file to classify mutations
Finished Parsing the MAF file to classify mutations
Skipped 1054 mutation(s) that belong to unrecognized samples
Error in xy.coords(x, y, xlabel, ylabel, log) :
'x' and 'y' lengths differ
Calls: plot -> plot.default -> xy.coords
stop running
Error in xy.coords(x, y, xlabel, ylabel, log) :
'x' and 'y' lengths differ
Calls: plot -> plot.default -> xy.coords
stop running
No result is written to the output file smgs_varscan_tumor, and the other output file, smgs_varscan_tumor_detailed, looks like:
Gene      Indels  SNVs  Tot Muts  Covd Bps  Muts pMbp  P-value FCPT  P-value LRT  P-value CT  FDR FCPT  FDR LRT  FDR CT  Expression
CYP21A1P  0       0     0         1264      0.00       1             1            1           1         1        1       expressed
MAPK14    0       0     0         2632      0.00       1             1            1           1         1        1       expressed
TAF8      0       0     0         2421      0.00       1             1            1           1         1        1       expressed
TEAD3     0       0     0         3096      0.00       1             1            1           1         1        1       expressed
TRIM31    0       0     0         1725      0.00       1             1            1           1         1        1       expressed
The mutation information in the MAF file does not seem to be read by genome music smg. Can someone please help? Thank you so much!!
-----------MAF header-----------------------------------
version 2.4
Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_Position End_Position Strand Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2 dbSNP_RS dbSNP_Val_Status Tumor_Sample_Barcode Matched_Norm_Sample_Barcode Match_Norm_Seq_Allele1 Match_Norm_Seq_Allele2 Tumor_Validation_Allele1 Tumor_Validation_Allele2 Match_Norm_Validation_Allele1 Match_Norm_Validation_Allele2 Verification_Status Validation_Status Mutation_Status Sequencing_Phase Sequence_Source Validation_Method Score BAM_File Sequencer Tumor_Sample_UUID Matched_Norm_Sample_UUID HGVSc HGVSp HGVSp_Short Transcript_ID Exon_Number t_depth t_ref_count t_alt_count n_depth n_ref_count n_alt_count all_effects Allele Gene Feature Feature_type Consequence cDNA_position CDS_position Protein_position Amino_acids Codons Existing_variation ALLELE_NUM DISTANCE STRAND_VEP SYMBOL SYMBOL_SOURCE HGNC_ID BIOTYPE CANONICAL CCDS ENSP SWISSPROT TREMBL UNIPARC RefSeq SIFT PolyPhen EXON INTRON DOMAINS GMAF AFR_MAF AMR_MAF ASN_MAF EAS_MAF EUR_MAF SAS_MAF AA_MAF EA_MAF CLIN_SIG SOMATIC PUBMED MOTIF_NAME MOTIF_POS HIGH_INF_POS MOTIF_SCORE_CHANGE IMPACT PICK VARIANT_CLASS TSL HGVS_OFFSET PHENO MINIMISED ExAC_AF ExAC_AF_AFR ExAC_AF_AMR ExAC_AF_EAS ExAC_AF_FIN ExAC_AF_NFE ExAC_AF_OTH ExAC_AF_SAS GENE_PHENO FILTER flanking_bps variant_id variant_qual ExAC_AF_Adj ExAC_AC_AN_Adj ExAC_AC_AN ExAC_AC_AN_AFR ExAC_AC_AN_AMR ExAC_AC_AN_EAS ExAC_AC_AN_FIN ExAC_AC_AN_NFE ExAC_AC_AN_OTH ExAC_AC_AN_SAS ExAC_FILTER
-----------ROI------------------------------------------
6 105929 106835 OR4F1P
6 292465 292642 DUSP22
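Assuming MuSiC assigns mutations to ROIs by gene name (an assumption, not confirmed in this post), it may be worth checking how many MAF rows carry a Hugo_Symbol that also appears in the fourth column of the ROI file. The following is only a minimal sketch of such a check. The helper names are hypothetical and the two MAF rows in the demo are made up for illustration; only the ROI lines are taken from above.

```python
import csv
from collections import Counter

def roi_genes(roi_lines):
    """Collect gene names from the 4th whitespace-separated column of ROI lines."""
    genes = set()
    for line in roi_lines:
        fields = line.split()
        if len(fields) >= 4:
            genes.add(fields[3])
    return genes

def maf_gene_counts(maf_lines):
    """Count MAF rows per Hugo_Symbol, skipping blank lines and '#' comment lines."""
    rows = (l for l in maf_lines if l.strip() and not l.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row["Hugo_Symbol"] for row in reader)

def split_by_roi(roi_lines, maf_lines):
    """Return (counts for genes named in the ROI file, counts for all other genes)."""
    genes = roi_genes(roi_lines)
    counts = maf_gene_counts(maf_lines)
    inside = {g: n for g, n in counts.items() if g in genes}
    outside = {g: n for g, n in counts.items() if g not in genes}
    return inside, outside

# Demo: ROI lines as shown above; MAF rows are invented for illustration only.
demo_roi = ["6\t105929\t106835\tOR4F1P\n", "6\t292465\t292642\tDUSP22\n"]
demo_maf = [
    "#version 2.4\n",
    "Hugo_Symbol\tTumor_Sample_Barcode\n",
    "DUSP22\tTUMOR\n",
    "MAPK14\tTUMOR\n",
]
inside, outside = split_by_roi(demo_roi, demo_maf)
print("in ROI file:", inside)
print("not in ROI file:", outside)
```

With real files, the lines would come from `open(path)` on the ROI file and the MAF file. A large "not in ROI file" count would suggest a gene-name mismatch rather than a problem in the smg step itself.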
------------music command-----------------------------
1
genome music bmr calc-covg --bam-list */6.APPF008.bam.list --output-dir */Result/6 --reference-sequence */6.fasta --roi-file */6.bed --gene-covg-dir */6
2
*/vcf2maf-master/vcf2maf.pl --input-vcf */6.APPF008.trans --output-maf */6.APPF008.trans.maf --tumor-id 'TUMOR' --normal-id 'NORMAL' --vcf-tumor-id 'TUMOR' --vcf-normal-id 'NORMAL' --vep-path */Music/data_vep --filter-vcf */Music/data_vep/ExAC_nonTCGA.r0.3.1.sites.vep.vcf.gz --ref-fasta */Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
3
*/Music/MuSiC2-master/bin/music2 bmr calc-bmr --bam-list */6.APPF008.bam.list --maf-file */6.APPF008.trans.maf --reference-sequence */6.fasta --roi-file */6.bed --output-dir */6
4
*/Music/MuSiC2-master/bin/music2 smg --gene-mr-file */Result/6/gene_mrs --output-file */Result/6/smgs_varscan_tumor
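
The log line "Skipped 1054 mutation(s) that belong to unrecognized samples" suggests the sample names in the MAF may not match those in the BAM list. vcf2maf was run with --tumor-id 'TUMOR', so Tumor_Sample_Barcode is presumably 'TUMOR', while the coverage file name APPF008.covg hints that the BAM list names the sample APPF008 (an inference, since the BAM list itself is not shown). Below is a minimal sketch for comparing the two. It assumes the BAM list is tab-separated with the sample name in the first column; the function names and the demo lines are hypothetical.

```python
import csv

def bam_list_samples(bam_list_lines):
    """Sample names from BAM list lines (assumed: tab-separated, name in column 1)."""
    return {
        line.split("\t")[0].strip()
        for line in bam_list_lines
        if line.strip() and not line.startswith("#")
    }

def maf_tumor_barcodes(maf_lines):
    """Distinct Tumor_Sample_Barcode values, skipping blank and '#' comment lines."""
    rows = (l for l in maf_lines if l.strip() and not l.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row["Tumor_Sample_Barcode"] for row in reader}

def unrecognized_samples(bam_list_lines, maf_lines):
    """MAF tumor barcodes that do not appear as sample names in the BAM list."""
    return maf_tumor_barcodes(maf_lines) - bam_list_samples(bam_list_lines)

# Demo with invented lines that mimic the suspected situation.
demo_bam_list = ["APPF008\tnormal.bam\ttumor.bam\n"]
demo_maf = [
    "#version 2.4\n",
    "Hugo_Symbol\tTumor_Sample_Barcode\n",
    "MAPK14\tTUMOR\n",
]
print(unrecognized_samples(demo_bam_list, demo_maf))
```

If the real files show the same mismatch, re-running vcf2maf with --tumor-id set to the sample name used in the BAM list (keeping --vcf-tumor-id as 'TUMOR') might be worth trying, though that is untested here.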