OK, I found the answer by looking at the help page for the path-scan function:
--pathway-file
This is a tab-delimited file prepared from a pathway database (such as KEGG), with the columns: [pathid, pathname, class, geneline, diseases, drugs, description]. The latter three columns are optional (but are available on KEGG). The geneline contains the "entrezid:genename" of all genes involved in this pathway, each separated by a "|" symbol.
For example, a line in the pathway-file would look like:
Ensure that the gene names and entrez IDs used match those used in the MAF file. Entrez IDs are not mandatory (use a 0 if Entrez ID unknown). But if a gene name in the MAF does not match any gene name in this file, the entrez IDs are used to find a match (unless it's a 0).
It doesn't really say how to "prepare" such a file though?
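For what it's worth, here is how I read that description: a minimal Python sketch that formats one line from data you already have. The function name, the pathway ID and name, and GENE_X are placeholders I made up for illustration, and writing the optional columns as empty fields is my assumption, not something the help page states.

```python
def format_pathway_line(path_id, path_name, path_class, genes,
                        diseases="", drugs="", description=""):
    """Build one tab-delimited pathway-file line.

    genes is an iterable of (entrez_id, gene_name) pairs. Pass None or 0
    as the Entrez ID when it is unknown; the help page says to use 0.
    """
    # Each gene becomes "entrezid:genename"; genes are joined with "|".
    geneline = "|".join(f"{entrez or 0}:{name}" for entrez, name in genes)
    # Assumption: optional columns are written as empty fields so that the
    # column positions stay fixed at seven.
    fields = [path_id, path_name, path_class, geneline,
              diseases, drugs, description]
    return "\t".join(str(field) for field in fields)


# Hypothetical pathway with one known Entrez ID and one unknown.
line = format_pathway_line(
    "path0001", "Example pathway", "Example class",
    [(7157, "TP53"), (None, "GENE_X")],
)
print(line)
```

Where the pathway-to-gene mapping comes from (KEGG or elsewhere) is a separate question; this only covers the layout. The gene names still have to match those in the MAF file.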
There is more information in the "man genome music" page:
The MuSiC suite is a set of tools aimed at discovering the significance of somatic mutations found within a given cohort of cancer samples, and with respect to a variety of external data sources. The standard inputs required are:
1. mapped reads in BAM format
2. predicted or validated SNVs or indels in mutation annotation format (MAF)
3. a list of regions of interest (typically the boundaries of coding exons)
4. any relevant numeric or categorical clinical data.
The formats for inputs 3. and 4. are:
3. Regions of Interest File:
· Do not use headers
· 4 columns, which are [chromosome start-position(1-based) stop-position(1-based) gene_name]
4. Clinical Data Files:
· Headers are required
· At least 1 sample_id column and 1 attribute column, with the format being [sample_id clinical_data_attribute clinical_data_attribute ...]
· The sample_id must match the sample_id listed in the MAF under "Tumor_Sample_Barcode" for relating the mutations of this sample.
· The header for each clinical_data_attribute will appear in the output file to denote relationships with the mutation data from the MAF.
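The rules for inputs 3 and 4 can be checked before running anything. Below is a rough Python sketch, not part of MuSiC; the function names are mine, and I am assuming both files are tab-separated and that a stop position smaller than the start is an error, neither of which the man page states explicitly.

```python
def check_roi(lines):
    """Return a list of problems found in regions-of-interest lines."""
    problems = []
    for n, line in enumerate(lines, start=1):
        cols = line.rstrip("\n").split("\t")  # assumption: tab-separated
        if len(cols) != 4:
            problems.append(f"line {n}: expected 4 columns, got {len(cols)}")
            continue
        chrom, start, stop, gene = cols
        if not (start.isdigit() and stop.isdigit()):
            # A header line typically ends up here; headers are not allowed.
            problems.append(f"line {n}: start/stop are not integers")
        elif int(start) < 1 or int(stop) < int(start):
            problems.append(f"line {n}: positions must be 1-based, start <= stop")
    return problems


def unmatched_clinical_samples(clinical_lines, maf_sample_ids):
    """Return sample_ids from the clinical file that are absent from the MAF.

    maf_sample_ids is the set of values taken from the MAF column
    "Tumor_Sample_Barcode".
    """
    rows = [l.rstrip("\n").split("\t") for l in clinical_lines if l.strip()]
    if not rows:
        raise ValueError("clinical file is empty")
    header, data = rows[0], rows[1:]
    if len(header) < 2:
        raise ValueError("need a sample_id column and at least one attribute")
    return [row[0] for row in data if row[0] not in maf_sample_ids]


# Made-up example data.
print(check_roi(["1\t100\t200\tGENE_A\n"]))
print(unmatched_clinical_samples(
    ["sample_id\tage\n", "S1\t50\n", "S2\t61\n"], {"S1"}))
```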
Descriptions for the usage of each tool (each sub-command) can be found separately.
The play command runs all of the sub-commands serially on a selected input set.
According to the description, the pathway file should be something like:
Hi mark.dunning,
Could you please share how you prepared the --pathway-file?