MutScan (https://github.com/OpenGene/MutScan)
- Ultra-sensitive
- More than 20x faster than a conventional pipeline (e.g. BWA + Samtools + GATK/VarScan/Mutect)
- Very easy to use, with nothing else required: no alignment, no reference assembly, no variant calling, no pileup
- Beautiful HTML report
- Multi-threading support
- Supports both single-end and paired-end data
- For paired-end data, MutScan tries to merge each pair and performs quality adjustment and error correction
Download
# download over HTTP
https://github.com/OpenGene/MutScan/archive/master.zip
# or clone with git
git clone https://github.com/OpenGene/MutScan.git
Build
cd MutScan
make
Usage
usage: mutscan -1 <read1_file> -2 <read2_file> -m <mutation_file> -h <html_report_file> -t <thread>
options:
-1, --read1 read1 file name (string)
-2, --read2 read2 file name (string)
-m, --mutation optional, mutation file name (string)
-h, --html optional, filename of html report, no html report if not specified (string)
-?, --help print this message
-t, --thread thread number, default 4 (int)
The plain-text result, which contains the detected mutations and their supporting reads, is printed directly to standard output. You can use > to redirect it to a file, for example:
mutscan -1 <read1_file_name> -2 <read2_file_name> -m <mutation_file_name> > result.txt
You can also generate an HTML report with the -h argument, for example:
mutscan -1 <read1_file_name> -2 <read2_file_name> -m <mutation_file_name> -h report.html
Single-end and paired-end
For single-end sequencing data, the -2 argument is omitted:
mutscan -1 <read1_file_name> -m <mutation_file_name>
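When MutScan is driven from a script, the single-end and paired-end cases differ only in whether -2 is passed. The following is a minimal Python sketch that assembles the argument list from the flags listed under Usage. The function name build_mutscan_command, the file names R1.fq, R2.fq and mutations.csv, and the assumption that the built mutscan binary is on PATH are all illustrative and not part of MutScan itself.

```python
import shlex
from typing import List, Optional


def build_mutscan_command(
    read1: str,
    read2: Optional[str] = None,
    mutation_file: Optional[str] = None,
    html_report: Optional[str] = None,
    threads: Optional[int] = None,
    executable: str = "mutscan",  # assumption: the built binary is on PATH
) -> List[str]:
    """Assemble a mutscan argument list (hypothetical helper, not part of MutScan)."""
    cmd = [executable, "-1", read1]
    if read2 is not None:
        # Paired-end data only; leave read2 as None for single-end data.
        cmd += ["-2", read2]
    if mutation_file is not None:
        cmd += ["-m", mutation_file]
    if html_report is not None:
        cmd += ["-h", html_report]
    if threads is not None:
        cmd += ["-t", str(threads)]
    return cmd


if __name__ == "__main__":
    # Placeholder file names; print the commands instead of running them.
    print(shlex.join(build_mutscan_command("R1.fq", mutation_file="mutations.csv")))
    print(shlex.join(build_mutscan_command("R1.fq", "R2.fq", html_report="report.html", threads=8)))
```

The returned list can be passed to subprocess.run, with stdout captured or redirected to keep the plain-text result.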
Mutation file
A CSV file with the columns name, left_seq_of_mutation_point, mutation_seq and right_seq_of_mutation_point:
#name, left_seq_of_mutation_point, mutation_seq, right_seq_of_mutation_point
NRAS-neg-1-115258748-2-c.34G>A-p.G12S-COSM563, GGATTGTCAGTGCGCTTTTCCCAACACCAC, T, TGCTCCAACCACCACCAGTTTGTACTCAGT
NRAS-neg-1-115252203-2-c.437C>T-p.A146V-COSM4170228, TGAAAGCTGTACCATACCTGTCTGGTCTTG, A, CTGAGGTTTCAATGAATGGAATCCCGTAAC
BRAF-neg-7-140453136-15-c.1799T>A -V600E-COSM476, AACTGATGGGACCCACTCCATCGAGATTTC, T, CTGTAGCTAGACCAAAATCACCTATTTTTA
EGFR-pos-7-55241677-18-c.2125G>A-p.E709K-COSM12988, CCCAACCAAGCTCTCTTGAGGATCTTGAAG, A, AAACTGAATTCAAAAAGATCAAAGTGCTGG
EGFR-pos-7-55241707-18-c.2155G>A-p.G719S-COSM6252, GAAACTGAATTCAAAAAGATCAAAGTGCTG, A, GCTCCGGTGCGTTCGGCACGGTGTATAAGG
EGFR-pos-7-55241707-18-c.2155G>T-p.G719C-COSM6253, GAAACTGAATTCAAAAAGATCAAAGTGCTG, T, GCTCCGGTGCGTTCGGCACGGTGTATAAGG
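For custom targets, rows in this layout can be produced from a stretch of reference sequence. The sketch below is one possible way to do that in Python, under stated assumptions: the helpers make_mutation_row and format_mutation_file are hypothetical, the default flank length of 30 simply mirrors the example rows above (whether MutScan requires a particular length is not stated here), and the reference is assumed to already be in the orientation you want to search for.

```python
HEADER = "#name, left_seq_of_mutation_point, mutation_seq, right_seq_of_mutation_point"


def make_mutation_row(name, reference, pos, ref_len, alt, flank=30):
    """Build one mutation-file row from a reference string.

    pos is the 0-based start of the replaced bases; ref_len is how many
    reference bases the mutation replaces (1 for a single-base substitution).
    """
    if "," in name:
        raise ValueError("name must not contain a comma (it is the CSV separator)")
    if pos - flank < 0 or pos + ref_len + flank > len(reference):
        raise ValueError("reference too short for the requested flanks")
    left = reference[pos - flank:pos]
    right = reference[pos + ref_len:pos + ref_len + flank]
    return f"{name}, {left}, {alt}, {right}"


def format_mutation_file(rows):
    """Join the header line and the rows into the text of a mutation file."""
    return "\n".join([HEADER] + list(rows)) + "\n"


if __name__ == "__main__":
    # Toy reference: 30 A's, one G (the mutated base), then 30 C's.
    toy_reference = "A" * 30 + "G" + "C" * 30
    row = make_mutation_row("toy-snv", toy_reference, pos=30, ref_len=1, alt="T")
    print(format_mutation_file([row]), end="")
```

The resulting text can be saved to a file and passed to mutscan with -m.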
A default CSV file containing important actionable cancer gene targets is already provided in mutation/cancer.csv. If you want to use this mutation file directly, the -m <mutation_file_name> argument can be omitted:
mutscan -1 <read1_file_name> -2 <read2_file_name>
HTML output
If the -h or --html argument is given, an HTML report is generated and written to the given filename. A sample report is given here:
The color of each base indicates its quality, and the quality is shown when you hover the mouse over the base.
Cool. How does this work? Are you doing some kind of fuzzy k-mer alignment to target genes?
Yes. Basically, this is an implementation of a sequence string-searching algorithm, with support for error tolerance, quality handling, and other sequence-related features.
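To make "string searching with error tolerance" concrete, here is a rough Python sketch of the general idea. It is not MutScan's actual code or algorithm: the function names, the mismatch limit of 2, and the rule that the mutated bases must match exactly while the flanks may contain mismatches are all assumptions chosen for illustration. Base qualities, reverse complements, and reads that only partly cover the flanks are ignored.

```python
def hamming_within(a, b, max_mismatches):
    """Return the mismatch count of two equal-length strings, or None if over the limit."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_mismatches:
                return None
    return mismatches


def find_mutation(read, left, mutation, right, max_mismatches=2):
    """Scan a read for left + mutation + right, tolerating flank mismatches.

    Returns a list of (offset, flank_mismatches) for every position where the
    mutation bases match exactly and the flanks differ in at most
    max_mismatches positions.
    """
    pattern_len = len(left) + len(mutation) + len(right)
    hits = []
    for start in range(len(read) - pattern_len + 1):
        mut_start = start + len(left)
        mut_end = mut_start + len(mutation)
        if read[mut_start:mut_end] != mutation:
            continue  # the mutated bases themselves must be present
        flank_in_read = read[start:mut_start] + read[mut_end:start + pattern_len]
        mm = hamming_within(flank_in_read, left + right, max_mismatches)
        if mm is not None:
            hits.append((start, mm))
    return hits


if __name__ == "__main__":
    left, mutation, right = "ACGTACGT", "T", "GGCCTTAA"
    exact_read = "NN" + left + mutation + right + "NN"
    noisy_read = "NN" + "ACGAACGT" + mutation + right + "NN"  # one flank error
    print(find_mutation(exact_read, left, mutation, right))  # [(2, 0)]
    print(find_mutation(noisy_read, left, mutation, right))  # [(2, 1)]
```

A read that supports the mutation despite a sequencing error in a flank is still reported, along with how many flank mismatches were tolerated.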
Am I correct in assuming that this tool is designed for human samples only?
No, you can specify any sequence in the mutation list CSV file.
Can we use this for RNAseq data?
Sure, it is just sequence data. Protein sequences are not supported yet, though.
How do I open this program? I don't understand. Please help me.