Tool: MAGERI: a software tool for calling rare variants and detecting circulating tumor DNA from UMI-tagged high-throughput sequencing data

8.0 years ago

mikhail.shugay 3.5k

Dear Colleagues,

I would like to announce MAGERI, our recently published software tool designed to facilitate the detection of ultra-rare variants in various kinds of high-throughput sequencing datasets prepared using molecular barcoding technology.

The ability to detect ultra-rare variants with ~0.1% frequency in a sample is key to successful circulating tumor DNA screening and to studying rare tumor subpopulations and rare drug-resistant variants in viral populations.

However, the sequencing error rate is far above the level required for accurate rare variant calling, even for sequencing datasets of top-tier quality. The recent development of molecular barcoding technology makes it possible to eliminate sequencing errors by tagging each input molecule with a unique molecular identifier (UMI) [Marx. Nature Methods 2017]. UMI-tagged read groups can then be assembled into consensus sequences, which corrects sequencing errors. Still, residual PCR errors introduced during the first PCR cycles and during UMI tag attachment can decrease the accuracy of variant calling. Moreover, to the best of my knowledge, there is so far no dedicated variant caller that can model error rates in UMI-tagged read group consensus sequences. MAGERI aims to solve this problem by implementing a consensus assembly, alignment and variant calling pipeline optimized for UMI-tagged data [Shugay et al. Plos Comp Biol 2017].
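To illustrate the general idea of consensus assembly described above, here is a minimal Python sketch. It is not MAGERI's actual algorithm: it assumes the UMI has already been extracted from each read, that all reads sharing a UMI have the same length and are in register, and it uses a plain per-position majority vote. The function name `assemble_consensuses` and the `min_reads` threshold are hypothetical and chosen only for this example.

```python
from collections import Counter, defaultdict


def assemble_consensuses(tagged_reads, min_reads=2):
    """Group (umi, sequence) pairs by UMI and majority-vote each position.

    Assumption: reads sharing a UMI are equal-length and already in register;
    zip() would otherwise truncate to the shortest read in the group.
    """
    groups = defaultdict(list)
    for umi, seq in tagged_reads:
        groups[umi].append(seq)

    consensuses = {}
    for umi, seqs in groups.items():
        if len(seqs) < min_reads:
            continue  # a single read cannot out-vote its own errors
        # Take the most frequent base in every column of the read group.
        consensuses[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensuses


reads = [
    ("AACGT", "ACGTACGT"),
    ("AACGT", "ACGTACGT"),
    ("AACGT", "ACGAACGT"),  # simulated sequencing error at position 3
    ("TTGCA", "ACGTTCGT"),
    ("TTGCA", "ACGTTCGT"),
    ("GGGGG", "ACGTACGT"),  # singleton UMI, dropped by min_reads
]
result = assemble_consensuses(reads)
print(result)
```

In this toy input the erroneous base in the third `AACGT` read is out-voted by the other two reads. An error introduced in an early PCR cycle would be shared by most reads in the group and would survive the vote, which is the residual error class mentioned above.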

Note that the datasets containing rare variants with known frequency, together with a control dataset from healthy donor plasma DNA, are publicly available at SRA; see this repository for metadata and analysis scripts/templates. We hope that these benchmark datasets will be of use to the community, especially to researchers developing software tools for UMI-tagged data processing and rare variant calling.

rare-variant resequencing umi ctDNA • 5.3k views

updated 15 months ago by blid11 • written 8.0 years ago by mikhail.shugay

Hi, I am new here and would like to use MAGERI to reanalyze data from a targeted Ion Torrent (Thermo Fisher) panel that uses molecular tags for variant detection. I am not a bioinformatics expert, so I would be glad to find the simplest way to apply the tool. Could you outline the steps needed to get it running? Thank you very much. Giusy

6.4 years ago by giuseppa.deluca

Hi Mikhail!

We're trying to run MAGERI on one of our cloud machines. However, it seems to run out of memory.

We're looking to upgrade the machine. What are the minimum and recommended system requirements for your software?

Thank you very much in advance!

Best, Alon

6.3 years ago by alons

Dear Alon,

MAGERI loads all reads into memory for consensus assembly, so we tested it on servers with 64 GB of RAM for HiSeq analysis. The answer depends on the structure of your dataset: the number of reads, the number of unique UMI tags, and the reads-per-UMI distribution (e.g. uniform coverage across all UMIs is a different situation from low coverage for most UMIs plus a single UMI with a million associated reads). You can also try our new MiNNN software (https://github.com/milaboratory/minnn), which is currently in beta. If you have any questions, feel free to email me (my userid at gmail dot com).
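As a rough way to inspect the dataset properties listed above before sizing a machine, the following Python sketch summarizes the reads-per-UMI distribution. It assumes the UMI sequences have already been extracted into an iterable of strings; the function name `umi_coverage_summary` is hypothetical, and the sketch does not estimate MAGERI's actual memory usage.

```python
from collections import Counter
from statistics import median


def umi_coverage_summary(umis):
    """Summarize how reads are distributed across UMI tags."""
    sizes = sorted(Counter(umis).values())  # reads per UMI, ascending
    if not sizes:
        raise ValueError("no UMIs supplied")
    return {
        "reads": sum(sizes),
        "unique_umis": len(sizes),
        "median_reads_per_umi": median(sizes),
        "max_reads_per_umi": sizes[-1],
    }


# Two toy datasets with the same read count but different structure.
uniform = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
skewed = ["A"] * 10 + ["B"] + ["C"]

uniform_summary = umi_coverage_summary(uniform)
skewed_summary = umi_coverage_summary(skewed)
print(uniform_summary)
print(skewed_summary)
```

Both toy datasets contain 12 reads and 3 UMIs, but the skewed one has a median of 1 read per UMI and a single group of 10 reads, which is the kind of imbalance described above.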

6.3 years ago by mikhail.shugay

Thank you very much Mikhail, we will look into it!

6.3 years ago by alons

Hi Mikhail, I am doing a study to determine the frequency of rare SNPs in ctDNA. I am specifically looking at mutations in 4 genes. I am preparing the NGS libraries using the Takara SMARTer ThruPLEX Tag-Seq kit and using a probe-based capture method to isolate the 8 exons of interest (~4.5 kb total). I was hoping to use your software, but the library kit uses UMIs that consist of degenerate bases, and I understand from Clement et al. (2018) that MAGERI doesn't allow degenerate bases in the UMIs. Is this correct? Many thanks for your help. Seanna

6.2 years ago by seanna.mctaggart

Hi Mikhail!

I would like to use your tool to perform somatic variant calling on cfDNA. However, I can only use a single BAM as input, rather than both the tumour and normal BAMs. Would it be valid to call variants independently for both samples and then, in the downstream analysis, keep only the variants that were not called in the normal BAM, and perhaps also select those variants with a low allele frequency?

Thanks!

4.0 years ago by jeni

Also interested in trying this.

15 months ago by blid11