Question

variant calling - data from IonTorrent technology

6.2 years ago

anba • 0

Hi,

I have to perform variant calling with data obtained from Ion Torrent nextgen technology. I do not have access to the manufacturer's software (Torrent Server and their variant calling plugin).

I cannot find any clear information about how to perform this type of analysis using typical command-line tools. Can I just use bwa for mapping and then bcftools or GATK for variant calling? Can you recommend any pipelines or articles?

Best, anna

next-gen • 4.2k views

ADD COMMENT • link updated 6.2 years ago by Bastien Hervé 6.4k • written 6.2 years ago by anba • 0

You may want to check the docker hub from iontorrent

https://hub.docker.com/u/iontorrent

They have the latest software in containers, ready to use. You don't need access to a separate server if you have plenty of desktop horsepower for this. Their documentation is also good.

ADD REPLY • link 6.2 years ago by GokalpC ▴ 170

Thanks for your answer. I didn't know there was a kind of standalone version of the Ion Torrent software; that is really good to know for the future. But I have checked, and for this particular project I cannot use the manufacturer's software. I need to process the data using tools like bwa or GATK.

ADD REPLY • link 6.2 years ago by anba • 0

What species are you working on? Are you looking for somatic mutations, germline mutations, or both?

ADD REPLY • link 6.2 years ago by Bastien Hervé 6.4k

These are human samples, mostly from cancer tissue, so I want to find both germline and somatic mutations.

ADD REPLY • link 6.2 years ago by anba • 0

score 0 · Answer 1 · 2019-04-30

6.2 years ago

Bastien Hervé 6.4k

There are many ways to do this. GATK is popular and good for somatic mutations, but some of its steps are overcomplicated. I still use it for somatic mutation discovery, as discussed here: Best tool for variant calling

Common steps are:

1) Check read quality: FastQC/MultiQC, fastp

2) Alignment: BWA performs well; Bowtie2 can also be used

3) Remove unmapped reads, alignments with low mapping quality, and supplementary, non-primary and alternative alignments...

4) Check the duplication level: mark duplicates with Picard tools. Look at your duplicate ratio and at how the duplicates behave in IGV. If you have a cluster of duplicates on exactly the same read, they are probably PCR duplicates, whereas if the duplicates are not clustered and are spread over multiple reads, they are more likely biological duplicates. So in the first case it is better to remove them, and in the second case it is better to keep them, in my opinion. With amplicon data you will have a lot of clustered duplicates because of the amplicon size, so keep them.

5) Steps before variant calling: this depends on the tool you use. With GATK there are many steps to go through before variant calling, such as recalibrating base quality scores, creating a panel of normals, creating a whitelist of variants...

6) Variant calling: read publications to find the tool that fits the information you have: Mutect2, bcftools, HaplotypeCaller, freebayes... I suggest you try some of them, pick some good-quality variants and check them by eye in IGV to see which tool gives the best results on your data.

7) Filtering the VCF: filter your VCF with the filters you want (genotype, depth, ref/alt ratio, LOD score, quality...)

8) Annotating the VCF: you can annotate your VCF with various software and databases such as ANNOVAR, SnpEff/SnpSift, Variant Effect Predictor from Ensembl...
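
Steps 2 and 3 could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a validated pipeline: the function name `align_and_filter`, the file names, the thread count and the MAPQ cutoff of 20 are all illustrative choices. It assumes single-end reads, that `bwa` and `samtools` are installed, and that the reference has already been indexed with `bwa index`.

```shell
#!/usr/bin/env bash
# Sketch of steps 2-3: align single-end reads with BWA-MEM, then drop
# unmapped, secondary (non-primary) and supplementary alignments.
# All names and thresholds below are illustrative assumptions.

# SAM flag bits: 4 = unmapped, 256 = secondary, 2048 = supplementary
FILTER_FLAGS=$((4 + 256 + 2048))
MIN_MAPQ=20   # assumed cutoff; tune for your data

align_and_filter() {
    local ref="$1" fastq="$2" sample="$3"
    # Read group added because GATK tools require one;
    # IONTORRENT is the platform value defined in the SAM specification.
    bwa mem -t 4 -R "@RG\tID:${sample}\tSM:${sample}\tPL:IONTORRENT" "$ref" "$fastq" \
        | samtools view -b -q "$MIN_MAPQ" -F "$FILTER_FLAGS" - \
        | samtools sort -o "${sample}.filtered.bam" -
    samtools index "${sample}.filtered.bam"
}

# Example call (hypothetical paths):
# align_and_filter ref.fa sample1.fastq sample1
```

Duplicate reads (flag 1024) are deliberately not filtered here, in line with the advice on amplicon data in step 4.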

ADD COMMENT • link 6.2 years ago by Bastien Hervé 6.4k

@Bastien Hervé - thank you for your answer. If you don't mind, I have a few questions.

Ad 2. Mapping to the genome. My data come from an Ion Proton machine (Ion Torrent technology), and this is an AmpliSeq-enriched library (a set of about 100 genes). Can I just use bwa as I do for data generated by Illumina sequencers? Are there any specific parameters I should use for mapping Torrent data?

Ad 4. The level of duplicates marked by Picard is quite high. But I've found that this is typical for AmpliSeq libraries and that this kind of duplicate shouldn't be removed (they are false positives). What's your opinion?

Ad 6. Variant calling: for Illumina data, GATK is my first choice. But according to the GATK website, this tool is not recommended for Torrent data because of its specific sequencing artifacts. I haven't found out what is recommended instead; people most often use freebayes, but there is no clear explanation why.

Also, could you recommend a textbook on finding mutations from DNA sequencing data (regardless of the sequencing technology)?

Once more - thanks a lot!

Best, anna

ADD REPLY • link 6.2 years ago by anba • 0

Ad 2. I would align the reads with BWA using no specific options and look at the number of mapped reads. Then count the reads falling within your amplicons, using bedtools with your amplicon file as the BED file, to get an idea of how many reads fall outside them (this should be close to 0):

bedtools multicov -bams f1.bam f2.bam ... fn.bam -bed amplicons.bed > aln_per_amplicons.csv
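
The counts written by `bedtools multicov` can then be screened for poorly covered amplicons. The sketch below is only an illustration: the function name `low_coverage_amplicons` and the threshold are hypothetical, and it assumes a 4-column BED file (chrom, start, end, name), so that the per-BAM counts start at column 5. Note that the output of `bedtools multicov` is tab-separated, whatever the file extension.

```shell
#!/usr/bin/env bash
# Sketch: list amplicons whose read count falls below a threshold in any BAM.
# Assumes a 4-column BED (chrom, start, end, name), so the counts appended
# by `bedtools multicov` occupy column 5 onwards, one column per BAM file.

low_coverage_amplicons() {
    local counts_file="$1" min_reads="$2"
    awk -F'\t' -v min="$min_reads" '
        {
            for (i = 5; i <= NF; i++) {
                # Print: amplicon name, BAM index (in -bams order), read count
                if (($i + 0) < (min + 0)) {
                    print $4 "\t" "bam" (i - 4) "\t" $i
                }
            }
        }' "$counts_file"
}

# Example call (hypothetical threshold of 100 reads):
# low_coverage_amplicons aln_per_amplicons.csv 100
```

Each output line names the amplicon, the position of the BAM file in the `-bams` list, and the read count that fell below the threshold.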

Ad 4. As I said, duplicates are expected in AmpliSeq libraries because you are sequencing amplicons, not the whole genome. With the same number of reads, you are much more likely to sequence the same region with amplicons than with WGS. So keep them; they are not PCR duplicates but rather reflect sequencing depth.
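
To look at the duplicate ratio without removing anything, duplicates can be marked with Picard and the rate read from its metrics file. This is a rough sketch: the function name `duplication_rate` and the file names are hypothetical, and the commented `picard` command assumes a wrapper script such as the one a conda install provides (otherwise `java -jar picard.jar`). The parser relies on the `PERCENT_DUPLICATION` column of the MarkDuplicates metrics table and reports only the first library.

```shell
#!/usr/bin/env bash
# Sketch: read the duplication rate from a Picard MarkDuplicates metrics file.
# Duplicates are only marked (flag 1024), not removed, by a command such as:
#   picard MarkDuplicates I=sample1.filtered.bam O=sample1.marked.bam M=sample1.dup_metrics.txt
# (file names are hypothetical).

duplication_rate() {
    local metrics_file="$1"
    awk -F'\t' '
        # The line after the column header holds the values for the first library
        header_seen { if (col) print $col; exit }
        # The column header line of the metrics table starts with LIBRARY
        /^LIBRARY/ {
            for (i = 1; i <= NF; i++) if ($i == "PERCENT_DUPLICATION") col = i
            header_seen = 1
        }' "$metrics_file"
}

# Example call:
# duplication_rate sample1.dup_metrics.txt
```

The value is a fraction between 0 and 1; a high value is expected for amplicon libraries, as explained above.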

Ad 6. It is really up to you: try a few tools, extract some good variants and confirm them in IGV. GATK is good for somatic mutations; since you want both somatic and germline, I do not know whether it is the best tool for you. If you want more in-depth information about variant detection tools, feel free to read their papers, such as the freebayes one.
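
As one possible starting point for such a comparison, a freebayes run followed by a simple bcftools filter might look like the sketch below. Everything here is an assumption for illustration: the function name `call_and_filter`, the file names, and the quality and depth thresholds. freebayes is run with its defaults, which assume a diploid sample, so low-frequency somatic variants would likely need different settings.

```shell
#!/usr/bin/env bash
# Sketch: call variants with freebayes (default settings) and apply a
# simple hard filter with bcftools. Thresholds are illustrative assumptions.

MIN_QUAL=30    # assumed minimum variant quality
MIN_DEPTH=20   # assumed minimum total read depth (INFO/DP)

# Sites matching this expression are excluded by `bcftools filter -e`
FILTER_EXPR="QUAL<${MIN_QUAL} || INFO/DP<${MIN_DEPTH}"

call_and_filter() {
    local ref="$1" bam="$2" prefix="$3"
    freebayes -f "$ref" "$bam" > "${prefix}.raw.vcf"
    bcftools filter -e "$FILTER_EXPR" -Oz -o "${prefix}.filtered.vcf.gz" "${prefix}.raw.vcf"
}

# Example call (hypothetical paths):
# call_and_filter ref.fa sample1.filtered.bam sample1
```

The surviving calls can then be inspected in IGV and compared with the output of other callers on the same BAM file.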

Preaching for my own parish: https://www.biostarhandbook.com/ has a section on variant calling :)