Question

Mask or trim primer sequences in Amplicon sequencing

6

8.0 years ago

Paul ★ 1.5k

Dear all,

I am following a nice article from Multiplicom (HERE, page 27 down). What is the procedure for masking primer sequences in my amplicon sequencing data? Should I mask them, or should I trim them? Can anyone share their experience and some workflows (tools)?

enter image description here

Thanks to anyone who shares their experience!

amplicon mask trimming fastq • 6.7k views

ADD COMMENT • link 8.0 years ago by Paul ★ 1.5k

1

In the example of the picture you gave, trimming seems to be particularly important, because primer sequence overlaps genomic sequence due to multiple partially-overlapping primer sets being multiplexed together. Is that common, and is it the case in your experiment? Still, while trimming seems to be useful in the diagram, it's not exactly a panacea, when you consider the case of linked SNPs, one in a primer and one not. The point of trimming the primer is to reduce bias and non-genomic sequence, but bear in mind that it won't eliminate bias, even though the picture makes it seem like it would.

As for masking vs trimming, that depends on how your downstream analysis handles masked bases. Trimming is usually better.
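To make the distinction concrete, here is a minimal Python sketch of the two operations on a single read. It is an illustration only, not how any particular tool implements them; the function names `trim_primer` and `mask_primer` and the choice of `N` with quality `!` for masked positions are assumptions made for this example.

```python
def trim_primer(seq, qual, primer_len):
    """Trimming: drop the first primer_len bases and their quality scores."""
    return seq[primer_len:], qual[primer_len:]


def mask_primer(seq, qual, primer_len, mask_base="N", mask_qual="!"):
    """Masking: keep the read length, but overwrite the primer bases.

    mask_base and mask_qual are illustrative defaults ("!" is Phred 0
    in the common Phred+33 encoding).
    """
    n = min(primer_len, len(seq))
    return mask_base * n + seq[n:], mask_qual * n + qual[n:]


seq, qual = "ACGTACGTTTGGCCAA", "IIIIIIIIIIIIIIII"
print(trim_primer(seq, qual, 4))   # read becomes 4 bases shorter
print(mask_primer(seq, qual, 4))   # read keeps its length, primer bases become N
```

A masked read still carries the primer positions, so whether it helps depends on how the aligner and variant caller treat `N` bases; a trimmed read simply no longer contains them.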

ADD REPLY • link 8.0 years ago by Brian Bushnell 20k

0

Thank you for the comment. Would you recommend any workflow and tools to correctly trim or mask those sequences? We have 80 views; does nobody have any other experience or additional information?

ADD REPLY • link 8.0 years ago by Paul ★ 1.5k

3

The BBMap suite, written by Brian Bushnell, has the tools you need. Look at bbduk.sh for trimming and bbmask.sh for masking. There are also other tools that can align, generate pileups, etc.

ADD REPLY • link 8.0 years ago by GenoMax 148k

0

Does it make sense to trim 30 bp from each amplicon separately for the forward and reverse orientations?

ADD REPLY • link 7.9 years ago by Paul ★ 1.5k

0

If the primers are 30bp long, it makes sense to trim 30bp from the start of all reads. With BBDuk the command would be (assuming you have paired-end reads like in the diagram):

bbduk.sh in1=r1.fq in2=r2.fq out1=trimmed1.fq out2=trimmed2.fq forcetrimleft=30
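For readers who want to see what a fixed-length left trim amounts to, here is a rough Python sketch that removes the first `n` bases and quality values from every record of a 4-line FASTQ stream. It is not BBDuk's implementation and reproduces none of its other behavior (for example, any filtering of reads by length); the function name `force_trim_left` is made up for this example.

```python
import io


def force_trim_left(fastq_in, fastq_out, n):
    """Copy 4-line FASTQ records, dropping the first n bases and qualities."""
    while True:
        header = fastq_in.readline()
        if not header:          # end of input
            break
        seq = fastq_in.readline().rstrip("\n")
        plus = fastq_in.readline()
        qual = fastq_in.readline().rstrip("\n")
        fastq_out.write(header)
        fastq_out.write(seq[n:] + "\n")
        fastq_out.write(plus)
        fastq_out.write(qual[n:] + "\n")


# Tiny in-memory demonstration with one 12-base read and n=3.
demo = io.StringIO("@read1\nAAACCCGGGTTT\n+\nIIIIIIIIIIII\n")
out = io.StringIO()
force_trim_left(demo, out, 3)
print(out.getvalue(), end="")
```

Because no records are dropped, running the same function over the R1 and R2 files separately keeps the pairs in the same order.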

ADD REPLY • link 7.9 years ago by Brian Bushnell 20k

0

The directions in the document you attached seem to be for trimming. Since you are following an established protocol why not stick with the recommendations?

ADD REPLY • link 8.0 years ago by GenoMax 148k

1

Because I do not use commercial software such as JSI or SeqNext, and I would also like to avoid MSR. I think a lot of people do amplicon sequencing, and I would like to know how they approach this problem. Maybe someone can share their workflow.

ADD REPLY • link 8.0 years ago by Paul ★ 1.5k

0

For nested amplicon sequencing, we first align the original reads (still containing the primer sequences) to the reference and then mask the primers by soft-clipping the alignments using BAMClipper (Scientific Reports 7:1567). As mentioned in another thread, primer trimming at the FASTQ level (1) is computationally expensive, (2) incorrectly handles nested PCR amplicons, and (3) makes indels harder to detect by conventional variant calling.
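As a rough illustration of what soft-clipping a primer after alignment involves, the Python sketch below rewrites a CIGAR string so that every read base aligned before the end of a left-hand primer becomes a soft clip, and moves the alignment start accordingly. This is not BAMClipper's actual code; the function name `soft_clip_left`, the 0-based coordinates, and the restriction to the left end of the read are assumptions made for this example.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_left(pos, cigar, primer_end):
    """Soft-clip all read bases aligned before reference position primer_end.

    pos and primer_end are 0-based; returns (new_pos, new_cigar).
    The read sequence and qualities are left untouched, which is what
    distinguishes soft clipping from trimming the FASTQ.
    """
    if pos >= primer_end:
        return pos, cigar                  # read starts past the primer
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    hard = ""
    if ops and ops[0][1] == "H":           # keep a leading hard clip as is
        hard = "%dH" % ops[0][0]
        ops = ops[1:]
    clipped, ref, i = 0, pos, 0
    while i < len(ops):
        n, op = ops[i]
        if ref >= primer_end and op in "M=X":
            break                          # first aligned base past the primer
        if op in "M=X":                    # consumes read and reference
            take = min(n, primer_end - ref)
            clipped += take
            ref += take
            if take < n:
                ops[i] = (n - take, op)    # keep the part beyond the primer
                break
        elif op in "SI":                   # consumes read only
            clipped += n
        elif op in "DN":                   # consumes reference only
            ref += n
        elif op == "H":
            break                          # trailing hard clip: keep it
        # "P" (padding) consumes nothing and is dropped here
        i += 1
    new_ops = ([(clipped, "S")] if clipped else []) + ops[i:]
    return ref, hard + "".join("%d%s" % (n, op) for n, op in new_ops)


print(soft_clip_left(100, "50M", 120))
print(soft_clip_left(100, "5S10M2I38M", 115))
```

Note that an insertion or deletion lying inside the primer region is absorbed into the soft clip in this sketch, and the total number of read bases described by the CIGAR stays the same.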

ADD REPLY • link 7.7 years ago by Tommy Au ▴ 70

0

As mentioned in another thread, primer trimming at FASTQ level (1) is computationally expensive

No, it's not. It's one of the computationally cheapest operations you can do for read preprocessing. I would further review your other claims, but for example, "(3) makes indels harder to detect by conventional variant calling." (referring to adapter-trimming) is ridiculous. Trimming adapters properly makes indels easier to detect using conventional variant-callers.

ADD REPLY • link 7.7 years ago by Brian Bushnell 20k

0

So you prefer to hard-trim primer sequences from the FASTQ files rather than soft-clip them in the BAM?

ADD REPLY • link 7.6 years ago by Paul ★ 1.5k

0

Having data free of any extraneous sequence makes it simple to run different types of analyses (e.g. a de novo assembly).

ADD REPLY • link 7.6 years ago by GenoMax 148k

1

We did exactly this kind of FASTQ hard trimming for variant calling, and almost missed a germline BRCA1 17-nt deletion in a hereditary breast cancer patient.

ADD REPLY • link 7.6 years ago by Tommy Au ▴ 70

0

It's possible that your methodology was flawed. Can you describe what kind of library you were using, the preprocessing, mapping, and variant-calling steps?

ADD REPLY • link 7.6 years ago by Brian Bushnell 20k

1

Thank you for your comments. We described the case in detail in Scientific Reports 7:1567.

ADD REPLY • link 7.6 years ago by Tommy Au ▴ 70

0

Thank you for the good article.

ADD REPLY • link 7.3 years ago by Paul ★ 1.5k

0

Having read this thread, I still can't choose between cutadapt and BAMClipper for amplicon sequencing...

ADD REPLY • link 7.3 years ago by lamteva.vera ▴ 220