Primers trimming from Illumina paired-end reads by BBDuk software
1
1
Entering edit mode
3.3 years ago
Denis ▴ 310

Hi,

I'm dealing with Illumina amplicon data. I'd like to apply BBDuk to remove primers from the 5'end of my forward and reverse reads, but have a couple of questions. Could you help me please?

  1. How may i indicate in command line which primer i need to clip from forward and which from the reverse reads?

  2. There is the restrictleft BBDuk parameter to restrict search within the leftmost bases of the read instead of the whole read sequence. Is it possible to set up this parameter separately for forward and reverse reads?

Illumina adapter BBDuk trimming amplicons PCR • 3.6k views
ADD COMMENT
1
Entering edit mode

Is your protocol directional i.e. certain sequence would always show up at 5'-end then you could set up your trimming to be specific for forward/reverse reads. You could then use restrictright= if rev-comp sequence will be showing up in other file.

BBDuk is automatically going to look for reverse complement matches (you can provide the sequence using literal=), if your sequencing is not directional.

You may want to do this as a two-pass operation.

bbduk.sh -Xmx4g in1=R1.fq.gz in2=R2.fq.gz out=stdout.fq restrictleft=NN ktrim=l literal=seq1,seq2 | bbduk.sh -Xmx4g in=stdin.fq out1=R1_trim.fq.gz out2=R2_trim.fq.gz ktrim=r restrictright=NN literal=seq1,seq2 

If you can post a couple of example sequences I can help with right command line.

ADD REPLY
0
Entering edit mode

Thank you very much! Yes it is directional. Based on the PCR properties i will always find primers at the 5'-end of the forward and reverse reads (please look at the examples below). So i have to remove just one certain primer sequence from R1 and one another certain sequence from R2. In your example i have to clip seq1 in plus orientation from R1 and seq2 in plus orientation from the R2. In addition to my understanding it should be restrictleft both for the R1 and R2, but the value for this parameter has to be different for the R1 and R2 because of their different length. 

R1

ATCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCTTATATTATAGTTGTTGCAGTTAAAAAGCTCGTAGTTGGATTTCTGGGAGGGTGCCGCCGTCCGGCGTGTCCGTGTGCAGTGGCGCCCTTCCATCCTTCTGTTACCGTCTCTTGGCATTCATTTGCTGGTTGCGGGCTCAGATATTTTACCTTGCGAACATTATAGTGTTTCAGGCAGCCTAGGCCGGAATACATTAGCATGGAATAATGGACTAGGACTACGGTCTCTTTGTTGGTTTGCTGGACTGCAGTAATGATTACTAGGGT

ATCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGTTGCAGTTAAAAAGCTCGTAGTTGGATTTCTGGGAGGGTGGCGATGTCCGCTTAACGTGTGTGCAGCGGCGCCCTTCCATCCTTCTGTTAGCGTCTCTTGGCATTCATTTGCTGGTGGCGGGCTCAGATATTTTACCTTGAGAAAATTAGAGTGTTTCAGGCAGGCTAGGCCGGAATACATTAGCATGGAATAATGGAATAGGACTACGGTCTCTTTGTTGGTTTGAGGGACTGCAGTAATGATTAATAGGGG

ATCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGTTGCAGTTAAAAAGCTCGTAGTTGGATTTCTGGGAGGGTGCCGCCGTCCGGCGTGTCCGTGTGCAGTGGCGCCCTTCCATCCTTCTGTTAGCGTCTCTTGGCATTCATTTGCTGGTGGCGGGCTCAGATATTTTACCTTGAGAAAATTAGAGTGTTTCAGGCAGGCTAGGCCGGAATACATTAGCATGGAATAATGGAATAGGACTACGGTCTCTTTGTTGGTTTGAGGGACTGCAGTAATGATTAATAGGGA

R2

ATATCCCCTAACTTTCGTTCTTGATTAATGAAAACATCCTTGGTAAATGCTTTCGCCTAAGTTCATCTTTCATAAATCCAAGAATTTCACCTCTGACAATTAAATATGAATACCCCCAACTATCCCTATTAATCATTACCTCGATCCGCAAACCAACAAAATAATGACCCAAGTCTTATCTTATTATTCCATGCTAATGTATTCAACGGCCTAAGCCTGCTTTAAACACTCTAATTTTTTCACAGTAAACGATGTGTTCCCAAACCCCATCCAACTAAGGACCGAGCATTTCCCACAAGGA ATATCCCCTAACTTTCGTTCTTGATGTGCGGAATTACCGCGGCTGCTGGATCTGTCTCTTATACACATCTGACGCTGCCGACGAATAGAGAGGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAAAACCATTCACTATAAATTCGTTTCCCTAATTTCATCTCCACTCCACCTACCATCACCTTCCTTCTCCCTCACTCCACCTCACTCATTCCCTCCTCTCGTCTCATGCCCTTTTCCCTTCTCTCCACTTTACTCTTCTACCTATTCTCCACATCCATATCCACACTATCGCC ATATCCCCTAACTTTCGTTCTTGATTAATGAAAACATCCTTGGCAAATGCTTTCGCATAAGTTAGTCTTTAACAAATCTGAGAATTCCACCTCTGACAATTAAATACTAATGCCCCCAACTATCCCTATTAATCATTACTGCAGTCCCTCAAACCAACAACGAGACCGTAGTCCTATTCCATTATTCCATGCTAATGTATTCCGGCCTAGCCTGCCTGAAACACTCTAATTTTCTCAAGGTAAATTATCTGAGCACACCACCGGCAAGTAAATGCCGAGAGGCGTTAACAGAAGGATGCGG

ADD REPLY
1
Entering edit mode

If you don't expect anything to show up on the 3'-end then you can process the two reads independently as you describe above. If you are only trimming the highlighted sequence then there should be no reason that you will lose an entire read. But just in case that were to happen you will want to re-sync R1/R2 files with repair.sh after trimming.

ADD REPLY
0
Entering edit mode

Thank you for your reply! I'm wondering if i'll merge my R1 and R2 in one DNA fragment by bbmerge before primers clipping. May i in that case remove my primers from the 5' and 3' ends of the resulting DNA fragment just in one step skipping repair.sh executing? Something like this:

bbduk.sh in=reads.fq out=clean.fq  ktrim=r ktrim=l k=23 mink=11 restrictleft=20 restrictright=18

But how can i in that case specify the sequence for right-trimming and corresponding sequence for left-trimming?

ADD REPLY
2
Entering edit mode
3.3 years ago
GenoMax 148k

Following should work. Give it a try and let me know.

bbduk.sh -Xmx2g in1=test_R1.fq in2=test_R2.fq out1=R1_trim.fq out2=R2_trim.fq literal=ATCCAGCAGCCGCGGTAATT,ATATCCCCTAACTTTCGT ktrim=l restrictleft=20 k=5

Since your reads are long I am going to post the example by converting them to fasta format. Fastq format should work as indicated above.

$ head -4 test_*.fa
==> test_R1.fa <==
>R1_1
ATCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCTTATATTATAGTTGTTGCAGTTAAAAAGCTCGTAGTTGGATTTCTGGGAGGGTGCCGCCGTCCGGCGTGTCCGTGTGCAGTGGCGCCCTTCCATCCTTCTGTTACCGTCTCTTGGCATTCATTTGCTGGTTGCGGGCTCAGATATTTTACCTTGCGAACATTATAGTGTTTCAGGCAGCCTAGGCCGGAATACATTAGCATGGAATAATGGACTAGGACTACGGTCTCTTTGTTGGTTTGCTGGACTGCAGTAATGATTACTAGGGT
>R2_1
ATCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGTTGCAGTTAAAAAGCTCGTAGTTGGATTTCTGGGAGGGTGGCGATGTCCGCTTAACGTGTGTGCAGCGGCGCCCTTCCATCCTTCTGTTAGCGTCTCTTGGCATTCATTTGCTGGTGGCGGGCTCAGATATTTTACCTTGAGAAAATTAGAGTGTTTCAGGCAGGCTAGGCCGGAATACATTAGCATGGAATAATGGAATAGGACTACGGTCTCTTTGTTGGTTTGAGGGACTGCAGTAATGATTAATAGGGG

==> test_R2.fa <==
>R1_2
ATATCCCCTAACTTTCGTTCTTGATTAATGAAAACATCCTTGGTAAATGCTTTCGCCTAAGTTCATCTTTCATAAATCCAAGAATTTCACCTCTGACAATTAAATATGAATACCCCCAACTATCCCTATTAATCATTACCTCGATCCGCAAACCAACAAAATAATGACCCAAGTCTTATCTTATTATTCCATGCTAATGTATTCAACGGCCTAAGCCTGCTTTAAACACTCTAATTTTTTCACAGTAAACGATGTGTTCCCAAACCCCATCCAACTAAGGACCGAGCATTTCCCACAAGGA
>R2_2
ATATCCCCTAACTTTCGTTCTTGATGTGCGGAATTACCGCGGCTGCTGGATCTGTCTCTTATACACATCTGACGCTGCCGACGAATAGAGAGGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAAAACCATTCACTATAAATTCGTTTCCCTAATTTCATCTCCACTCCACCTACCATCACCTTCCTTCTCCCTCACTCCACCTCACTCATTCCCTCCTCTCGTCTCATGCCCTTTTCCCTTCTCTCCACTTTACTCTTCTACCTATTCTCCACATCCATATCCACACTATCGCC

# We now do actual trimming. I am sending output to STDOUT for this example.
$ bbduk.sh -Xmx2g in1=test_R1.fa in2=test_R2.fa out=stdout.fa literal=ATCCAGCAGCCGCGGTAATT,ATATCCCCTAACTTTCGT ktrim=l restrictleft=20 k=5
# output you get
Input is being processed as paired
Started output streams: 0.013 seconds.
>R1_1
CCAGCTCCAATAGCTTATATTATAGTTGTTGCAGTTAAAAAGCTCGTAGTTGGATTTCTGGGAGGGTGCC
GCCGTCCGGCGTGTCCGTGTGCAGTGGCGCCCTTCCATCCTTCTGTTACCGTCTCTTGGCATTCATTTGC
TGGTTGCGGGCTCAGATATTTTACCTTGCGAACATTATAGTGTTTCAGGCAGCCTAGGCCGGAATACATT
AGCATGGAATAATGGACTAGGACTACGGTCTCTTTGTTGGTTTGCTGGACTGCAGTAATGATTACTAGGG
T
>R1_2
TCTTGATTAATGAAAACATCCTTGGTAAATGCTTTCGCCTAAGTTCATCTTTCATAAATCCAAGAATTTC
ACCTCTGACAATTAAATATGAATACCCCCAACTATCCCTATTAATCATTACCTCGATCCGCAAACCAACA
AAATAATGACCCAAGTCTTATCTTATTATTCCATGCTAATGTATTCAACGGCCTAAGCCTGCTTTAAACA
CTCTAATTTTTTCACAGTAAACGATGTGTTCCCAAACCCCATCCAACTAAGGACCGAGCATTTCCCACAA
GGA
>R2_1
CCAGCTCCAATAGCGTATATTAAAGTTGTTGCAGTTAAAAAGCTCGTAGTTGGATTTCTGGGAGGGTGGC
GATGTCCGCTTAACGTGTGTGCAGCGGCGCCCTTCCATCCTTCTGTTAGCGTCTCTTGGCATTCATTTGC
TGGTGGCGGGCTCAGATATTTTACCTTGAGAAAATTAGAGTGTTTCAGGCAGGCTAGGCCGGAATACATT
AGCATGGAATAATGGAATAGGACTACGGTCTCTTTGTTGGTTTGAGGGACTGCAGTAATGATTAATAGGG
G
>R2_2
TCTTGATGTGCGGAATTACCGCGGCTGCTGGATCTGTCTCTTATACACATCTGACGCTGCCGACGAATAG
AGAGGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAAAACCATTCACTATAAATTCGTTTCCCT
AATTTCATCTCCACTCCACCTACCATCACCTTCCTTCTCCCTCACTCCACCTCACTCATTCCCTCCTCTC
GTCTCATGCCCTTTTCCCTTCTCTCCACTTTACTCTTCTACCTATTCTCCACATCCATATCCACACTATC
GCC
ADD COMMENT
0
Entering edit mode

Thank you so much! Your command line works perfectly on my data! May i asked a few related questions about adapter trimming for future applications?

  1. Is k=5 an optimal settings to trim out primers (bold sequences above with length from 18 to 27)? In the documentation k=21 was indicated though.

  2. There is no need to use mink parameter along with the short k (e.g. k=5)?

  3. It's clear from the documentation that i have to use copyundefined option to work with degenerate sequences, but should i always use mm=f in that cases?

  4. As i understand from the documentation bbmerge will authomatically remove all the adapters at 3' end during paired reads merging. So i may not worry about the 3' end adapters trimming in that case. Am i right?

ADD REPLY
1
Entering edit mode

Since your primer sequence was small (ATCCAGCAGCCGCGGTAATT) I ended up using k=5. Generally you will want to have that value be < 1/2 length of the thing you are trying to trim.

One thing BBMap has is an abundance of options. Some options benefited @Brian's workflow so may not be of general use for rest of us. I have rarely used degenerate bases option so my general suggestion is start with minimum number of options that you think are needed. If that does not prove sufficient then start looking at adding others.

If your reads are designed to merge then it may be best to merge them before trimming the longer representation you will get.

ADD REPLY

Login before adding your answer.

Traffic: 2059 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6