Trim adapter of undefined length
5.2 years ago
julia.mir • 0

Hi all,

We are interested in a way of trimming adapters of unknown length from short reads.

The adapter is composed of a FIXED part, which is always at the beginning of the reads, followed by a longer sequence, ADAPTER, which is not necessarily present or complete.


An example would be the following:

Adapter sequence: FIXEDADAPTER

Some reads:

>a    
FIXEDADAPTERmysequence
>b    
FIXEDmysequence
>c    
FIXEDADAPmysequence

All of the reads should be trimmed with only "mysequence" remaining.


We have evaluated cutadapt and fastx, but neither seems to include an option that takes this situation into account. Do you have any idea of the best way to approach this?

Any help would be really appreciated.

Júlia

adapter trimming short read • 1.3k views

Cutadapt can be used by defining the minimum overlap. Also, isn't your example more like this?

>a    
FIXEDADAPTERmysequence
>b    
APTERmysequence
>c    
EDADAPTERmysequence
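
To make the difference concrete, here is a minimal Python sketch of the "classic" case above, where the read starts with a suffix of the adapter. It uses exact matching only and is not how cutadapt is implemented; the function name and the `min_overlap` parameter are hypothetical and only mimic the idea of a minimum overlap.

```python
def trim_5prime_suffix_overlap(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove the longest adapter suffix found at the start of the read.

    Exact matches only; min_overlap is the shortest suffix that counts as a hit.
    """
    longest = min(len(adapter), len(read))
    shortest = max(min_overlap, 1)
    # Try the longest possible overlap first, then shrink.
    for length in range(longest, shortest - 1, -1):
        if read.startswith(adapter[-length:]):
            return read[length:]
    return read  # no adapter suffix of sufficient length at the read start


for read in ("FIXEDADAPTERmysequence", "APTERmysequence", "EDADAPTERmysequence"):
    print(trim_5prime_suffix_overlap(read, "FIXEDADAPTER"))
```

Note that a read such as FIXEDmysequence from the original question is left untouched by this logic, because FIXED is a prefix of the adapter, not a suffix.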

what about pandaseq?


Look into bbduk.sh and this option. A guide is available here.

restrictleft=0      If positive, only look for kmer matches in the 
                    leftmost X bases.
5.2 years ago
GenoMax 148k

This can be done with bbduk.sh from the BBMap suite.

Ignore the fastq file contents below. I chose a random one at hand.

$ more test.fq
@cluster_8:UMI_CTTTGA
TATCCTTGCAATACTCTCCGAACGGGAGAGC
+
1/04.72,(003,-2-22+00-12./.-.4-
@cluster_12:UMI_GGTCAA
GCAGTTTAAGATCATTTTATTGAAGAGCAAG
+
?7?AEEC@>=1?A?EEEB9ECB?==:B.A?A
@cluster_21:UMI_AGAACA
GGCATTGCAAAATTTATTACACCCCCAGATC
+
>=2.660/?:36AD;0<14703640334-//
@cluster_8:UMI_CTTTGA
CCTTGCAATACTCTCCGAACGGGAGAGCATC
+
1/04.72,(003,-2-22+00-12./.-.4-
@cluster_8:UMI_CTTTGA
TGCAATACTCTCCGAACGGGAGAGCATCTTT
+
1/04.72,(003,-2-22+00-12./.-.4-
@cluster_8:UMI_CTTTGA
TATCGTGCAATACTCTCCGAACGGGAGAGC
+
1/04.72,(003,-2-22+00-12./.-.4

$ more adap.fa (this is the adapter we are searching for)
>test
TATCCTTGCAATACT

$ bbduk.sh in=test.fq ref=adap.fa ktrim=l k=9 out=stdout.fq

java -ea -Xmx1400m -Xms1400m -cp bbmap/current/ jgi.BBDukF in=test.fq ref=adap.fa ktrim=l k=9 out=stdout.fq
Executing jgi.BBDukF [in=test.fq, ref=adap.fa, ktrim=l, k=9, out=stdout.fq]
Version 38.26

0.028 seconds.
Initial:
Memory: max=1468m, total=1468m, free=1438m, used=30m

Added 7 kmers; time:    0.028 seconds.
Memory: max=1468m, total=1468m, free=1433m, used=35m

Input is being processed as unpaired
Started output streams: 0.010 seconds.

@cluster_8:UMI_CTTTGA
CTCCGAACGGGAGAGC
+
-22+00-12./.-.4-
@cluster_12:UMI_GGTCAA
GCAGTTTAAGATCATTTTATTGAAGAGCAAG
+
?7?AEEC@>=1?A?EEEB9ECB?==:B.A?A
@cluster_21:UMI_AGAACA
GGCATTGCAAAATTTATTACACCCCCAGATC
+
>=2.660/?:36AD;0<14703640334-//
@cluster_8:UMI_CTTTGA
CTCCGAACGGGAGAGCATC
+
,-2-22+00-12./.-.4-
@cluster_8:UMI_CTTTGA
CTCCGAACGGGAGAGCATCTTT
+
003,-2-22+00-12./.-.4-
@cluster_8:UMI_CTTTGA
CTCCGAACGGGAGAGC
+
2-22+00-12./.-.4
Processing time:        0.005 seconds.

Input:                      6 reads         185 bases.
KTrimmed:                   4 reads (66.67%)    50 bases (27.03%)
Total Removed:              0 reads (0.00%)     50 bases (27.03%)
Result:                     6 reads (100.00%)   135 bases (72.97%)

Time:                           0.045 seconds.
Reads Processed:           6    0.13k reads/sec
Bases Processed:         185    0.00m bases/sec
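
For anyone wondering why reads with a truncated or slightly different adapter still get trimmed, here is a rough Python sketch of the idea behind k-mer based left trimming. It is a simplification under stated assumptions, not BBDuk's actual code: exact k-mer matches only, forward strand only (no reverse complements), and the function name `ktrim_left` is made up for illustration. Any k-mer of the adapter found in the read is removed together with everything to its left.

```python
def ktrim_left(read: str, qual: str, adapter: str, k: int = 9):
    """Trim the rightmost adapter k-mer match and everything to its left."""
    # All k-mers of the adapter (a 15-base adapter with k=9 gives 7 k-mers).
    kmers = {adapter[i:i + k] for i in range(len(adapter) - k + 1)}
    cut = 0
    for i in range(len(read) - k + 1):
        if read[i:i + k] in kmers:
            cut = i + k  # remember the end of the rightmost match
    return read[cut:], qual[cut:]


adapter = "TATCCTTGCAATACT"
seq, q = ktrim_left("TATCCTTGCAATACTCTCCGAACGGGAGAGC",
                    "1/04.72,(003,-2-22+00-12./.-.4-", adapter)
print(seq)
print(q)
```

Because a single matching k-mer is enough, a read that contains only the tail of the adapter, or an adapter with a small gap, is still trimmed as long as one intact 9-mer survives.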

I stand corrected :-)


Thank you very much for your answer. bbduk with the ktrim=l option did the trick.

5.2 years ago
Carambakaracho ★ 3.3k

Classic adapter contamination looks more like what JC describes. Your case is not handled out of the box, I believe not even by bbduk, the Swiss Army knife of adapter trimming. However, a pragmatic solution would be a two-step approach (note that this is pseudocode):

for fq in fq_files_to_trim
    trim fixed <fq >fq_wo_fixedpart
    trim adapter <fq_wo_fixedpart >clean.fq
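
The two steps in the pseudocode can be sketched in plain Python on the sequence strings from the question. This is only an illustration with exact matching; the function name and the `min_adapter` parameter are hypothetical, and real data would also need the quality strings trimmed by the same amount.

```python
def trim_fixed_then_partial(read: str, fixed: str, adapter: str,
                            min_adapter: int = 1) -> str:
    """Strip the fixed prefix, then the longest leading prefix of the adapter."""
    if not read.startswith(fixed):
        return read  # fixed part missing: leave the read untouched
    rest = read[len(fixed):]
    longest = min(len(adapter), len(rest))
    shortest = max(min_adapter, 1)
    # Try the full adapter first, then progressively shorter prefixes of it.
    for length in range(longest, shortest - 1, -1):
        if rest.startswith(adapter[:length]):
            return rest[length:]
    return rest  # no adapter piece present after the fixed part


for read in ("FIXEDADAPTERmysequence", "FIXEDmysequence", "FIXEDADAPmysequence"):
    print(trim_fixed_then_partial(read, "FIXED", "ADAPTER"))
```

One caveat: with `min_adapter=1`, a genuine sequence that happens to start with the first base of the adapter loses that base, so a larger minimum is probably safer on real reads.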

It should be doable with bbduk.sh. One can do it with ktrim=l. I will have to test it to confirm.


I'd be thrilled to know, too. To me, the variable-length adapter between the fixed part and the sequence should pose a major challenge to out-of-the-box adapter trimming strategies, including bbduk's.
