Question

Removing Indices using Cutadapt

7.0 years ago

a.mailli • 0

Hello everyone,

So I just sequenced my paired-end samples on a NextSeq and I want to cut the adaptors off, so I was planning to use cutadapt for the job. However, I seem to have gotten myself confused about how to phrase the command. The cutadapt manual provides the following command as a default for my data (http://cutadapt.readthedocs.io/en/stable/guide.html#illumina-truseq):

cutadapt \
    -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC \
    -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT \
    -o trimmed.1.fastq.gz -p trimmed.2.fastq.gz \
    reads.1.fastq.gz reads.2.fastq.gz

However, since I would also like to remove the indices from each of my samples, I was wondering how I would add that to the command (I used NEBNext Adaptors). Do I just add the index sequence to the end of the adaptor sequence?

Hopefully the solution is as simple as I assume it is, but thank you in advance :) !

Kind regards,

A

cutadapt adaptor trimming • 4.1k views

updated 7.0 years ago by Tm ★ 1.1k • written 7.0 years ago by a.mailli • 0

cutadapt is for trimming adapters, and bases from either end of the reads, in FASTQ files. The index sequence is stored in the name line (line 1 of each record), so you do not need to process the index any further.

But for some specific libraries, you may have a barcode sequence at the beginning of the reads (e.g. a 10-bp barcode). For this purpose, you can use the --cut option in cutadapt:

$ cutadapt --cut 10  ...

Be careful: the barcode is removed before adapter trimming. From the cutadapt help:

-u LENGTH, --cut=LENGTH
                    Remove bases from each read (first read only if
                    paired). If LENGTH is positive, remove bases from the
                    beginning. If LENGTH is negative, remove bases from
                    the end. Can be used twice if LENGTHs have different
                    signs. This is applied *before* adapter trimming.
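
The help text above says --cut applies to the first read only when the data are paired; cutadapt provides -U as the counterpart for the second read. As a rough sketch, here is how the two could be combined with the command from the question. The wrapper name `trim_barcode_and_adapters` is hypothetical, and it assumes both mates start with a barcode of the same fixed length, which depends on your library prep (drop the -U line if only R1 carries one).

```shell
# Hypothetical wrapper (not part of cutadapt): clip a fixed-length 5' barcode
# from both mates, then trim the TruSeq adapters quoted in the question.
# Assumption: both R1 and R2 begin with a barcode of the same length.
# Usage: trim_barcode_and_adapters BARCODE_LEN IN_R1 IN_R2 OUT_R1 OUT_R2
trim_barcode_and_adapters() {
    # -u clips the first read, -U clips the second read; both are applied
    # before the -a / -A adapter trimming.
    cutadapt \
        -u "$1" \
        -U "$1" \
        -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC \
        -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT \
        -o "$4" -p "$5" \
        "$2" "$3"
}

# Example call (file names taken from the question, 10-bp barcode from above):
#   trim_barcode_and_adapters 10 reads.1.fastq.gz reads.2.fastq.gz \
#       trimmed.1.fastq.gz trimmed.2.fastq.gz
```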

5.1 years ago by wm ▴ 570

score 1 · Answer 1 · 2018-05-04

OK, so let's look at a typical FASTQ read:

@E00777:238:HABCDCCXY:1:1101:7770:2294 1:N:0:AGCGAAC
ACAGTTGTCCAGTGGCAACAAGGACTCAAGAGATAGAAGACTGATATTATGGTATTTTGAACACCAGCTGAAACCCTTAGTGGCCGAATTTGTGCAGGTCT
+
-AAFFJJJJJF7AFJJAJJJJJAFJJJJ--FF-FF-<FJJJ<F<J<JFJJJ7FFJJJJF<AFJJJJJFF<-JJ<-AAAJFJFJ--77AFFJJFFJJ-7FJA

Line 2 of the read is where cutadapt does its work. The index sequence is typically the string of ACGT characters at the end of line 1 (AGCGAAC in this case).

As far as I know, cutadapt doesn't perform any operations on that index sequence. If you want to remove it, you'll need to use a different tool or a sed/awk command.
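
For what it's worth, here is a minimal awk sketch of that idea. It assumes plain 4-line FASTQ records and Illumina-style headers like the one above (ending in something like 1:N:0:AGCGAAC); the function name `strip_index` is just made up for illustration. Headers that don't match that pattern are left untouched.

```shell
# Hypothetical helper: drop the trailing index field from FASTQ header lines.
# Assumes 4-line FASTQ records and Illumina-style headers such as
#   @E00777:238:HABCDCCXY:1:1101:7770:2294 1:N:0:AGCGAAC
# Reads FASTQ on stdin, writes FASTQ on stdout.
strip_index() {
    awk '
        # Only header lines (every 4th line, starting at line 1) that end in
        # " <read>:<filtered>:<control>:<index>" are modified. The "+" in the
        # character class covers dual indices written as INDEX1+INDEX2.
        NR % 4 == 1 && / [12]:[YN]:[0-9]+:[ACGTN+]+$/ {
            sub(/:[ACGTN+]+$/, "")
        }
        # Sequence, separator and quality lines pass through unchanged.
        { print }
    '
}

# Example (file names are placeholders):
#   zcat reads.1.fastq.gz | strip_index | gzip > noindex.1.fastq.gz
```

Keep in mind that some downstream tools read the index from the header, so only strip it if you're sure nothing later in your pipeline needs it.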

Let me know if I'm not understanding your question.

score 0 · Answer 2 · 2018-05-05

7.0 years ago

Tm ★ 1.1k

As far as I know, anything after the adapter gets trimmed, so there is no need to add the index sequences separately for filtering. If you find cutadapt confusing, you can give Trimmomatic a try, where you can provide a file containing all possible adapter sequences to trim.
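
To make the "anything after the adapter gets trimmed" point concrete, here is a toy shell illustration on a single string. It only does exact matching, whereas cutadapt also tolerates mismatches and partial adapter matches, so treat it as a picture of the idea rather than how cutadapt is implemented. The function name `trim_3prime` and the 7-base insert are made up; the adapter is the -a sequence from the question and the index is the one from the FASTQ example in this thread.

```shell
# Toy illustration only: exact-match 3' adapter trimming on one sequence.
# Usage: trim_3prime READ_SEQUENCE ADAPTER_SEQUENCE
trim_3prime() {
    read_seq=$1
    adapter=$2
    # Remove the first occurrence of the adapter and everything after it.
    # The adapter is plain ACGT text, so it is safe to use as a pattern.
    printf '%s\n' "${read_seq%%${adapter}*}"
}

# A made-up read: 7-base insert, then the adapter, then the index (AGCGAAC).
# Only the insert should be printed, so the index goes away with the adapter.
trim_3prime "TTGACCAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACAGCGAAC" \
            "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
```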
