Removing Indices using Cutadapt
6.6 years ago
a.mailli • 0

Hello everyone,

I have just sequenced my paired-end samples on a NextSeq and want to trim the adapters with cutadapt, but I am confused about how to phrase the command. The cutadapt manual gives the following command for Illumina TruSeq adapters (http://cutadapt.readthedocs.io/en/stable/guide.html#illumina-truseq):

cutadapt \
    -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC \
    -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT \
    -o trimmed.1.fastq.gz -p trimmed.2.fastq.gz \
    reads.1.fastq.gz reads.2.fastq.gz

However, I would also like to remove the indices from each of my samples (I used NEBNext adaptors). How would I add that to the command? Do I just append the index sequence to the end of the adaptor sequence?

Hopefully the solution is as simple as I assume it is, but thank you in advance :) !

Kind regards,

A

cutadapt adaptor trimming • 3.7k views

cutadapt trims adapters and bases from either end of the reads in a FASTQ file. The index sequence is stored in the header line (line 1 of each record), not in the read itself, so you do not need to process it any further.

For some library types, however, there is a barcode at the beginning of each read (a 10-bp barcode, for example). In that case you can use cutadapt's --cut option:

$ cutadapt --cut 10  ...

Be careful: the barcode is removed before adapter trimming, as the help text states:

-u LENGTH, --cut=LENGTH
                    Remove bases from each read (first read only if
                    paired). If LENGTH is positive, remove bases from the
                    beginning. If LENGTH is negative, remove bases from
                    the end. Can be used twice if LENGTHs have different
                    signs. This is applied *before* adapter trimming.
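
For paired-end data, the help text above says --cut (-u) only affects the first read; cutadapt also has a -U option that does the same for the second read. A minimal sketch of combining both with the adapter trimming from the question could look like this. It only applies if your library really has an inline barcode at the start of both mates; the function name trim_pair and the BARCODE_LEN variable are made up for illustration, and 10 is just the example length used above.

```shell
# Sketch only: remove a fixed-length 5' barcode from both mates, then trim adapters.
# BARCODE_LEN and trim_pair are illustrative names, not part of cutadapt.
BARCODE_LEN=10

trim_pair() {
    # $1, $2: input R1/R2 FASTQ files; $3, $4: output R1/R2 FASTQ files
    cutadapt \
        -u "$BARCODE_LEN" -U "$BARCODE_LEN" \
        -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC \
        -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT \
        -o "$3" -p "$4" \
        "$1" "$2"
}

# Usage (requires cutadapt on PATH):
# trim_pair reads.1.fastq.gz reads.2.fastq.gz trimmed.1.fastq.gz trimmed.2.fastq.gz
```

If the barcode is only present in one mate, drop the corresponding -u or -U.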
6.6 years ago
Dan D 7.4k

OK, so let's look at a typical FASTQ read:

@E00777:238:HABCDCCXY:1:1101:7770:2294 1:N:0:AGCGAAC
ACAGTTGTCCAGTGGCAACAAGGACTCAAGAGATAGAAGACTGATATTATGGTATTTTGAACACCAGCTGAAACCCTTAGTGGCCGAATTTGTGCAGGTCT
+
-AAFFJJJJJF7AFJJAJJJJJAFJJJJ--FF-FF-<FJJJ<F<J<JFJJJ7FFJJJJF<AFJJJJJFF<-JJ<-AAAJFJFJ--77AFFJJFFJJ-7FJA

Line 2 of the read is where cutadapt is doing its work. The index sequence is typically the string of ACGT characters at the end of line 1 (AGCGAAC in this case).
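
If you want to check this on your own files, something like the following sketch tallies the index strings found in the header lines. It assumes plain 4-line FASTQ records with Illumina-style headers like the one above, where the index is whatever follows the last colon; count_indices is just a name chosen for this example.

```shell
# Sketch: count the index strings found in FASTQ header lines.
# Assumes 4 lines per record and that the index is the text after the last ':'.
count_indices() {
    awk 'NR % 4 == 1 { n = split($0, f, ":"); print f[n] }' "$@" | sort | uniq -c
}

# Usage with a gzipped file:
# gzip -dc reads.1.fastq.gz | count_indices
```

On a demultiplexed sample you would expect one dominant index, possibly with a few near-matches if mismatches were allowed during demultiplexing.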

As far as I know, cutadapt doesn't perform any operations on that index sequence. If you want to remove it, you'll need a different tool or a sed/awk command.
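
As a rough sketch of the awk route, the function below blanks the index at the end of each header line and leaves the sequence and quality lines untouched. It assumes 4-line FASTQ records and an index made up of A, C, G, T, N (with "+" between dual indices); strip_index is a name made up for this example, and keeping the trailing colon is a choice made here so the header keeps the same number of fields.

```shell
# Sketch: remove the index string from the end of each FASTQ header line.
# Only every 4th line starting at line 1 (the header) is modified.
strip_index() {
    awk 'NR % 4 == 1 { sub(/:[ACGTN+]+$/, ":") } { print }' "$@"
}

# Usage with gzipped input and output:
# gzip -dc reads.1.fastq.gz | strip_index | gzip > reads.1.noindex.fastq.gz
```

Headers that do not end in such an index string pass through unchanged.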

Let me know if I'm not understanding your question.


Yes, that makes sense. I think it was a misunderstanding on my part; thank you for the clarification.

6.6 years ago
Tm ★ 1.1k

As far as I understand, anything after the adapter gets trimmed along with it, so there is no need to add the index sequences separately for filtering. If you find cutadapt confusing, you could try Trimmomatic, where you can provide a file containing all possible adapter sequences to trim.
