So I just sequenced my paired-end samples on a NextSeq and I want to cut the adaptors off, so I was planning to use cutadapt for the job. However, I seem to have gotten myself confused about how to phrase the command. The cutadapt manual provides the following command as a default for my data (http://cutadapt.readthedocs.io/en/stable/guide.html#illumina-truseq):
cutadapt \
-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC \
-A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT \
-o trimmed.1.fastq.gz -p trimmed.2.fastq.gz \
reads.1.fastq.gz reads.2.fastq.gz
However, since I would also like to remove the indices from each of my samples, I was wondering how I would add that to the command (I used NEBNext adaptors). Do I just add the index sequence at the end of the adaptor sequence?
Hopefully the solution is as simple as I assume it is, but thank you in advance :) !
cutadapt trims adapters, and bases from either end of the reads, in FASTQ files. The index sequence is stored in the name line (line 1) of each record, so you do not need to process the index any further.
For some specific libraries, however, you may have a barcode sequence at the beginning of the reads (for example, a 10-bp barcode). For that purpose, you can use the --cut option in cutadapt:
$ cutadapt --cut 10 ...
Be careful: these barcode bases are removed before adapter trimming.
-u LENGTH, --cut=LENGTH
Remove bases from each read (first read only if
paired). If LENGTH is positive, remove bases from the
beginning. If LENGTH is negative, remove bases from
the end. Can be used twice if LENGTHs have different
signs. This is applied *before* adapter trimming.
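To make the order of operations concrete, here is a toy Python sketch of "cut first, then trim the adapter". It is not cutadapt's implementation: it assumes an exact adapter match (the real tool tolerates mismatches and partial matches), and the barcode, insert and trailing bases are made-up values for illustration.

```python
def cut_then_trim(seq, cut, adapter):
    """Toy model: fixed-length cut first, then exact-match 3' adapter removal."""
    # Step 1: mimic -u/--cut. Positive removes from the start, negative from the end.
    if cut > 0:
        seq = seq[cut:]
    elif cut < 0:
        seq = seq[:cut]
    # Step 2: mimic -a with an exact match only (an assumption of this sketch).
    # The adapter and everything after it are dropped.
    pos = seq.find(adapter)
    if pos != -1:
        seq = seq[:pos]
    return seq


# Hypothetical read: 10-bp barcode + insert + start of the adapter + extra bases.
barcode = "ACGTACGTAC"
insert = "TTGGCCAATTGGCCAA"
adapter = "AGATCGGAAGAGC"
read = barcode + insert + adapter + "GGGG"

print(cut_then_trim(read, 10, adapter))
```

Because the cut happens first, the adapter search only ever sees the read without its barcode.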
Line 2 of the read is where cutadapt is doing its work. The index sequence is typically the string of ACGT characters at the end of line 1 (AGCGAAC in this case).
As far as I know, cutadapt doesn't perform any operations on that index sequence. If you want to remove it, you'll need to use a different tool or a sed/awk command.
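If you would rather not write the sed/awk one-liner, a minimal Python sketch of the same idea follows. It assumes the usual Illumina header layout, where the part after the space looks like read:filtered:control:index, and the header in the example is made up (only the AGCGAAC index comes from this thread). The function names are hypothetical.

```python
def strip_index(header):
    """Drop the trailing index field from an Illumina-style FASTQ name line."""
    name, sep, comment = header.partition(" ")
    if not sep:
        # No comment part, so there is no index to remove.
        return header
    fields = comment.split(":")
    if len(fields) == 4:
        # Assumed layout: read:filtered:control:index. Keep the first three.
        comment = ":".join(fields[:3])
    return name + sep + comment


def strip_index_from_fastq(lines):
    """Yield FASTQ lines with the index removed from every 4th line (the name line)."""
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        yield strip_index(line) if i % 4 == 0 else line


# Hypothetical record for illustration.
record = [
    "@NB000000:1:FLOWCELL:1:11101:1234:1056 1:N:0:AGCGAAC",
    "ACGTACGTACGT",
    "+",
    "IIIIIIIIIIII",
]
for out in strip_index_from_fastq(record):
    print(out)
```

Only the name line changes; the sequence and quality lines pass through untouched.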
Let me know if I'm not understanding your question.
As far as I understand, anything after the adapter gets trimmed as well, so there is no need to add the index sequences separately for filtering. If you find cutadapt confusing, you can give Trimmomatic a try; it lets you provide a file containing all the possible adapter sequences to trim.