Trimming Adapters
11.4 years ago
newDNASeqer ▴ 790

I am trying to do variant calling on exome-sequencing data produced by a HiSeq 2000. I think I need to trim the adapters first, before doing the BWA alignment. I found the cutadapt program and it looks good. However, before I use cutadapt to process a large amount of data, I would like to confirm the settings with this community:

I am not exactly sure which adapters were used, but in an Illumina tech document I found the following sequences, which are common to all their adapters:

adapter 1 (Forward 5'). GATCGGAAGAGCACACGTCTGAACTCCAGTCAC

adapter 2 (Reverse 5'). GTAATAACCGGTT

cutadapt -a adapter1 -a adapter2 -m 25 input.fastq.gz > output_trim.fastq.gz

Is cutadapt generally recommended for trimming adapters? And are the adapter sequences I am using here too long? Thanks.

adaptor hiseq • 35k views
ADD COMMENT
10
Entering edit mode
9.1 years ago
Shicheng Guo ★ 9.6k
  1. Check for the adaptor (if you know it)

    gunzip -c T21.5.read1.fq.gz | grep AGATCGGAAGAG
    
  2. Check for adaptors with fastqc (if you do not know them, fastqc can recognize them for you)

    fastqc T21.5.read1.fq.gz
    
  3. Trim the adaptors with trim_galore

    trim_galore *.fastq.gz
    
  4. That's it. Do the alignment with BWA, Bowtie, or any other aligner.
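For step 1, it can be more informative to count how many reads contain the adapter than to print every match. Below is a minimal sketch on a made-up three-read FASTQ (the reads and the temporary file name are invented for illustration). It looks only at the sequence line of each record, because quality strings can contain the letters A, C, G and T too.

```shell
# Build a tiny gzipped FASTQ: three invented reads, two of which contain the adapter start.
tmp=$(mktemp -d)
printf '%s\n' \
  '@read1' 'ACGTACGTAGATCGGAAGAGCACACG' '+' 'IIIIIIIIIIIIIIIIIIIIIIIIII' \
  '@read2' 'ACGTACGTACGTACGTACGTACGTAC' '+' 'IIIIIIIIIIIIIIIIIIIIIIIIII' \
  '@read3' 'TTTTAGATCGGAAGAGCACACGTCTG' '+' 'IIIIIIIIIIIIIIIIIIIIIIIIII' \
  | gzip > "$tmp/demo.fq.gz"

# Line 2 of every 4-line FASTQ record is the sequence.
total=$(gunzip -c "$tmp/demo.fq.gz" | awk 'NR % 4 == 2 { n++ } END { print n + 0 }')
hits=$(gunzip -c "$tmp/demo.fq.gz" | awk 'NR % 4 == 2' | grep -c AGATCGGAAGAG)

echo "$hits of $total reads contain AGATCGGAAGAG"
rm -rf "$tmp"
```

On real data, replace the demo file with your own, e.g. T21.5.read1.fq.gz from step 1.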

By the way:

Adapter 1 (Forward 5'). GATCGGAAGAGCACACGTCTGAACTCCAGTCAC

Library preparation includes a step that adds an 'A' to the end of each fragment, so I think the correct sequence is:

AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC

TruSeq Universal Adapter

AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT

Its reverse complement is:

AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
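
The last sequence above is the reverse complement of the TruSeq Universal Adapter, which is easy to verify with standard shell tools. A minimal sketch (the function name revcomp is just a label chosen for this example):

```shell
revcomp() {
    # Complement each base with tr, then reverse the string with awk.
    # (Where the `rev` utility is installed, `tr ACGT TGCA | rev` does the same.)
    printf '%s\n' "$1" | tr ACGTacgt TGCAtgca |
        awk '{ out = ""; for (i = length($0); i > 0; i--) out = out substr($0, i, 1); print out }'
}

universal="AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
rc=$(revcomp "$universal")
echo "$rc"
```

The printed sequence starts with AGATCGGAAGAGC, the same start as the read-through sequence discussed for adapter 1.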
ADD COMMENT
0
Entering edit mode
BBDuk is also very good at trimming adapters. It is part of BBMap. I have found that it works very well for PE Illumina sequence data, and it even has the common adapters built in.
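
A minimal sketch of a paired-end BBDuk call, assuming the placeholder file names below and an adapter FASTA called adapters.fa (BBMap ships one in its resources directory; adjust the path to your install). The k, mink and hdist values are the ones commonly suggested in the BBDuk guide, not something tuned for a particular dataset. The block only runs bbduk.sh if it is installed and the inputs exist.

```shell
R1="reads_R1.fq.gz"      # placeholder input names
R2="reads_R2.fq.gz"
ADAPTERS="adapters.fa"   # assumed path to the adapter FASTA from BBMap's resources directory

if command -v bbduk.sh >/dev/null 2>&1 && [ -f "$R1" ] && [ -f "$R2" ]; then
    # ktrim=r : trim the adapter match and everything to its right (3' side)
    # tbo     : also trim adapters detected from read-pair overlap
    # tpe     : trim both reads of a pair to the same length
    bbduk.sh in1="$R1" in2="$R2" out1=clean_R1.fq.gz out2=clean_R2.fq.gz \
        ref="$ADAPTERS" ktrim=r k=23 mink=11 hdist=1 tpe tbo
else
    echo "bbduk.sh or input files not found; skipping"
fi
```

Passing both reads in one call with tbo and tpe may be enough to catch adapters that would otherwise remain in the second read.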
ADD REPLY
0
Entering edit mode

Could you please provide the code you are using? I'm trying to do that, but it seems that I have to trim both left and right adaptors (ktrim=r ktrim=l). Otherwise, adaptors are still present in the second read.

ADD REPLY
0
Entering edit mode

Hi, I have two questions about adaptor trimming. 1. I ran FastQC on my sample and the results showed Illumina Universal Adaptor contamination. Should I trim the Illumina Universal Adaptor, or find the adaptor sequences based on the authors' library preparation method? 2. The Illumina Universal Adaptor sequence is AGATCGGAAGAG. Why does the tutorial suggest trimming AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC instead?

ADD REPLY
1
Entering edit mode

The adapter sequence is much longer than AGATCGGAAGAG and even longer than the second example. Adapter recognition works by matching from the start of the adapter sequence, so both specifications will behave almost the same way.
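
To see that the two specifications share the same start, the strings can be compared directly. A small sketch using only the two sequences quoted in the question above:

```shell
short="AGATCGGAAGAG"
long="AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

# A shell pattern match: does $long begin with $short?
case "$long" in
    "$short"*) relation="prefix" ;;
    *)         relation="different" ;;
esac

echo "short adapter vs long adapter: $relation"
```

Since the 12-mer is a prefix of the longer sequence, a trimmer that anchors on the start of the adapter will cut at the same position in most reads with either one.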

ADD REPLY
7
Entering edit mode
11.4 years ago

Cutadapt is great, and it's what most people use (with or without TrimGalore). However, not all Illumina adapters are necessarily the same - e.g. for an sRNA-seq experiment, the sequences would be different from those ones. They should work in most cases though. What I ended up doing is running a script on my raw sequence files to make sure that the adapters I'm trimming are actually there, and then using cutadapt to trim them.
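
Once the adapter has been confirmed in the raw reads, a single-end cutadapt call might look like the sketch below. The file names are placeholders taken from the question, and the adapter is the read-through sequence discussed in this thread; substitute whatever your own check finds. The block only runs cutadapt if it is installed and the input exists.

```shell
ADAPTER="AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"   # sequence discussed in this thread; verify it for your library
IN="input.fastq.gz"                            # placeholder file names
OUT="output_trim.fastq.gz"

if command -v cutadapt >/dev/null 2>&1 && [ -f "$IN" ]; then
    # -a : 3' adapter to remove
    # -m : discard reads shorter than 25 nt after trimming
    # -o : output file; cutadapt compresses it because of the .gz extension
    cutadapt -a "$ADAPTER" -m 25 -o "$OUT" "$IN"
else
    echo "cutadapt or $IN not found; nothing trimmed"
fi
```

Using -o rather than redirecting with > matters here: cutadapt writes plain, uncompressed FASTQ to standard output, so a redirect into a file named .gz would not actually be gzipped.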

I also wrote a blog post about it, in case it's of interest.

ADD COMMENT
2
Entering edit mode

Thanks for the link. These are problems that often bite one unexpectedly and are very annoying to track down. Seemingly no one knows which adapters were put on, and they keep punting the question around.

ADD REPLY
1
Entering edit mode

Can I try to remove every Illumina adapter (maybe 100 adapters in total) when the adapter is unknown? I mean in theory; it may not be realistic to do this in practice.

ADD REPLY
3
Entering edit mode
11.4 years ago

cutadapt is fine. I have recently moved to Trim Galore, which is really just a wrapper around cutadapt which simplifies handling of paired-end reads and some other things. By default, Trim Galore looks for a 13-mer from the Illumina standard: AGATCGGAAGAGC, which is found in your adapter 1 sequence (starting from position 2 in that sequence; I am not sure why the G is not included).

ADD COMMENT
0
Entering edit mode

Yup I have also found Trim Galore easy to use and it also takes care of the orphan reads (read pair where one read gets discarded as it can't pass the QC step) in case of paired end data. Aligners like BWA will require your forward and reverse read to follow the same order in the fastq1 and fastq 2 files.

ADD REPLY
0
Entering edit mode

thanks for the reply. So you use the "--paired" option for your trim_galore run ? Do you recommend it for using BWA later? ps: i have paired end reads.

ADD REPLY
0
Entering edit mode

Yes, I use --paired and yes, I recommend it for BWA (although really I just recommend it in general, including for BWA)
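
For reference, a minimal sketch of a paired-end Trim Galore run, with placeholder file names. It only calls trim_galore if the tool is installed and both inputs exist.

```shell
R1="sample_R1.fastq.gz"   # placeholder names for the forward and reverse reads
R2="sample_R2.fastq.gz"

if command -v trim_galore >/dev/null 2>&1 && [ -f "$R1" ] && [ -f "$R2" ]; then
    # --paired trims both files together and drops a pair when either read
    # becomes too short, so the two outputs stay in the same order.
    trim_galore --paired "$R1" "$R2"
else
    echo "trim_galore or input files not found; skipping"
fi
```

Keeping the two output files in sync is what matters for BWA, which expects the forward and reverse reads in the same order.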

ADD REPLY
1
Entering edit mode
11.4 years ago
vijay ★ 1.6k

Cutadapt is fine. You can also try the Fastx or NGSQC toolkits. Fastx allows you to handle paired-end data as well. As rightly pointed out by Jelena, the type and length of the adapters depend on the kind of work you are performing. All these tools can help you trim off the adapter sequences effectively.

ADD COMMENT
1
Entering edit mode
11.1 years ago

You could also try:

https://github.com/optimuscoprime/autoadapt

It uses FastQC to detect adaptors and primers, and then cuts them with cutadapt (well, in parallel using several cutadapts)

ADD COMMENT
0
Entering edit mode

This tool needs documentation that describes what it actually does; right now it is overly generic.

ADD REPLY
0
Entering edit mode

you are quite right, I have added some more technical info to the bottom of the readme file. are there any other particular things that you would like to know?

ADD REPLY
1
Entering edit mode

that looks much better,

Other observations: I would move the licensing to the end, since it is really not that important, and put what the tool does first. That is what people look for; when I go to a tool, I want to know right away what it does:

We developed a tool to automatically detect which adaptors and primers are present in a FASTQ file and remove those sequences from the file, as well as detecting the quality score encoding type used and removing low quality sequences.

...

now the section on how the tool works

now the installation usage and license

ADD REPLY
0
Entering edit mode

thanks, I've moved the licensing info down a little

ADD REPLY
0
Entering edit mode

I have seen in this tutorial from ARK-Genomics that FastQC might get the adapter contaminants wrong. Have you considered these cases and handled them?

ADD REPLY
