Specifying and removing Illumina adapters with fastp
3.3 years ago
Mehmet ▴ 820

Dear all,

Recently, I was asked to preprocess some FASTQ files produced on an Illumina instrument (I don't know which machine produced the data).

Here is a record from one of the forward FASTQ files:

@A00957:111:H5MTHDSX2:3:1101:2718:1063 1:N:0:TCCGCGAA+AGGCTATA
CTGACCTCAAGTGATCTACCCACCTCGGTCTCCCAAAGTGCTGGGATTACAGGCAGGAGCCACTGCCCCTGGCCCTAATCATAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCCGCGAAATCTCGTATGCCGGCGTCTGCTTGAAA

When I asked the company for the adapter sequences, they gave me D710-501 TCCGCGAATATAGCCT (this is for the forward and reverse reads of one sample).

In the header of the FASTQ file, however, the indexes appear as TCCGCGAA+AGGCTATA.

Illumina's documentation, on the other hand, gives the following:

TruSeq DNA and RNA CD Indexes

Index 1 (i7) Adapters CTAGCGCT GTGTAGAC GATCGGAAGAGCACACGTCTGAACTCCAGTCAC[i7]ATCTCGTATGCCGTCTTCTGCTTG

I want to remove the adapters from the FASTQ files, but I am a little confused about how to specify the adapter sequences in the adapter file that fastp or Trimmomatic takes as input.

For example,

Is it okay to write just TCCGCGAATATAGCCT in the adapter FASTA file, or should I specify the full adapter? By that I mean something like this (replacing [i7] in the Illumina documentation with the sequences given in the header of the FASTQ file):

Read 1 adapter:

GATCGGAAGAGCACACGTCTGAACTCCAGTCAC[TCCGCGAA]ATCTCGTATGCCGTCTTCTGCTTG

Read 2 adapter:

GATCGGAAGAGCACACGTCTGAACTCCAGTCAC[AGGCTATA]ATCTCGTATGCCGTCTTCTGCTTG

3.3 years ago
GenoMax 147k

All Illumina adapters share a common core sequence, and that core is all you need to specify when looking for adapters. When the trimming program finds GATCGGAAGAGCACACGTCTGAACTCCAGTCAC (for the Read 1 adapter), it simply removes everything from that point to the 3'-end of the read. The same applies to the other adapter.
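To make that concrete, here is a minimal Python sketch of the idea, applied to the read from your question. It is only an illustration using an exact string match; real trimmers such as fastp or Trimmomatic tolerate mismatches and partial adapters at the read end, so this is not how they are implemented. The function name `trim_at_adapter` is made up for this example.

```python
# Core of the Read 1 adapter, as quoted in the answer above.
CORE_R1 = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"


def trim_at_adapter(read, adapter=CORE_R1):
    """Cut the read at the first exact occurrence of the adapter core.

    Everything from the adapter start to the 3'-end is dropped.
    Reads without the adapter are returned unchanged.
    """
    pos = read.find(adapter)
    return read if pos == -1 else read[:pos]


# Sequence line of the record shown in the question.
read = (
    "CTGACCTCAAGTGATCTACCCACCTCGGTCTCCCAAAGTGCTGGGATTACAGGCAGGAGCCACTG"
    "CCCCTGGCCCTAATCATAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCCGCGAAATCTCG"
    "TATGCCGGCGTCTGCTTGAAA"
)

trimmed = trim_at_adapter(read)

# What followed the core in this read: the i7 index and the rest of the adapter.
leftover = read[len(trimmed) + len(CORE_R1):]

print(trimmed)
print(leftover)
```

In this read the core is followed directly by TCCGCGAA, the i7 index from the header, and all of that is discarded together with the core. That is why the sample-specific index does not need to be in the adapter file.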

TCCGCGAA+AGGCTATA

Those are index sequences. Illumina sequences them independently, in separate index reads, so they are never part of the actual R1/R2 sequence.
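As a quick check that the string from the company and the string in the header describe the same index pair, the short Python sketch below splits both and compares them. It assumes the 16-mer from the company is simply the 8-base i7 followed by the 8-base i5, and that the last colon-separated field of the header holds the indexes as i7+i5; the helper `reverse_complement` is written just for this example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = str.maketrans("ACGTN", "TGCAN")
    return seq.translate(complement)[::-1]


# Header line of the record shown in the question.
header = "@A00957:111:H5MTHDSX2:3:1101:2718:1063 1:N:0:TCCGCGAA+AGGCTATA"

# Assumption: the last colon-separated field is "<i7>+<i5>".
header_i7, header_i5 = header.rsplit(":", 1)[1].split("+")

# Assumption: the company's 16-mer is i7 (8 bases) followed by i5 (8 bases).
provided = "TCCGCGAATATAGCCT"
provided_i7, provided_i5 = provided[:8], provided[8:]

print(header_i7 == provided_i7)                      # i7 compared as given
print(header_i5 == reverse_complement(provided_i5))  # i5 compared after reverse complement
```

Both comparisons come out true for this sample: the i7 matches as written, and the i5 in the header (AGGCTATA) is the reverse complement of the second half of the company's sequence (TATAGCCT). So the two strings are the same index pair, just reported in different orientations.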
