Dear all,
I have recently been asked to preprocess some FASTQ files produced on an Illumina sequencer (I don't know which instrument generated the data).
Here is one record from the forward-read (R1) FASTQ file:
@A00957:111:H5MTHDSX2:3:1101:2718:1063 1:N:0:TCCGCGAA+AGGCTATA
CTGACCTCAAGTGATCTACCCACCTCGGTCTCCCAAAGTGCTGGGATTACAGGCAGGAGCCACTGCCCCTGGCCCTAATCATAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCCGCGAAATCTCGTATGCCGGCGTCTGCTTGAAA
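The header appears to follow the usual Illumina (Casava 1.8+/bcl2fastq) read-name layout, `<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>`, where dual-indexed runs report both index reads joined by `+`. A minimal Python sketch, assuming that layout, that pulls the two index sequences out of the header above (`parse_illumina_header` is a hypothetical helper, not part of any library):

```python
def parse_illumina_header(header: str) -> dict:
    """Split a Casava 1.8+ style Illumina read header into named fields (assumed layout)."""
    read_id, description = header.lstrip("@").split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = read_id.split(":")
    read_number, filtered, control, index = description.split(":")
    # Dual-indexed runs print the i7 and i5 index reads joined by '+'
    i7, _, i5 = index.partition("+")
    return {
        "instrument": instrument,
        "run": int(run),
        "flowcell": flowcell,
        "lane": int(lane),
        "tile": int(tile),
        "x": int(x),
        "y": int(y),
        "read": int(read_number),
        "filtered": filtered == "Y",
        "control": int(control),
        "i7": i7,
        "i5": i5,
    }


header = "@A00957:111:H5MTHDSX2:3:1101:2718:1063 1:N:0:TCCGCGAA+AGGCTATA"
fields = parse_illumina_header(header)
print(fields["i7"], fields["i5"])  # TCCGCGAA AGGCTATA
```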
When I asked the sequencing company for the adapter sequences, they gave me D710-501 TCCGCGAATATAGCCT (this is for one sample and covers both the forward and reverse reads).
However, the index field in the FASTQ header reads TCCGCGAA+AGGCTATA.
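One way to compare the two strings is to split the company's 16-mer into two 8-base halves. The sketch below assumes (this is not stated by the company) that the string is the D710 i7 index followed by the D501 i5 index. The first half then matches the header exactly, and the second half matches only after reverse complementing. As far as I know, Illumina instruments read i5 in one of two orientations depending on platform and chemistry, so a reverse-complemented i5 in the header would be expected rather than a sign of a wrong index.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence (ACGTN only)."""
    return seq.translate(COMPLEMENT)[::-1]


# Assumption: the company's string is the i7 index (D710) followed by the i5 index (D501)
company = "TCCGCGAATATAGCCT"
i7_sheet, i5_sheet = company[:8], company[8:]

# Index field copied from the FASTQ header
header_i7, header_i5 = "TCCGCGAA+AGGCTATA".split("+")

print(i7_sheet == header_i7)                      # True
print(i5_sheet == header_i5)                      # False
print(reverse_complement(i5_sheet) == header_i5)  # True
```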
Illumina's documentation, on the other hand, gives the following:
TruSeq DNA and RNA CD Indexes
Index 1 (i7) Adapters CTAGCGCT GTGTAGAC
GATCGGAAGAGCACACGTCTGAACTCCAGTCAC[i7]ATCTCGTATGCCGTCTTCTGCTTG
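Because read 1 runs into this adapter from its 5' end once the insert is shorter than the read, the template can be checked against the R1 record above. A minimal sketch using plain string matching (this is not how fastp or Trimmomatic search internally; they tolerate mismatches and partial matches): the index-independent part before `[i7]` occurs exactly in the read, immediately followed by the sample's i7 index, while the full filled-in adapter only matches with 2 mismatches near the read's 3' end. `fill_template` and `hamming` are hypothetical helpers.

```python
TEMPLATE = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC[i7]ATCTCGTATGCCGTCTTCTGCTTG"

# Read 1 sequence from the FASTQ record above
READ = ("CTGACCTCAAGTGATCTACCCACCTCGGTCTCCCAAAGTGCTGGGATTACAGGCAGGAGCCACTGCCCCTGGCC"
        "CTAATCATAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCCGCGAAATCTCGTATGCCGGCGTCTGCTTGAAA")


def fill_template(template: str, i7: str) -> str:
    """Substitute an i7 index into the bracketed slot of the adapter template."""
    return template.replace("[i7]", i7)


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two strings of equal length."""
    return sum(x != y for x, y in zip(a, b))


adapter = fill_template(TEMPLATE, "TCCGCGAA")
# The part of the adapter before the index slot is the same for every sample
common_prefix = TEMPLATE.split("[i7]")[0]

start = READ.find(common_prefix)  # first exact occurrence, or -1
window = READ[start:start + len(adapter)]

print(start)                     # 83
print(adapter in READ)           # False: no exact full-length match
print(hamming(adapter, window))  # 2 mismatches over the 65-base adapter
print(READ[:start])              # read with everything from the adapter onward removed
```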
I want to remove the adapters from these FASTQ files, but I am a bit confused about how to specify the adapter sequences in the adapter FASTA file that fastp or Trimmomatic takes as input.
For example, is it enough to write TCCGCGAATATAGCCT in the adapter FASTA file, or should I specify the full adapter sequences? I mean something like this, replacing [i7] in the Illumina template with the index sequences from the FASTQ header:
Read 1 adapter:
GATCGGAAGAGCACACGTCTGAACTCCAGTCAC[TCCGCGAA]ATCTCGTATGCCGTCTTCTGCTTG
Read 2 adapter:
GATCGGAAGAGCACACGTCTGAACTCCAGTCAC[AGGCTATA]ATCTCGTATGCCGTCTTCTGCTTG
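If one did go with full sequences, the adapter file could be generated rather than typed by hand. The sketch below simply reproduces the substitution proposed above: it fills the Index 1 (i7) template with each index from the header field and writes a FASTA file. It does not check whether that i7-side template is the right adapter for read 2, and the record names and the `adapters.fa` filename are hypothetical.

```python
from pathlib import Path

TEMPLATE = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC[i7]ATCTCGTATGCCGTCTTCTGCTTG"


def build_adapters(header_index: str) -> dict:
    """Fill the template with each index from a header field like 'i7+i5'.

    Mirrors the proposal above; the Read 2 entry reuses the i7-side template.
    """
    i7, i5 = header_index.split("+")
    return {
        "Read1_adapter": TEMPLATE.replace("[i7]", i7),
        "Read2_adapter": TEMPLATE.replace("[i7]", i5),
    }


def write_fasta(records: dict, path: Path) -> None:
    """Write name -> sequence pairs as a FASTA file, one sequence line per record."""
    path.write_text("".join(f">{name}\n{seq}\n" for name, seq in records.items()))


adapters = build_adapters("TCCGCGAA+AGGCTATA")
out = Path("adapters.fa")
write_fasta(adapters, out)
print(out.read_text())
```

The resulting file is the kind of input fastp accepts via `--adapter_fasta` and Trimmomatic via its `ILLUMINACLIP` step.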