I have Illumina MiSeq paired-end (PE) 2x250 data.
I got the following information from the sequencing run (some irrelevant or private data omitted):
[Header]
IEMFileVersion,4
Investigator Name,xxxx
Experiment Name,xxxxxx
Date,x/x/xxxx
Workflow,Assembly
Application,Assembly
Assay,TruSeq HT
Description,
Chemistry,Amplicon
[Reads]
301
301
[Settings]
ReverseComplement,0
kmer,31
Adapter,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
AdapterRead2,AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
[Data]
Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,I5_Index_ID,index2,GenomeFolder,Sample_Project,Description
1,AB1,,,D701,ATTACTCG,D501,TATAGCCT,,,
2,AB2,,,D701,ATTACTCG,D502,ATAGAGGC,,,
3,AB3,,,D701,ATTACTCG,D503,CCTATCCT,,,
4,AB4,,,D701,ATTACTCG,D504,GGCTCTGA,,,
5,AB5,,,D701,ATTACTCG,D505,AGGCGAAG,,,
6,AB6,,,D701,ATTACTCG,D506,TAATCTTA,,,
7,AB7,,,D701,ATTACTCG,D507,CAGGACGT,,,
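The [Data] section of this sample sheet is ordinary CSV, so the index pairs per sample can be pulled out programmatically, for example to list the index sequences if you decide to add them to an adapter file. A minimal Python sketch, assuming the layout shown above (a "[Data]" line, then a CSV header row, with [Data] as the last section); `read_index_pairs` is a hypothetical helper name.

```python
import csv
import io


def read_index_pairs(sample_sheet_text):
    """Return {Sample_Name: (index, index2)} from the [Data] section.

    Assumes, as in the sheet above, that [Data] is the last section and is
    followed by a CSV header row.
    """
    lines = sample_sheet_text.splitlines()
    start = next(i for i, line in enumerate(lines)
                 if line.strip().startswith("[Data]")) + 1
    reader = csv.DictReader(io.StringIO("\n".join(lines[start:])))
    return {row["Sample_Name"]: (row["index"], row["index2"]) for row in reader}


# Usage on the first two rows of the sheet shown above.
sheet = """[Data]
Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,I5_Index_ID,index2,GenomeFolder,Sample_Project,Description
1,AB1,,,D701,ATTACTCG,D501,TATAGCCT,,,
2,AB2,,,D701,ATTACTCG,D502,ATAGAGGC,,,
"""
pairs = read_index_pairs(sheet)
print(pairs["AB2"])  # ('ATTACTCG', 'ATAGAGGC')
```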
I want to trim with Trimmomatic and then assemble with SPAdes (and maybe other assemblers). I do not have the data yet, so I do not know whether the adapters and indexes have already been trimmed away. Let's assume they are still there.
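Whether adapter (and index) sequence shows up in the reads at all depends on the insert length relative to the read length: a read only runs into the adapter when the insert is shorter than the read, and the two mates only overlap when the insert is shorter than the two read lengths combined. A small Python sketch of that arithmetic, assuming equal-length mates and an insert length measured without adapters; the function name and the example lengths are illustrative, not taken from the data.

```python
def describe_pair(insert_len, read_len):
    """Simple model: both mates have length read_len and start at opposite
    ends of an insert of insert_len bases (adapters not counted)."""
    # Insert bases covered by both mates (0 if the mates leave a gap).
    overlap = max(0, min(2 * read_len - insert_len, insert_len))
    # Bases past the end of the insert, i.e. adapter read-through per mate.
    readthrough = max(0, read_len - insert_len)
    return {
        "mates_overlap": overlap > 0,
        "overlap_bases": overlap,
        "adapter_bases_per_read": readthrough,
    }


print(describe_pair(insert_len=200, read_len=250))
# {'mates_overlap': True, 'overlap_bases': 200, 'adapter_bases_per_read': 50}
print(describe_pair(insert_len=400, read_len=151))
# {'mates_overlap': False, 'overlap_bases': 0, 'adapter_bases_per_read': 0}
```

The second call corresponds to the situation described later in this thread (151 bp reads, fragments longer than 400 bp): such mates neither overlap nor read into the adapter, so overlap-based merging alone cannot join them.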
Now some questions:
- Is it necessary to add the indexes to the Trimmomatic adapter file (so that they get removed as well)? Or do such short sequences generally not interfere with assembly? Or could Trimmomatic even find "false positives" because those sequences are so short?
- Even if the adapters and indexes have already been trimmed away, would it still be necessary to provide their sequences because of read-through (for example, R1 reading through into the adapter of R2)?
- How should I configure the Trimmomatic adapter file? For which sequences do I need to provide the reverse complement? And what about the /1 and /2 suffixes in Trimmomatic? I did not really understand them.
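For the adapter-file question, a minimal Python sketch that builds a Trimmomatic-style FASTA from the two adapters in the [Settings] section. It follows the naming used in the TruSeq3-PE files bundled with Trimmomatic, as I understand the Trimmomatic manual: a pair named Prefix.../1 and Prefix.../2 is used for palindrome clipping, other names ending in /1 or /2 are tested only against forward or reverse reads in simple mode, and names without a suffix are tested against both. The Prefix pair is written as the reverse complement of the read-through adapters, which is how the bundled file appears to be oriented; please check this against the manual. The names `ReadThrough_R1`/`ReadThrough_R2` and the helper functions are hypothetical.

```python
# Adapter sequences copied from the [Settings] section of the sample sheet.
ADAPTER_R1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"  # "Adapter": read-through seen in R1
ADAPTER_R2 = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"  # "AdapterRead2": read-through seen in R2

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string (ACGTN)."""
    return seq.translate(_COMPLEMENT)[::-1]


def trimmomatic_adapter_fasta(adapter_r1, adapter_r2):
    """FASTA text laid out like Trimmomatic's bundled TruSeq3-PE files."""
    records = [
        # Palindrome mode: a "Prefix" pair ending in /1 and /2. Each entry is
        # the sequence just upstream of that mate's read start, i.e. the
        # reverse complement of the adapter the *other* mate reads into.
        ("PrefixPE/1", revcomp(adapter_r2)),
        ("PrefixPE/2", revcomp(adapter_r1)),
        # Simple mode: no /1 or /2 suffix, so each is tested against both mates.
        ("ReadThrough_R1", adapter_r1),
        ("ReadThrough_R2", adapter_r2),
    ]
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


fasta = trimmomatic_adapter_fasta(ADAPTER_R1, ADAPTER_R2)
print(fasta, end="")
```

Saved as, say, adapters.fa, the file would be passed with something like ILLUMINACLIP:adapters.fa:2:30:10; those three numbers (seed mismatches, palindrome clip threshold, simple clip threshold) are the example values from the Trimmomatic manual, not values tuned for this data.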
Hi Brian, this is an old post and I'm not sure you will still see this, but I'm going to try my luck! I wanted to know what would happen if I omit the "k=23 mink=11" parameters in the PE adapter-trimming step, so I tried both: with both parameters, and without them.
It seems that with the parameters, more bases get trimmed off. If at this point I only want to remove adapter sequence, is it safe to leave those parameters out?
Thank you!
Yang
Hi Yang,
If you don't specify "k=23 mink=11", it will use the default of "k=27" and no mink. "k=23 mink=11" is more sensitive, so I recommend that. Since you have PE reads, most of the time, even when adapters are not detected from kmers, they will still be detected based on overlap, which is why the overall amount of data removed (8.66% of bases compared to 8.63%) changed only very slightly.
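To see why mink matters, here is a deliberately simplified Python model of k-mer right-trimming: the read is cut at the first position where a full-length adapter k-mer matches, and, when mink is set, shorter adapter prefixes (down to mink bases) are also looked for at the very 3' end of the read, where only a partial adapter fits. This illustrates the idea only; it is not BBDuk's implementation and ignores mismatches (hdist), reverse-complement k-mers and everything else BBDuk does. The function name and the toy poly-T insert are hypothetical.

```python
# "Adapter" from the sample sheet; any adapter string works here.
ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"


def kmer_right_trim(read, adapter, k, mink=0):
    """Toy k-mer right-trimming with exact matches only.

    1. Cut at the first read position whose k-mer occurs in the adapter.
    2. If mink > 0 and nothing was found, look at the read's 3' tip for the
       adapter's first L bases, for L from k-1 down to mink.
    """
    adapter_kmers = {adapter[i:i + k] for i in range(len(adapter) - k + 1)}
    for i in range(len(read) - k + 1):
        if read[i:i + k] in adapter_kmers:
            return read[:i]
    if mink > 0:
        for length in range(min(k - 1, len(read)), mink - 1, -1):
            if read[-length:] == adapter[:length]:
                return read[:-length]
    return read


insert = "T" * 60                 # toy insert, cannot share a k-mer with ADAPTER
partial = insert + ADAPTER[:20]   # read ends in only 20 adapter bases

print(len(kmer_right_trim(partial, ADAPTER, k=27)))           # 80: adapter missed
print(len(kmer_right_trim(partial, ADAPTER, k=23, mink=11)))  # 60: adapter removed
```

With k=27 and no mink, a 20-base adapter remnant at the 3' end is invisible to the k-mer search; with k=23 mink=11 it is found.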
But I do recommend keeping "k=23 mink=11", because the reads that don't get trimmed when you leave those parameters off almost certainly do have adapters that should be trimmed. In other words, the 0.03% additional bases you gain by removing those flags are virtually all adapter sequence that you don't want.
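The overlap-based detection mentioned above can also be sketched: if read 1 and the reverse complement of read 2 agree over a length shorter than the reads, that length is the insert, and everything beyond it in either mate must be adapter, whatever its sequence. A simplified Python illustration with exact matching and an arbitrary minimum insert length, not the actual BBDuk or BBMerge algorithm; the function name and the simulated 30-cycle reads are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string (ACGTN)."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_by_overlap(r1, r2, min_insert=20):
    """Toy overlap-based adapter trimming for one read pair (exact matches).

    If R1 and the reverse complement of R2 agree over their first L bases for
    some L shorter than the reads, take L as the insert length: everything
    after it in either mate must be adapter, whatever its sequence.
    """
    n = min(len(r1), len(r2))
    for insert_len in range(n - 1, min_insert - 1, -1):
        if r1[:insert_len] == revcomp(r2[:insert_len]):
            return r1[:insert_len], r2[:insert_len]
    return r1, r2  # no evidence of an insert shorter than the reads


# Simulate a 24 bp insert sequenced with 30-cycle reads (adapters from the sheet).
insert = "GATTACAGGCTTAACCGTAGCATG"
r1 = (insert + "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA")[:30]
r2 = (revcomp(insert) + "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT")[:30]
print(trim_by_overlap(r1, r2))
```

Real tools tolerate mismatches and sequencing errors in the overlap, which this exact-match version does not.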
Hi Brian,
Thank you for the explanation, very helpful.
There is something else I'm confused about. The workflow I tried was:
- bbmerge-auto.sh in1=read1.fastq in2=read2.fastq out=merged.fastq outu=unmerged.fastq ihist=ihist.txt ecct extend2=20 iteration=5
  to merge non-overlapping PE reads (reads are 151 bp, fragment size is larger than 400 bp);
- bbmerge.sh in=merged.fastq outa=adapters.fa
  to discover adapter sequences;
- finally, bbduk.sh to filter low-quality reads.
What I found was that after bbmerge-auto.sh, I couldn't detect any adapter sequences in the merged file (instead, they were all in the unmerged file). With bbduk, nothing got trimmed or filtered when I set Q=10. Should I increase Q to 20? And what is the difference between quality trimming and quality filtering? Looking forward to hearing back from you!
Yang
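On the last question, the general distinction is that quality trimming removes low-quality bases (typically at the read ends) and keeps the rest of the read, while quality filtering keeps or discards a whole read based on a summary such as its average quality. BBDuk has separate settings for the two (quality trimming with qtrim and trimq, average-quality filtering with maq), but the rules in this Python sketch are simplified stand-ins chosen for illustration, not BBDuk's algorithms; the function names and the toy read are hypothetical, and Phred+33 encoding is assumed.

```python
def phred_scores(qual, offset=33):
    """Decode a FASTQ quality string (Phred+33 by default) into integers."""
    return [ord(c) - offset for c in qual]


def quality_trim(seq, qual, threshold):
    """Quality *trimming*: drop bases below threshold from both ends and
    keep whatever remains of the read."""
    q = phred_scores(qual)
    start, end = 0, len(q)
    while start < end and q[start] < threshold:
        start += 1
    while end > start and q[end - 1] < threshold:
        end -= 1
    return seq[start:end], qual[start:end]


def passes_quality_filter(qual, min_avg):
    """Quality *filtering*: keep or discard the whole read by its mean score."""
    q = phred_scores(qual)
    return bool(q) and sum(q) / len(q) >= min_avg


seq, qual = "ACGTACGTAC", "##IIIIII+#"   # '#' = Q2, '+' = Q10, 'I' = Q40
print(quality_trim(seq, qual, 10))       # ('GTACGTA', 'IIIIII+')
print(quality_trim(seq, qual, 20))       # ('GTACGT', 'IIIIII')
print(passes_quality_filter(qual, 20))   # True  (mean Q = 25.6)
print(passes_quality_filter(qual, 30))   # False
```

In this toy read, raising the trimming threshold from 10 to 20 removes one more base, while the filter decides about the read as a whole. When most bases are already above the threshold, a low value such as 10 may remove little or nothing.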