I am using Trimmomatic to trim adaptor sequences from ChIP-seq FASTQ files. With adaptor file 1 (TruSeq3-PE-2.fa, the default file that comes with Trimmomatic), 95.77% of read pairs are reported as both surviving, whereas with adaptor file 2 (A2, see below), which adds the overrepresented sequences, only 80.45% are. The dropped percentage is ~1% in both cases, and FastQC shows < 0.04% adaptor content for both outputs. FastQC reports 4% overrepresented sequences for Sample_001.with adaptorA1.fastq.gz but none for Sample_001.with adaptorA2.fastq.gz.
My question is: when I run the alignment later, will the lower percentage of both-surviving reads make it worse, and will I get fewer peaks when I call peaks? Which one should I go for?
| Filename | InputReadPairs | BothSurviving | ForwardOnlySurviving | ReverseOnlySurviving | Dropped |
|---|---|---|---|---|---|
| Sample_001.with **adaptorA1**.fastq.gz | 101005952 | 96732909 (95.77%) | 2347954 (2.32%) | 1332466 (1.32%) | 592623 (0.59%) |
| Sample_001.with **adaptorA2**.fastq.gz | 101005952 | 81260458 (80.45%) | 1564086 (1.55%) | 16804902 (16.64%) | 1376506 (1.36%) |
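
To see where the read pairs go between the two runs, the counts above can be recomputed with a few lines of Python. This is only a sketch: the dictionary keys and the `summarize` helper are names made up for illustration, and the numbers are copied from the Trimmomatic summary as posted.

```python
# Counts copied from the Trimmomatic summary above.
runs = {
    "adaptorA1": {"input": 101005952, "both": 96732909,
                  "forward_only": 2347954, "reverse_only": 1332466,
                  "dropped": 592623},
    "adaptorA2": {"input": 101005952, "both": 81260458,
                  "forward_only": 1564086, "reverse_only": 16804902,
                  "dropped": 1376506},
}


def summarize(counts):
    """Return each category as a percentage of the input read pairs."""
    total = counts["input"]
    parts = {k: v for k, v in counts.items() if k != "input"}
    # The four categories should account for every input pair.
    assert sum(parts.values()) == total
    return {k: round(100 * v / total, 2) for k, v in parts.items()}


for name, counts in runs.items():
    print(name, summarize(counts))

# How many pairs leave "both surviving" with A2, and how many more
# pairs end up as reverse-only compared with A1.
lost_from_both = runs["adaptorA1"]["both"] - runs["adaptorA2"]["both"]
gained_reverse_only = (runs["adaptorA2"]["reverse_only"]
                       - runs["adaptorA1"]["reverse_only"])
print(lost_from_both, gained_reverse_only)
```

The two differences printed at the end are nearly equal, so almost all of the pairs missing from BothSurviving with A2 reappear as ReverseOnlySurviving. In other words, with A2 it is mostly the forward mate that gets discarded while the reverse mate is kept.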
Adaptor 1: TruSeq3-PE.fa
>PrefixPE/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PrefixPE/2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
Adaptor 2: TruSeq3-PE.fa with overrepresented sequences
>PrefixPE/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PrefixPE/2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
>TruSeqAdapterIndex1
5' GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG
>TruSeqAdapterIndex2
5' GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTTG
>TruSeqAdapterIndex3
5' GATCGGAAGAGCACACGTCTGAACTCCAGTCACTTAGGCATCTCGTATGCCGTCTTCTGCTTG
>TruSeqAdapterIndex4
5' GATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATCTCGTATGCCGTCTTCTGCTTG
>TruSeqAdapterIndex5
5' GATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTG
>TruSeqAdapterIndex6
5' GATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCTTG
>TruSeqAdapterIndex7
5' GATCGGAAGAGCACACGTCTGAACTCCAGTCACCAGATCATCTCGTATGCCGTCTTCTGCTTG
>TruSeqAdapterIndex8
5' GATCGGAAGAGCACACGTCTGAACTCCAGTCACACTTGAATCTCGTATGCCGTCTTCTGCTTG
>TruSeqAdapterIndex9
5' GATCGGAAGAGCACACGTCTGAACTCCAGTCACGATCAGATCTCGTATGCCGTCTTCTGCTTG
>TruSeqAdapterIndex10
5' GATCGGAAGAGCACACGTCTGAACTCCAGTCACTAGCTTATCTCGTATGCCGTCTTCTGCTTG
>TruSeqAdapterIndex11
5' GATCGGAAGAGCACACGTCTGAACTCCAGTCACGGCTACATCTCGTATGCCGTCTTCTGCTTG
>TruSeqAdapterIndex12
5' GATCGGAAGAGCACACGTCTGAACTCCAGTCACCTTGTAATCTCGTATGCCGTCTTCTGCTTG
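
A quick sanity check that may be worth running on a custom adaptor file like the one above: the sketch below flags any sequence line that contains something other than plain nucleotide letters, such as the `5' ` prefix in the listing. Whether that prefix is really in the file or was only added when pasting here is not known from the post. The function name `check_adapter_fasta` is hypothetical, and restricting sequences to A/C/G/T/N is an assumption made for this check.

```python
import re


def check_adapter_fasta(text):
    """Return (line_number, line) for sequence lines that are not pure A/C/G/T/N."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        # Skip blank lines and FASTA header lines.
        if not line or line.startswith(">"):
            continue
        # Assumption: adaptor sequences use only A, C, G, T or N.
        if not re.fullmatch(r"[ACGTNacgtn]+", line):
            problems.append((number, line))
    return problems


# Two records taken from the listing above.
example = """>PrefixPE/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCT
>TruSeqAdapterIndex1
5' GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG
"""

for number, line in check_adapter_fasta(example):
    print(f"line {number}: unexpected characters in {line!r}")
```

To check a real file, its contents can be read with `open(path).read()` and passed to the same function.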
If what is being trimmed is adapter sequence, it does not belong to your samples and should not be there when you do the analysis anyway.