Hi, I have paired-end reads from a 4C-seq experiment. Both the forward (R1) and reverse (R2) files contain a mix of reads: some start with the primary primer and others start with the secondary primer.
I have been using cutadapt to trim the forward primer from the forward reads and the reverse primer from the reverse reads. However, I'm not sure this is the correct approach for 4C-seq data, because I lose about 50% of the reads in each file. The commands I have been running and their summaries are below.
So my question is: should I filter out the forward (R1) reads that start with the reverse primer sequence 'CTCATTTCCTCCATAGAACATTTTAAAA', or should I trim both primers from these reads and keep them?
Command line parameters: -g ACTGATGACCAAATTA -m30 -e0.05 --discard-untrimmed 1087_1_4C-R_11-BfaI_4C-R_11-DpnII_mat_adtr_R1.fastq.gz
=== Summary ===
Total reads processed: 862,850
Reads with adapters: 450,125 (52.2%)
Reads written (passing filters): 450,125 (52.2%)
Total basepairs processed: 99,700,299 bp
Total written (filtered): 44,825,392 bp (45.0%)
Command line parameters: -g CTCATTTCCTCCATAGAACATTTTAAAA -m30 -e0.05 --discard-untrimmed 1087_1_4C-R_11-BfaI_4C-R_11-DpnII_mat_adtr_R2.fastq.gz
=== Summary ===
Total reads processed: 862,850
Reads with adapters: 448,676 (52.0%)
Reads written (passing filters): 448,676 (52.0%)
Total basepairs processed: 107,830,430 bp
Total written (filtered): 43,509,520 bp (40.3%)
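To see where the discarded half of the reads goes, it may help to tally how many reads in each file start with each primer. The following is only a rough sketch, not part of the cutadapt run above: the helper names (`fastq_sequences`, `classify`, `tally`) are hypothetical, the primer sequences are copied from the two commands, and the matching counts substitutions only, so it is stricter than cutadapt's own alignment and its numbers will not match the summaries exactly.

```python
import gzip
from collections import Counter
from itertools import islice

# Primer sequences copied from the two cutadapt commands above.
FORWARD_PRIMER = "ACTGATGACCAAATTA"
REVERSE_PRIMER = "CTCATTTCCTCCATAGAACATTTTAAAA"


def fastq_sequences(path):
    """Yield the sequence line of every record in a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        # The sequence is the 2nd line of each 4-line FASTQ record.
        for line in islice(handle, 1, None, 4):
            yield line.strip()


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(seq, max_error_rate=0.05):
    """Label a read by the primer it starts with (hypothetical helper).

    Assumption: only substitutions are counted, and the primer must sit
    at the very start of the read. cutadapt also tolerates indels and
    partial 5' matches, so this check is stricter.
    """
    primers = (("forward_primer", FORWARD_PRIMER),
               ("reverse_primer", REVERSE_PRIMER))
    for name, primer in primers:
        prefix = seq[:len(primer)]
        allowed = int(len(primer) * max_error_rate)
        if len(prefix) == len(primer) and mismatches(prefix, primer) <= allowed:
            return name
    return "neither"


def tally(seqs):
    """Count reads per label: forward_primer, reverse_primer, or neither."""
    return Counter(classify(s) for s in seqs)
```

Calling `tally(fastq_sequences(path))` once on the R1 file and once on the R2 file would show whether the reads dropped by `--discard-untrimmed` mostly begin with the other primer or with neither.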