I am currently working through a k-mer analysis of some 2x150 bp paired-end (PE) whole-genome sequencing data that I generated for a first-year PhD project.
I carried out five runs, so I have 10 paired-end files (five R1/R2 pairs) from one individual. Given this, do all the reads need to be merged/concatenated before going through Jellyfish, or should only one read of each pair be merged and put through that way?
Alternatively, since Jellyfish counts canonical k-mers, would supplying just the forward reads from one of the runs be enough, as it would count them in both the forward and reverse orientation anyway?
Thanks in advance!
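For readers unfamiliar with the term, here is a minimal sketch of what canonical k-mer counting means. It is an illustration in plain Python, not Jellyfish's implementation, and the function names are hypothetical. Each k-mer and its reverse complement are collapsed into one key (the lexicographically smaller of the two), so strand orientation stops mattering. Note that this only merges the two orientations of reads that are actually supplied; it does not add back coverage from reads that are left out.

```python
from collections import Counter

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmer_counts(reads, k):
    """Count canonical k-mers across an iterable of read sequences.

    A k-mer and its reverse complement share one key: whichever of the
    two sorts first. K-mers containing anything other than A/C/G/T
    (for example N) are skipped.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) - set("ACGT"):
                continue
            counts[min(kmer, reverse_complement(kmer))] += 1
    return counts
```

With this sketch, a read and its reverse complement produce identical counts, while supplying twice as many reads doubles the totals, which is why the number of reads fed in still determines the k-mer coverage.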
You may need to do some QC before doing k-mer counting. I would look for Illumina adapter/multiplexing sequences, just in case. You also want to know whether some of your inserts are shorter than 300 bp, because in that case the ends of R1 and R2 may overlap.
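The 300 bp figure follows from the read layout: with 2x150 reads, any fragment shorter than 150 + 150 bp is read from both ends far enough for the mates to cover some of the same bases. A rough sketch of that arithmetic, assuming untrimmed reads of equal length and using a hypothetical helper name:

```python
def expected_overlap(read_len, fragment_len):
    """Number of bases covered by both mates of a read pair.

    Assumes both reads have the same untrimmed length and start at
    opposite ends of the fragment. The overlap cannot exceed the
    fragment itself, and is zero when the fragment is long enough.
    """
    return max(0, min(fragment_len, 2 * read_len - fragment_len))
```

For example, `expected_overlap(150, 250)` gives 50, while a 300 bp or longer fragment gives 0. K-mers that fall inside the overlapping region are counted twice from the same molecule.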
QC has already been carried out! There are no adapters in there, and the reads have been quality trimmed.
Are these technical replicates of sequencing a single library or are these five independent libraries?
So there are two libraries: three replicates of one and two replicates of the other.