Hello, Biostars!
I had one small issue, which has grown into a big question.
My issue is the following: I'm going to count k-mers and their frequencies (perhaps depth) in mouse WGS data from Illumina, using the jellyfish software. Is it correct to combine the 2 FASTQ files into one simply with the bash 'cat' command? And is it wise to set the '-C' flag in jellyfish? Or do I need to filter reads by quality before counting k-mers?
If I do the same on SE reads, will I get similar or identical results? I understand that for the genome itself it does not matter whether we use PE or SE. Correct me if I'm wrong, but I think there will be a difference in unique k-mers between SE, PE and MP reads because of how the NGS technologies work. If so, comparing the number of unique k-mers between SE and PE data would not be valid for k-mer counting.
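To make the '-C' part of the question concrete: as far as I understand, that flag makes jellyfish count canonical k-mers, meaning a k-mer and its reverse complement are counted as one and the same (the lexicographically smaller of the two is kept). Below is a minimal Python sketch of that idea. It is only an illustration, not jellyfish's implementation, and the function names are made up for the example.

```python
from collections import Counter

# Translation table for complementing a DNA string (assumes upper-case ACGT).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def count_kmers(reads, k, canonical_mode=True):
    """Count k-mers over an iterable of read sequences.

    With canonical_mode=True, a k-mer and its reverse complement share one
    counter entry. K-mers containing anything other than A, C, G, T (for
    example N) are skipped; that is a choice made for this sketch.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if not set(kmer) <= set("ACGT"):
                continue
            counts[canonical(kmer) if canonical_mode else kmer] += 1
    return counts
```

Since reads from a WGS library come from both strands, counting without canonical mode would split the coverage of each genomic k-mer over two separate entries.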
I'm interested in this topic because I've recently been writing a Perl script for k-mer counting. Can you please explain why you want to count k-mer frequencies in NGS reads? I can share my script for simple k-mer counting if you'd like, and if it would be helpful for your situation.
Firstly, why do you want to mix the two files? Keep them separate, check the k-mers in each one individually, and compare after combining if needed. I think PE or SE doesn't matter much. Why not use the handier FastQC tool for the k-mer counting? I think that would be better.
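If it helps, here is a rough Python sketch of the "count each file separately, then compare after combining" idea. It assumes plain 4-line FASTQ records (no wrapped sequence lines), and the function names and file names are just placeholders. Because k-mers are counted read by read, the combined counts are simply the sum of the per-file counts, so concatenating the two files first should give the same totals.

```python
from collections import Counter
import gzip


def read_fastq_sequences(path):
    """Yield the sequence line of each record in a 4-line-per-record FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # second line of every record is the sequence
                yield line.strip().upper()


def count_kmers(sequences, k):
    """Count k-mers made only of A, C, G, T across all sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def count_files(paths, k):
    """Return (per-file counts, combined counts) for a list of FASTQ paths."""
    per_file = {path: count_kmers(read_fastq_sequences(path), k) for path in paths}
    combined = sum(per_file.values(), Counter())
    return per_file, combined


# Example call with hypothetical file names:
# per_file, combined = count_files(["reads_1.fastq", "reads_2.fastq"], k=21)
```

This is only practical for small test files; for a whole mouse genome a dedicated counter such as jellyfish is the realistic choice.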
Good idea, maybe I will do that, thank you!