Hi everyone,
I downloaded whole-genome sequencing runs (*.sra files) from the SRA database, extracted the FASTQ files, and then ran FastQC to check the quality of the raw reads.
The "Kmer Content" module reports overrepresented k-mers:
Sequence  Count   PValue  Obs/Exp Max  Max Obs/Exp Position
CGCCGTA    79245  0.0     15.361623    46-47
GTCGCCG   102530  0.0     12.1878605   44-45
TCGCCGT   105080  0.0     11.008696    46-47
GCCGTAT   115190  0.0     10.8824835   48-49
...
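The four k-mers in the table overlap one another by six bases, so they may be fragments of one longer sequence rather than four independent signals. The sketch below uses a hypothetical helper (`assemble_kmers` is not part of FastQC or any library) that greedily chains k-mers sharing a (k-1)-base overlap. It only sees the rows shown here, so the rows elided with "..." could extend the result, and it makes no attempt to resolve branching overlaps.

```python
def assemble_kmers(kmers):
    """Greedily chain equal-length k-mers that overlap by k-1 bases.

    Hypothetical helper for illustration: returns a list of contigs, one per
    chain found. Branching overlaps are resolved by whichever k-mer is seen first.
    """
    remaining = list(dict.fromkeys(kmers))  # drop duplicates, keep input order
    contigs = []
    while remaining:
        contig = remaining.pop(0)
        extended = True
        while extended:
            extended = False
            for kmer in remaining:
                k = len(kmer)
                # Extend to the right: contig ends with the k-mer's first k-1 bases.
                if contig.endswith(kmer[:k - 1]):
                    contig += kmer[-1]
                    remaining.remove(kmer)
                    extended = True
                    break
                # Extend to the left: contig starts with the k-mer's last k-1 bases.
                if contig.startswith(kmer[1:]):
                    contig = kmer[0] + contig
                    remaining.remove(kmer)
                    extended = True
                    break
        contigs.append(contig)
    return contigs


if __name__ == "__main__":
    # The four k-mers listed in the FastQC Kmer Content table.
    table_kmers = ["CGCCGTA", "GTCGCCG", "TCGCCGT", "GCCGTAT"]
    print(assemble_kmers(table_kmers))
```

Applied to the four rows above, this yields the single 10-base motif GTCGCCGTAT, which is a more specific query than any one of the 7-mers.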
I don't know which adapter sequences were used. Can anyone tell me where to find the adapter information for a file downloaded from SRA?
Alternatively, how can I trim these overrepresented k-mers? I am afraid that, because the k-mers are so short, they could match anywhere in the genome by chance.
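To put rough numbers on the concern about short k-mers matching by chance: assuming independent, equally frequent bases, one specific k-mer occurs at a given position with probability 4^-k, so a read of length L contains on average (L - k + 1) * 4^-k chance copies. The function name and the 100 bp read length below are illustrative assumptions. Real genomes have GC bias and repeats, so treat the results as orders of magnitude only.

```python
def expected_chance_hits_per_read(k: int, read_length: int) -> float:
    """Expected chance occurrences of one fixed k-mer in a single read.

    Assumes independent bases with equal frequency (0.25 each) and counts
    one strand only; real genomes (GC bias, repeats) will deviate from this.
    """
    if k > read_length:
        return 0.0
    positions = read_length - k + 1  # number of k-mer start positions in the read
    return positions * 0.25 ** k


if __name__ == "__main__":
    read_length = 100  # hypothetical read length; substitute the run's actual value
    for k in (7, 10, 13):
        hits = expected_chance_hits_per_read(k, read_length)
        print(f"k={k:2d}: {hits:.2e} chance hits per read "
              f"(about 1 read in {1 / hits:,.0f})")
```

For 100 bp reads this gives 94/16384, about 0.0057 chance hits per read for a 7-mer (roughly 1 read in 174), versus 91/1048576, about 0.000087 for a 10-mer (roughly 1 read in 11,500). Trimming on a bare 7-mer would therefore clip many genuine reads, while a longer motif is far more specific.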
Thanks,
sohail