How to remove kmer profiles?
9.2 years ago
kirannbishwa01 ★ 1.6k

I checked my FASTQ files with FastQC, which revealed several problems: 1) per-base GC content and per-base sequence content at the initial part of the 100 bp paired-end reads; 2) several overrepresented sequences and k-mer profiles. I then used Trimmomatic to remove the first 10 bases (HEADCROP:10), since that region showed some problems in the reads (is that right?), and also supplied Illumina adapters to ILLUMINACLIP to remove the overrepresented sequences and k-mer profiles. The overrepresented-sequences report now looks good, but the k-mer profiles are still there.

How should I remove those kmer profiles? Is it fine to go ahead and do the alignment to the reference genome without correcting for the kmers?

Thank you in advance!

I wanted to share the images/HTML files I have, but I cannot find any option to attach them on this forum. Are attachments not allowed on Biostars?

- Bishwa K.

kmers fastqc quality filtering attachments • 4.0k views

Please upload things somewhere and link to them. Also, what kind of experiment was this (e.g., RNAseq)?


Hi Devon,

I have shared the files through Google Drive links. I think the reports will open in the browser after you download them. The data are genomic resequencing data.

Thanks


I am attaching links to the files, which are in HTML format. I think they will open in the browser after downloading.

This is the FastQC report for the raw files (genomic resequencing data, paired-end reads). It shows several problems: 1) per-base GC content and per-base sequence content at the initial part of the 100 bp paired-end reads; 2) several overrepresented sequences and k-mer profiles.

I then head cropped the reads (10 bases) and removed adapters using Trimmomatic.

adapters: https://drive.google.com/file/d/0B9YUBnYGAr1AS0hrc2lMbE43ZUU/view?usp=sharing

https://drive.google.com/file/d/0B9YUBnYGAr1ANEFZc3FleDRob3M/view?usp=sharing

Adapter trimming alone improved the k-mer profiles, but it did not fix most of the per-base sequence content and GC content problems in the first 10 bp of the reads.

The new fastqc 0.32 reports k-mer profiles for the fasta files that were not reported by the fastqc version available on iPlant.

Also, the RNAseq data has the following FastQC report: no k-mer or adapter contamination, but the GC and base content show more variation in the first 10 bp.

I am thinking of proceeding with adapter trimming but no head crop. I would still like to know why there is such variation in the first 10 bases of the reads (for both the RNAseq and the genomic resequencing data; they were sequenced at different facilities).

Thanks


Can someone comment on my report?

Thanks,

9.2 years ago

Don't worry about the kmers - in the vast majority of cases they provide useless information.

I would also not head crop the data; that rarely helps. Many aligners (like bwa) will tolerate leading and trailing errors in reads.


As Istvan said. There doesn't seem to be anything terribly wrong with your data. In my experience the kmers are often low in number relative to total reads, and are often caused by adapters. The nucleotide usage at the beginning of the reads always looks odd (not flat) due to random hexamers used by Illumina not being truly random. The important thing is to remove adapters and use the sliding window for quality trimming.
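
For what it's worth, here is a rough sketch of what that could look like as a Trimmomatic paired-end call (adapter clipping plus sliding-window trimming, no HEADCROP), built from a small Python helper. The file names, the jar name, and the output naming are hypothetical, and the settings `2:30:10`, `4:15` and `MINLEN:36` are just the values commonly shown in Trimmomatic examples, not something taken from this thread, so adjust them to your data.

```python
import shlex


def trimmomatic_pe_command(r1, r2, adapters, out_prefix, jar="trimmomatic-0.32.jar"):
    """Build a Trimmomatic paired-end command line as a list of arguments.

    Steps: adapter clipping (ILLUMINACLIP) and sliding-window quality
    trimming (SLIDINGWINDOW), then a minimum-length filter (MINLEN).
    No HEADCROP step is included. All paths here are placeholders.
    """
    # Trimmomatic PE expects four outputs: paired/unpaired for each mate.
    outputs = [
        f"{out_prefix}_R1_paired.fq.gz",
        f"{out_prefix}_R1_unpaired.fq.gz",
        f"{out_prefix}_R2_paired.fq.gz",
        f"{out_prefix}_R2_unpaired.fq.gz",
    ]
    # Assumed example settings; tune them for your own library.
    steps = [
        f"ILLUMINACLIP:{adapters}:2:30:10",  # seed mismatches : palindrome thr : simple thr
        "SLIDINGWINDOW:4:15",                # window size : required mean quality
        "MINLEN:36",                         # drop reads shorter than this after trimming
    ]
    return ["java", "-jar", jar, "PE", "-phred33", r1, r2, *outputs, *steps]


# Hypothetical input names, for illustration only.
cmd = trimmomatic_pe_command(
    "sample_R1.fastq.gz", "sample_R2.fastq.gz", "TruSeq3-PE.fa", "sample_trimmed"
)
print(shlex.join(cmd))
```

The list form can be passed straight to `subprocess.run(cmd, check=True)` once the paths point at real files; re-running FastQC on the paired outputs then shows whether the adapter-related warnings are gone.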


Thanks for the update!

