After running Trimmomatic to trim the raw reads, FastQC still reported a Kmer Content failure. What should I do about this issue? Thank you so much!
I would not worry too much about the k-mer content unless there are other warning flags in the QC report. Moreover, the 5' end has some bias from priming: https://sequencing.qcfail.com/articles/positional-sequence-bias-in-random-primed-libraries/
What is your library (RNA-seq)?
Thank you so much! I think this is exactly what the problem is. I was actually trying this out with a downloaded SRA dataset. Although it was genome sequencing, I think it used random priming, which caused uneven base content at the start of the reads. So this is actually a problem that cannot be fixed by trimming.
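For anyone who wants to check the 5' bias directly rather than rely on the FastQC plot, here is a minimal sketch that tabulates base fractions at the first few read positions. The function names and the 15-position default are my own choices for illustration, and it assumes a plain, uncompressed 4-line FASTQ file.

```python
from collections import Counter


def read_fastq_sequences(path):
    """Yield the sequence line of each record in a plain 4-line FASTQ file."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of every 4-line record
                yield line.strip()


def base_composition(reads, n_positions=15):
    """Return one dict per position with the fraction of A, C, G and T.

    Fractions are relative to all bases seen at that position (including N),
    so they may sum to slightly less than 1.
    """
    counts = [Counter() for _ in range(n_positions)]
    for seq in reads:
        for i, base in enumerate(seq[:n_positions].upper()):
            counts[i][base] += 1
    table = []
    for c in counts:
        total = sum(c.values())
        table.append({b: (c[b] / total if total else 0.0) for b in "ACGT"})
    return table
```

Calling base_composition(read_fastq_sequences("reads.fastq")) (the file name is a placeholder) gives a table in which random-priming bias would show up as skewed fractions in the first ten or so positions that flatten out further into the read.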
I looked up similar issues, and some people have suggested that we need not worry too much about the k-mer content plot. But in this data, the k-mer enrichment seems to be most serious at the 5' end of the reads. Could it be caused by adapter contamination that FastQC did not detect because the adapter was not in its adapter list?
Map and look at mismatches per position.
bbmap.sh can do just that with a single command. If your enriched k-mers are artifacts, there will be an increase in mismatches and/or indels at the first positions.
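To illustrate what "mismatches per position" means, here is a rough sketch that computes the mismatch rate at each read position. It is not what bbmap.sh does internally. It assumes you already have each read paired with the reference segment it aligned to, with no gaps and with the read in its original 5'-to-3' orientation; the function name is hypothetical.

```python
def mismatch_rate_by_position(aligned_pairs):
    """Return the fraction of mismatching bases at each read position.

    aligned_pairs: iterable of (read, reference_segment) strings that are
    assumed to be ungapped alignments, compared base by base.
    """
    mismatches = []
    coverage = []
    for read, ref in aligned_pairs:
        for i, (r, g) in enumerate(zip(read.upper(), ref.upper())):
            if i >= len(coverage):  # first time a read reaches this position
                coverage.append(0)
                mismatches.append(0)
            coverage[i] += 1
            if r != g:
                mismatches[i] += 1
    return [m / c for m, c in zip(mismatches, coverage)]
```

If the enriched k-mers are artifacts, the first few values of the returned list should be clearly higher than the rest.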
An error occurred while running bbmap. Should I change the maximum heap size?
You set your heap to -362m, rather than 362m. The flag should have been "-Xmx362m". But, it looks like the absolute minimum amount of memory BBMap needs, for a bacterial-size genome with default parameters, is around "-Xmx800m". For a smaller kmer length of 12 you can get by with "-Xmx362m".
I aligned the reads to an E. coli genome downloaded from NCBI using bowtie2 with --very-sensitive-local, but only got a 66.91% overall alignment rate. Is this alignment rate normal?
That depends on the situation. Aligning an isolate to an isolate assembly, 66% is terrible. Aligning metagenomic reads to their assembly... it's often good.
I was using an SRA dataset, which is an isolate. So what do you think might be the problem? I am not very familiar with bacteria. Is it possible that the E. coli DNA sequence differs a lot between strains, and that this caused the low alignment rate?
Could be adapter sequence, could be contaminants, could be low quality. Hard to say. You might try BLASTing a handful of the unmapped reads to see what they hit. E. coli strains tend to be very similar, so it's unlikely that divergence is the problem.
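One way to pull out a handful of unmapped reads for BLAST, sketched in Python: it reads SAM text, keeps records whose FLAG has the 0x4 (segment unmapped) bit set, and formats the first few as FASTA. The function names and the limit of 10 are assumptions made for illustration.

```python
import itertools


def unmapped_reads(sam_lines):
    """Yield (name, sequence) for SAM records flagged as unmapped (0x4)."""
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        if flag & 0x4:
            yield fields[0], fields[9]  # QNAME and SEQ columns


def to_fasta(records, limit=10):
    """Format at most `limit` (name, sequence) records as FASTA text."""
    first = itertools.islice(records, limit)
    return "".join(f">{name}\n{seq}\n" for name, seq in first)
```

With an alignment file opened as a text handle, to_fasta(unmapped_reads(handle)) returns FASTA text that can be pasted into a BLAST search.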
I BLASTed the unmapped reads, and it turned out that a lot of them are 100% identical to E. coli genes, so I am a little confused about why they did not map to the E. coli genome in the first place.
Did you check why the other read of each pair is not mapping? It may be adapter contamination, or it may be low quality. One of your problems is the --very-sensitive-local mode, which makes bowtie2 look for local rather than end-to-end matches. Try running with the default parameters.
I don't think you need fastareadlen=500 if your data is FASTQ. The amount of memory you need depends on the size of the reference genome, but yes, 362 MB seems very small.
OK, thanks. I will try it on a computer with more memory.