Entering edit mode
6.7 years ago
DVA
▴
630
In my RNA sequencing fastqc report, I consistently notice an abnormally high A (green) peak in session "Per base sequence content". See images below. I worry it is caused by the kmers represented in different regions of the reads.
Anyone has seen this before? I would appreciate it if someone could help me diagnose this problem.Thank you.
Do you have a lot of poly-A stretches in your reads?
Thank you for your reply. I do not expect that, but I can check. Based on the report, "A" seems to show up in the position 10-30 bps. If it is caused by polyA tail wouldn't that be shown at the end?
Have you scanned/trimmed this data to see if you have contaminating sequences present that get trimmed? I suggest using
bbduk.sh
from BBMap suite. Something like:File with adapter sequences (
adapters.fa
) is included inresources
directory in BBMap distro.Not yet. I went to read the protocol of the library preparation and you might be right about it has polyA. Should I trim the whole first 30bps? Thank you so much for all the information.
Try trimming the data as I suggested above first. If you still have poly-A stretches left over afterwards then they can be trimmed with another run of
bbduk.sh
. It is the first 30+ bases that may be the good sequence here so you want to keep those for sure.This GC plot looks almost bimodal - are you expecting this (i.e. is this a mixed sample, e.g. plant-pathogen/other metagenomics)? Otherwise, maybe try some read classification (centrifuge, kraken with a transcriptome database, k-Slam, ...) and see what you've got in there?
No I do not expect this. It is human sample. I will look into the databases. Thank you.
This in important information and should have been added to original post. Was this a particular kind of kit/technology? You should follow instructions that may be specific for post-processing data in that case.
Thank you for the reply. I am looking into it.
I'm so sorry. It is not single cell. Nonetheless I am going to try your method. Thank you.