High A in "Per base sequence content" of fastQC report

2

Entering edit mode

6.7 years ago

DVA ▴ 630

In my RNA sequencing fastqc report, I consistently notice an abnormally high A (green) peak in session "Per base sequence content". See images below. I worry it is caused by the kmers represented in different regions of the reads.

Anyone has seen this before? I would appreciate it if someone could help me diagnose this problem.Thank you.

enter image description here

fastqc • 4.6k views

ADD COMMENT • link 6.7 years ago by DVA ▴ 630

0

Entering edit mode

Do you have a lot of poly-A stretches in your reads?

ADD REPLY • link 6.7 years ago by GenoMax 147k

0

Entering edit mode

Thank you for your reply. I do not expect that, but I can check. Based on the report, "A" seems to show up in the position 10-30 bps. If it is caused by polyA tail wouldn't that be shown at the end?

ADD REPLY • link 6.7 years ago by DVA ▴ 630

2

Entering edit mode

Have you scanned/trimmed this data to see if you have contaminating sequences present that get trimmed? I suggest using bbduk.sh from BBMap suite. Something like:

bbduk.sh in1=reads_R1.fq in2=reads_R2.fq out1=clean_R1.fq out2=clean_R2.fq ref=adapters.fa ktrim=r k=23 mink=11 tbe tpo

File with adapter sequences (adapters.fa) is included in resources directory in BBMap distro.

ADD REPLY • link 6.7 years ago by GenoMax 147k

0

Entering edit mode

Not yet. I went to read the protocol of the library preparation and you might be right about it has polyA. Should I trim the whole first 30bps? Thank you so much for all the information.

ADD REPLY • link 6.7 years ago by DVA ▴ 630

1

Entering edit mode

Try trimming the data as I suggested above first. If you still have poly-A stretches left over afterwards then they can be trimmed with another run of bbduk.sh. It is the first 30+ bases that may be the good sequence here so you want to keep those for sure.

ADD REPLY • link 6.7 years ago by GenoMax 147k

0

Entering edit mode

This GC plot looks almost bimodal - are you expecting this (i.e. is this a mixed sample, e.g. plant-pathogen/other metagenomics)? Otherwise, maybe try some read classification (centrifuge, kraken with a transcriptome database, k-Slam, ...) and see what you've got in there?

ADD REPLY • link 6.7 years ago by cschu181 ★ 2.8k

0

Entering edit mode

No I do not expect this. It is human sample. I will look into the databases. Thank you.

ADD REPLY • link 6.7 years ago by DVA ▴ 630

0

Entering edit mode

single cell RNA Seq

This in important information and should have been added to original post. Was this a particular kind of kit/technology? You should follow instructions that may be specific for post-processing data in that case.

ADD REPLY • link 6.7 years ago by GenoMax 147k

0

Entering edit mode

Thank you for the reply. I am looking into it.

ADD REPLY • link 6.7 years ago by DVA ▴ 630

0

Entering edit mode

I'm so sorry. It is not single cell. Nonetheless I am going to try your method. Thank you.

ADD REPLY • link 6.7 years ago by DVA ▴ 630

Login before adding your answer.