Question

trim adaptors RNA-seq

7.3 years ago

sophialovechan ▴ 80

Hello everyone, I am analyzing a published dataset. I ran FastQC to check the data quality first. The FastQC report suggests that the overrepresented sequence is a sequencing index adapter, but when I looked up the sequence of the index it named, the two didn't match. Here is the overrepresented sequence FastQC reported:

AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGTACGATCTCGTATG

It gave a possible source: TruSeq Adapter, Index 22 (97% over 49bp). I looked up the sequence for index 22 online, and it is:

5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGTACGTAATCTCGTATGCCGTCTTCTGCTTG

So my question is whether I should trim the second sequence or the overrepresented sequence FastQC reported. How can I be sure I trim the right sequence? Thank you very much.

RNA-Seq • 3.5k views

updated 7.3 years ago by chen ★ 2.5k • written 7.3 years ago by sophialovechan ▴ 80

They do match. The FastQC sequence starts with the initial A from the ligation, and the end of the sequence you found online is not present in the adapter sequence FastQC reports. Go with the FastQC sequence.
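To check the overlap directly, here is a minimal Python sketch (not part of the original reply) that drops the leading A from the FastQC sequence and counts how many bases it shares with the index 22 adapter, starting from the adapter's 5' end. Both sequences are copied from the question; the helper name `shared_prefix_length` is made up for illustration.

```python
# Sequences copied verbatim from the question.
fastqc_seq = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGTACGATCTCGTATG"
index22_seq = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGTACGTAATCTCGTATGCCGTCTTCTGCTTG"


def shared_prefix_length(a: str, b: str) -> int:
    """Return the number of leading bases that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Drop the leading A (the ligation base mentioned in the reply) before comparing.
trimmed = fastqc_seq[1:] if fastqc_seq.startswith("A") else fastqc_seq
shared = shared_prefix_length(trimmed, index22_seq)

print(f"FastQC sequence without the leading A: {len(trimmed)} bases")
print(f"Bases shared with the index 22 adapter from its 5' end: {shared}")
print(f"Shared prefix: {index22_seq[:shared]}")
```

The shared prefix is the part of the adapter that a read runs into first, so it is the part that matters most when choosing what to trim.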

7.3 years ago by Asaf 10k

Thank you for your reply. If I use cutadapt to trim the adapter sequence, should I trim the same sequence from both files of my paired-end data? Like this:

cutadapt -a1 AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGTACGATCTCGTATG -a2 AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGTACGATCTCGTATG SRR4011898_1.fastq SRR4011898_2.fastq -o H9_1.fastq H9_2.fastq

7.3 years ago by sophialovechan ▴ 80

Plus, some overrepresented sequences have no hit, so I have no clue where they come from. How can I trim these sequences? Here is an example (sequence, count, percentage, possible source):

GGCTGCGACATCTGTCACCCCATTGATCGCCAGGGTTGATTCGGCTGATC 66551 0.16715454296615964 No Hit
GGCTGGCTAGGCGGGTGTCCCCTTCCTCCCTCACCGCTCCATGTGCGTCC 47823 0.12011587667008239 No Hit

7.3 years ago by sophialovechan ▴ 80

I have one more question. At what percentage should I consider removing overrepresented sequences? I have one dataset with many overrepresented sequences, each of which accounts for only about 0.1%-0.5%. Do I really need to remove them?

7.3 years ago by sophialovechan ▴ 80

I usually ignore overrepresented sequences, especially in RNA-seq. It depends a lot on the library preparation, and at such low levels it's not a concern. You can run FastQC on the R2 reads; they should have the same adapter sequence. Run cutadapt as you intended (isn't it -a for forward and -A for reverse?).
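For reference, here is a minimal Python sketch of how the paired-end cutadapt call could be put together with `-a` for the first read, `-A` for the second read, `-o` for the R1 output and `-p` for the R2 output. The helper `build_cutadapt_cmd` is hypothetical. Passing the same adapter for both reads is an assumption taken from the suggestion above, so confirm it by running FastQC on the R2 file first. The sketch only builds and prints the command; it does not run cutadapt.

```python
import shlex


def build_cutadapt_cmd(adapter_r1, adapter_r2, in_r1, in_r2, out_r1, out_r2):
    """Build a paired-end cutadapt command as an argument list (hypothetical helper)."""
    return [
        "cutadapt",
        "-a", adapter_r1,  # 3' adapter to remove from the first read of each pair
        "-A", adapter_r2,  # 3' adapter to remove from the second read of each pair
        "-o", out_r1,      # trimmed output file for R1
        "-p", out_r2,      # trimmed output file for R2
        in_r1,
        in_r2,
    ]


# Adapter sequence as reported by FastQC in the question.
adapter = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGTACGATCTCGTATG"

# Assumption: R2 carries the same adapter; check the R2 FastQC report before relying on this.
cmd = build_cutadapt_cmd(
    adapter,
    adapter,
    "SRR4011898_1.fastq",
    "SRR4011898_2.fastq",
    "H9_1.fastq",
    "H9_2.fastq",
)

# Print a shell-ready command line; run it with subprocess.run(cmd, check=True) if desired.
print(shlex.join(cmd))
```

Building the command as a list avoids shell quoting problems and makes it easy to loop over several samples.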

7.3 years ago by Asaf 10k

score 1 · Answer 1 · 2017-08-14

7.3 years ago

chen ★ 2.5k

If your data is paired-end, you can use AfterQC (https://github.com/OpenGene/AfterQC) to trim your adapters automatically.

It will also report how many reads contain adapters and how many adapter bases were trimmed.

7.3 years ago by chen ★ 2.5k