trim adaptors RNA-seq
7.3 years ago

Hello everyone, I am analyzing a published dataset. I ran FastQC to check the data quality first. The FastQC report suggested that the overrepresented sequences are the sequencing index adapter, but when I looked up the sequence for the index it named, the two didn't match. Here is the overrepresented sequence FastQC reported:

AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGTACGATCTCGTATG

It gave a possible source: TruSeq Adapter, Index 22 (97% over 49bp). I looked up the sequence for Index 22 online and it is:

5' GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGTACGTAATCTCGTATGCCGTCTTCTGCTTG

They didn't match. So my question is whether I should trim the second sequence or the overrepresented sequence FastQC reported. How can I be sure I trim the right sequence? Thank you very much.

RNA-Seq • 3.5k views

They do match. The FastQC sequence starts with the extra A from the ligation, and the end of the sequence you found online is simply not covered by the shorter sequence FastQC reports. Go with the FastQC sequence.
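
To check this kind of match by hand, the two sequences can be compared directly. The following is only a minimal sketch: the helper name `shared_prefix_len` is made up for illustration, and the two strings are copied from the question above. It drops the leading A from the FastQC sequence and counts how many bases agree with the published adapter before the first difference.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Return the number of leading characters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Sequences as quoted in the question.
fastqc_seq = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGTACGATCTCGTATG"
index22 = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGTACGTAATCTCGTATGCCGTCTTCTGCTTG"

# Drop the leading A (added at ligation) before comparing.
trimmed = fastqc_seq[1:]

prefix = shared_prefix_len(trimmed, index22)
tail = trimmed[-10:]

print(f"FastQC sequence length without the leading A: {len(trimmed)}")
print(f"bases shared with Index 22 from the start:    {prefix}")
print(f"last 10 bases found in Index 22:              {tail in index22}")
```

Any position where the two stop agreeing falls in the index region, which is where a hit below 100% would come from.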


Thank you for your reply. If I use cutadapt to trim the adapter sequence, should I trim the same sequence from both files of my paired-end data? Like this:

cutadapt -a1 AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGTACGATCTCGTATG -a2 AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGTACGATCTCGTATG SRR4011898_1.fastq SRR4011898_2.fastq -o H9_1.fastq H9_2.fastq


Also, for some overrepresented sequences FastQC gives no hint about their source, so how can I trim those? Here is an example (sequence, count, percentage, possible source):

GGCTGCGACATCTGTCACCCCATTGATCGCCAGGGTTGATTCGGCTGATC 66551 0.16715454296615964 No Hit
GGCTGGCTAGGCGGGTGTCCCCTTCCTCCCTCACCGCTCCATGTGCGTCC 47823 0.12011587667008239 No Hit


I have one more question. At what percentage should I consider removing overrepresented sequences? One of my datasets has many overrepresented sequences, each accounting for only about 0.1%-0.5% of the reads. Do I really need to remove them?


I usually ignore overrepresented sequences, especially in RNA-seq. Their abundance depends a lot on the library preparation, and at such low levels they are not a concern. You can run FastQC on the R2 reads as well; they should have the same adapter sequence. Run cutadapt as you intended (isn't it -a for forward and -A for reverse?).
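
For reference, cutadapt's paired-end mode takes `-a` for the R1 adapter, `-A` for the R2 adapter, `-o` for the R1 output and `-p` for the R2 output. Below is a minimal sketch that only assembles and prints such a command. The helper `build_cutadapt_cmd` is hypothetical, the file names are the ones from the question, and using the same adapter for R2 is an assumption that should be confirmed by running FastQC on the R2 file first.

```python
import shlex


def build_cutadapt_cmd(adapter_r1, adapter_r2, in_r1, in_r2, out_r1, out_r2):
    """Assemble a paired-end cutadapt command as an argument list (hypothetical helper)."""
    return [
        "cutadapt",
        "-a", adapter_r1,  # 3' adapter to remove from R1
        "-A", adapter_r2,  # 3' adapter to remove from R2
        "-o", out_r1,      # trimmed R1 output
        "-p", out_r2,      # trimmed R2 output
        in_r1, in_r2,
    ]


# Adapter reported by FastQC for the R1 file (from the question).
adapter_r1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGTACGATCTCGTATG"
# Assumption: R2 shows the same sequence; check the R2 FastQC report before relying on this.
adapter_r2 = adapter_r1

cmd = build_cutadapt_cmd(
    adapter_r1, adapter_r2,
    "SRR4011898_1.fastq", "SRR4011898_2.fastq",
    "H9_1.fastq", "H9_2.fastq",
)

# Print the command for inspection; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
print(shlex.join(cmd))
```

Building the command as a list keeps the long adapter strings out of shell quoting and makes it easy to swap in a different R2 adapter if FastQC reports one.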

7.3 years ago
chen ★ 2.5k

If your data are paired-end, you can use AfterQC (https://github.com/OpenGene/AfterQC) to trim the adapters automatically.

It also reports how many reads contain adapters and how many adapter bases were trimmed.
