How do I trim adapters from the RNA-seq reads of a GEO dataset?
9.4 years ago
silas008 ▴ 170
I need to trim the adapters from an RNA-seq dataset on NCBI GEO. Link: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE28888 FastQC shows that the reads contain some Illumina primers and adapters, but I'm not sure how accurate FastQC is for this. The authors do not provide the adapter sequences on GEO. How can I find out specifically which adapters to cut? Thank you
RNA-Seq • 3.0k views

The FastQC output lists overrepresented sequences, but FastQC derives them from only the first 200K sequences, which is very little compared to the library size.

However, you can build a consensus of the overrepresented sequences and then check how often it occurs at the start and end of the file:

head -n 400000 fastq | grep 'consensus_sequences' | wc -l
tail -n 400000 fastq | grep 'consensus_sequences' | wc -l

Keep editing the consensus sequence until each command reports roughly 90,000-99,000 matches (400,000 lines correspond to 100,000 reads).
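The check above can be sketched as a small shell function that inspects only the sequence line of each record, so a match in a header or quality line is not counted. This is only a sketch: the function name `count_adapter_hits` and the file name in the usage line are hypothetical, and it assumes an uncompressed FASTQ with exactly four lines per read.

```shell
#!/usr/bin/env bash
# count_adapter_hits FASTQ SEQUENCE [N_READS]
# Hypothetical helper: for the first and the last N_READS reads of FASTQ,
# print "<head|tail> <reads containing SEQUENCE> <reads checked>".
# Assumes an uncompressed FASTQ with exactly four lines per record.
count_adapter_hits() {
    local fastq="$1" seq="$2" n_reads="${3:-100000}"
    local n_lines=$(( n_reads * 4 ))
    local part
    for part in head tail; do
        "$part" -n "$n_lines" "$fastq" |
            awk -v s="$seq" -v p="$part" '
                # Line 2 of every 4-line record is the read sequence.
                NR % 4 == 2 { total++; if (index($0, s) > 0) hits++ }
                END { printf "%s %d %d\n", p, hits, total }'
    done
}
```

For example, `count_adapter_hits reads.fastq CANDIDATE_SEQUENCE 100000` reports the counts for the first and last 100,000 reads, which makes it easy to see what fraction of reads carries the candidate sequence.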

I am sure there is another way, based on alignment. I read a paper in which the authors took a large amount of CLIP-seq data and inferred the adapters to trim by calculation.

Here is the paper; see the Methods section for how they removed the adapters.

9.4 years ago
h.mon 35k

You are two degrees of separation from your answer. On the GEO page you provided:

Citation(s) Warf MB, Shepherd BA, Johnson WE, Bass BL. Effects of ADARs on small RNA processing pathways in C. elegans. Genome Res 2012 Aug;22(8):1488-98. PMID: 22673872

Clicking the PMID link takes you to the PubMed entry for the paper. From there you can follow a link to the full article, where you learn that they used the Illumina Small RNA Prep Kit v1.0 or v1.5, and Novoalign to trim the 3' adaptors.

You may use BBDuk with the options tpe and tbo; it should trim adapters even without knowing their sequences.

Edit: read this thread on using BBDuk for small RNA, which reports that it did not work well for them. However, Brian Bushnell, the author of BBTools, updates the tools very frequently, so it may well be fixed by now.


As a matter of fact... :)

My recommended methodology has changed slightly for situations where you do not know the adapter sequence. It still requires paired reads, though. First, you can determine the adapter sequences like this:

bbmerge.sh in1=read1.fq in2=read2.fq outa=adapters.fa reads=1m

(for small RNAs, add the flags mininsert=15 mininsert0=15)

Then you can run BBDuk:

bbduk.sh in1=read1.fq in2=read2.fq out1=trimmed1.fq out2=trimmed2.fq ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tbo tpe

This is more sensitive than just running BBDuk with the tbo flag and no adapter sequence.
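The two steps can be tied together in a small dry-run helper that only prints the commands, so they can be reviewed before anything is executed. This is a sketch, not part of BBTools: the function name `bb_adapter_trim_cmds` is hypothetical, the output file names are the ones used above, and `ktrim=r` is included on the assumption that BBDuk should trim adapters from the 3' end rather than discard matching reads.

```shell
#!/usr/bin/env bash
# bb_adapter_trim_cmds READ1 READ2 [yes|no]
# Hypothetical dry-run helper: prints the BBMerge and BBDuk commands
# without running them. Pass "yes" as the third argument for small RNA
# libraries to add the mininsert flags mentioned above.
bb_adapter_trim_cmds() {
    local r1="$1" r2="$2" small_rna="${3:-no}"
    local merge_extra=""
    if [ "$small_rna" = "yes" ]; then
        merge_extra=" mininsert=15 mininsert0=15"
    fi
    # Step 1: infer adapter sequences from read-pair overlaps.
    echo "bbmerge.sh in1=$r1 in2=$r2 outa=adapters.fa reads=1m$merge_extra"
    # Step 2: trim using the inferred adapters.
    # ktrim=r is an assumption here: it asks BBDuk to trim to the right of a match.
    echo "bbduk.sh in1=$r1 in2=$r2 out1=trimmed1.fq out2=trimmed2.fq ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tbo tpe"
}
```

Once the printed commands look right, they can be run with `bb_adapter_trim_cmds read1.fq read2.fq yes | bash`.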


Thank you so very much. I combined your answers and was able to find my adapter sequence!

It is: ATCTCGTATGCCGTCTTCTGCTTG

But I don't know whether I should trim the whole adapter sequence or only the sequence that FastQC reports as overrepresented, which is just part of the adapter.


To use as the adapter sequence for trimming reads, use the whole adapter sequence, not the short over-represented kmers from fastqc. They're too short and will either be ignored or will result in false positives - longer is better.
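To get a feel for why short sequences produce false positives, here is a rough back-of-the-envelope sketch. It assumes a deliberately simplified model in which read bases are uniformly random and independent, which real reads are not, so the numbers only indicate scale. Under that model, a specific k-mer is expected to match a read of length L by chance (L - k + 1) / 4^k times. The function name `expected_chance_hits` is hypothetical.

```shell
#!/usr/bin/env bash
# expected_chance_hits K READ_LEN [N_READS]
# Hypothetical helper: expected number of exact chance matches of one
# specific K-mer across N_READS reads of length READ_LEN, assuming
# uniformly random, independent bases (a simplification).
expected_chance_hits() {
    local k="$1" read_len="$2" n_reads="${3:-1}"
    awk -v k="$k" -v len="$read_len" -v n="$n_reads" 'BEGIN {
        windows = len - k + 1          # start positions where the k-mer fits
        if (windows < 0) windows = 0   # k-mer longer than the read
        printf "%.6g\n", n * windows / (4 ^ k)
    }'
}
```

Comparing `expected_chance_hits 8 36 1000000` with `expected_chance_hits 24 36 1000000` shows how quickly chance matches vanish as the sequence gets longer: the denominator grows from 4^8 = 65,536 to 4^24.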


Ok.

Thank you again!


Biostars etiquette: You can (should) also use the button "Reply" or "add comment" when replying to an answer instead of adding a new answer. Adding new answers instead of commenting on an existing one will not destroy the world, but keeps things tidy and threads of conversation easy to follow :)


Ohhhh... I'm sorry. I started using Biostars yesterday and was not paying attention to that. Thank you ;)
