Question

Raw Illumina Data

13.3 years ago

Cdiez ▴ 150

Hello,

I have just received the raw data of an Illumina genomic library (one lane), so I already have a 6 GB fastq file. I know I have to trim the adaptors and condense the files, keeping only the "unique" sequences. Is there any package able to do this? I mean, processing the "raw" data of a library, or is writing your own Perl scripts the only way to tackle the problem?

Thanks in advance!

illumina adaptor perl • 5.1k views

updated 13.3 years ago by Ying W ★ 4.3k • written 13.3 years ago by Cdiez ▴ 150

score 4 · Answer 1 · 2011-08-16

13.3 years ago

Swbarnes2 ★ 1.6k

Most genomic libraries don't have problems with adaptors. Adaptor sequence only shows up in the reads when the fragments being sequenced are very short. You probably have reads of 60-100 bases from a 200+ base DNA insert, so you won't see adaptors.

Usually, duplicates are figured out after alignment, not before. Computationally, it's easier on a sorted .bam than on raw reads, if you go by coordinates.
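To illustrate why coordinates make this easier: once reads are ordered by position, duplicates sit next to each other and can be found in a single pass. Below is a minimal Python sketch, assuming each aligned read has already been reduced to a hypothetical `(chrom, pos, strand, name)` tuple. It is not how any particular tool works; real duplicate markers (for example Picard MarkDuplicates) also consider mate pairs, clipping and base qualities, which are ignored here.

```python
from itertools import groupby


def mark_duplicates(alignments):
    """Return the names of reads flagged as duplicates.

    `alignments` is an iterable of (chrom, pos, strand, name) tuples,
    a hypothetical, simplified stand-in for records of a sorted BAM.
    The first read at each (chrom, pos, strand) is kept; later reads
    with the same coordinates are reported as duplicates.
    """
    def key(aln):
        return aln[0], aln[1], aln[2]

    duplicates = []
    # Sorting puts reads with identical coordinates next to each other,
    # so one linear pass with groupby is enough.
    for _, group in groupby(sorted(alignments, key=key), key=key):
        reads = list(group)
        duplicates.extend(name for *_, name in reads[1:])
    return duplicates


reads = [
    ("chr1", 100, "+", "r1"),
    ("chr1", 100, "+", "r2"),  # same start and strand as r1
    ("chr1", 100, "-", "r3"),  # opposite strand, treated as distinct
    ("chr2", 50, "+", "r4"),
]
print(mark_duplicates(reads))
```

Doing the same on raw reads would mean comparing full sequences, where a single sequencing error hides a duplicate.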

Thanks! Yes, I do have this problem with the adaptors because I am sequencing small RNA (20-30 nt), so I have to trim them before starting the analysis.
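For small RNA the read is longer than the insert, so the 3' adaptor (or the beginning of it) appears at the end of each read. A minimal Python sketch of the trimming step, assuming exact matching only; the function name, the `min_overlap` parameter and the adaptor sequence are illustrative assumptions, and dedicated trimmers additionally tolerate mismatches and use base qualities.

```python
def trim_adapter(read, adapter, min_overlap=5):
    """Remove a 3' adapter from `read` using exact matching only.

    Cuts at the first full occurrence of `adapter`. If there is none,
    cuts where a prefix of the adapter (at least `min_overlap` bases)
    forms the end of the read. Otherwise the read is returned unchanged.
    """
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # The adapter may be cut off by the end of the read:
    # try the longest possible adapter prefix first.
    longest = min(len(adapter) - 1, len(read))
    shortest = max(min_overlap, 1)
    for k in range(longest, shortest - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


# Assumed example adapter; replace with the sequence for your library kit.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"
insert = "TGAGGTAGTAGGTTGTATAGTT"  # a 22-nt example insert

print(trim_adapter(insert + ADAPTER, ADAPTER))       # full adapter present
print(trim_adapter(insert + ADAPTER[:14], ADAPTER))  # adapter cut off by read end
```

A small `min_overlap` trims more aggressively but also removes a few genuine bases by chance, so the value is a trade-off.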

13.3 years ago by Cdiez ▴ 150

There are a few adapter trimming solutions out there: http://seqanswers.com/forums/showthread.php?t=1159

13.3 years ago by Jeremy Leipzig 22k

Thanks! I appreciate your help! I'm also checking out the FASTX-Toolkit and it seems quite useful.

13.3 years ago by Cdiez ▴ 150

score 3 · Answer 2 · 2011-08-16

13.3 years ago

Ido Tamir 5.2k

The FASTX-Toolkit provides some simple-to-use command-line utilities for this, covering both adaptor clipping and collapsing identical reads.
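To make clear what "collapsing" means, here is a minimal Python sketch that counts identical sequences in a FASTQ stream and prints each distinct sequence once with its count. It assumes plain 4-line FASTQ records; the function names and the `>rank-count` header are illustrative choices, not a description of how the toolkit is implemented.

```python
from collections import Counter
from io import StringIO


def read_fastq_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # second line of every record holds the bases
            yield line.strip()


def collapse(sequences):
    """Return (sequence, count) pairs, most abundant first."""
    return Counter(sequences).most_common()


# Tiny in-memory FASTQ used only for demonstration.
fastq = StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nACGT\n+\nIIII\n"
    "@r3\nTTGA\n+\nIIII\n"
)
for rank, (seq, count) in enumerate(collapse(read_fastq_sequences(fastq)), 1):
    # FASTA output; the header encodes rank and read count.
    print(f">{rank}-{count}\n{seq}")
```

Keeping the count in the header preserves the abundance information that plain "unique" filtering would throw away. For a real 6 GB file, pass an open file handle instead of the `StringIO` object.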

score 2 · Answer 3 · 2011-08-16

13.3 years ago

Darked89 4.7k

Check out TagDust: http://genome.gsc.riken.jp/osc/english/dataresource/

If you need to do the opposite (select fastq reads matching a certain pattern), there is fqgrep: https://github.com/indraniel/fqgrep
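The idea of selecting reads by pattern can be sketched in a few lines of Python. This is only an illustration of the concept, not fqgrep's implementation or interface; it assumes plain 4-line FASTQ records, and `grep_fastq` is a hypothetical helper name.

```python
import re
from io import StringIO


def grep_fastq(handle, pattern):
    """Yield (header, seq, qual) for records whose sequence matches `pattern`."""
    regex = re.compile(pattern)
    while True:
        header = handle.readline().rstrip()
        if not header:  # end of input
            break
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line, not needed here
        qual = handle.readline().rstrip()
        if regex.search(seq):
            yield header, seq, qual


# Tiny in-memory FASTQ used only for demonstration.
fastq = StringIO(
    "@r1\nAAGAATTCAA\n+\nIIIIIIIIII\n"
    "@r2\nACGTACGTAC\n+\nIIIIIIIIII\n"
)
for header, seq, qual in grep_fastq(fastq, "GAATTC"):
    print(header, seq)
```

Because the pattern is a regular expression, something like `"^TGG"` or `"A{8,}"` would select reads by prefix or by homopolymer run in the same way.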

score 1 · Answer 4 · 2011-08-22

13.3 years ago

Bach ▴ 550

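Filtering for unique sequences is a very bad habit. I never understood why people would even consider doing that: you enrich for sequencing errors, because every read carrying an error is its own distinct sequence and survives, while the many correct copies collapse into one.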

See http://seqanswers.com/forums/showthread.php?t=6854

13.3 years ago by Ying W ★ 4.3k

score 1 · Answer 5 · 2011-08-22

13.3 years ago

Ying W ★ 4.3k

You can use FastQC to check whether you have adapter contamination and also base composition bias.
