I have just received the raw data of an Illumina genomic library (one lane), so I already have a 6 GB FASTQ file. I know I have to trim the adaptors and collapse the file, keeping only the "unique" sequences. But is there any package able to do this, i.e. process the "raw" data of a library, or is writing your own Perl scripts the only way to face the problem?
Most genomic libraries don't have problems with adaptors. Adaptor sequence only crops up in a read when the insert being sequenced is shorter than the read length. You probably have 60-100 base reads from a 200+ base DNA insert, so you won't see adaptors.
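The arithmetic behind that is simple: a single-end read covers the insert first and only runs into the 3' adaptor once the insert is used up. A minimal sketch (the 36-base read length in the second example is just an assumed value for illustration):

```python
def adapter_bases_in_read(read_length: int, insert_length: int) -> int:
    """Number of 3' adaptor bases a single-end read will contain.

    The sequencer reads through the insert first; only if the insert is
    shorter than the read does it continue into the adaptor.
    """
    return max(0, read_length - insert_length)


# Genomic library: 100-base reads from a 200+ base insert -> no adaptor.
print(adapter_bases_in_read(100, 200))  # 0
# Small RNA library (assumed 36-base reads) with a 22-nt insert -> 14 adaptor bases.
print(adapter_bases_in_read(36, 22))  # 14
```

So with 20-30 nt inserts, nearly every read ends in adaptor sequence, which is why small RNA data always needs trimming.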
Usually, duplicates are figured out after alignment, not before. Computationally, it's easier on a sorted .bam than on raw reads, if you go by coordinates.
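To illustrate why a coordinate-sorted file makes this cheap, here is a simplified sketch (not how any particular tool does it): it keeps the first read per (chromosome, start, strand) and only has to remember the current position. The `Alignment` record is hypothetical; a real pipeline would read these fields from a BAM, and real duplicate markers (for example Picard MarkDuplicates) use more information, such as the 5' end of reverse-strand reads and mate positions for pairs.

```python
from typing import Iterable, Iterator, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical minimal alignment record (a real pipeline reads a BAM)."""
    name: str
    chrom: str
    start: int   # leftmost mapped position on the reference
    strand: str  # "+" or "-"


def drop_duplicates(sorted_alns: Iterable[Alignment]) -> Iterator[Alignment]:
    """Keep the first read per (chrom, start, strand).

    Input must be sorted by (chrom, start), so all reads sharing a start
    position are adjacent and only one position's strands are held in memory.
    """
    current_pos = None
    strands_seen = set()
    for aln in sorted_alns:
        pos = (aln.chrom, aln.start)
        if pos != current_pos:  # moved to a new coordinate: reset state
            current_pos = pos
            strands_seen = set()
        if aln.strand not in strands_seen:
            strands_seen.add(aln.strand)
            yield aln


alns = [
    Alignment("r1", "chr1", 100, "+"),
    Alignment("r2", "chr1", 100, "+"),  # same coordinate and strand as r1
    Alignment("r3", "chr1", 100, "-"),
    Alignment("r4", "chr1", 250, "+"),
]
kept = [a.name for a in drop_duplicates(alns)]
print(kept)  # ['r1', 'r3', 'r4']
```

On unsorted raw reads you would instead have to hold every key seen so far in memory, which is the practical reason to do this after alignment and sorting.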
Thanks! Yes, I have this problem with the adaptors because I am sequencing small RNA (20-30 nt), so I have to trim them before starting the analysis.
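If you do end up writing your own script, the core of small RNA adaptor trimming is short. A minimal sketch, assuming uncompressed, well-formed 4-line FASTQ records and exact matching only (dedicated trimmers also tolerate sequencing errors and use base qualities). The `ADAPTER` value is an assumption, a commonly used Illumina small RNA 3' adapter; replace it with the one from your own kit. Reads are cut at the first full adaptor match, or at an adaptor prefix of at least `min_overlap` bases running off the 3' end, and only 20-30 nt inserts are kept.

```python
import io
from typing import Iterable, Iterator

# Assumed 3' adapter: a commonly used Illumina small RNA 3' adapter.
# Replace it with the adapter from your own library prep kit.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def trim_adapter(seq: str, adapter: str = ADAPTER, min_overlap: int = 6) -> str:
    """Cut the read at the first exact match of the adapter (or of an
    adapter prefix at least min_overlap long that runs off the 3' end)."""
    for i in range(len(seq) - min_overlap + 1):
        window = seq[i:i + len(adapter)]
        if adapter.startswith(window):
            return seq[:i]
    return seq  # no adapter found: keep the read as-is


def trim_fastq(lines: Iterable[str], min_len: int = 20,
               max_len: int = 30) -> Iterator[str]:
    """Trim every 4-line FASTQ record and keep inserts of min_len..max_len nt.

    Assumes well-formed, uncompressed 4-line records (no wrapped sequences).
    """
    it = iter(lines)
    for header in it:
        seq = next(it).rstrip("\n")
        next(it)  # the '+' separator line
        qual = next(it).rstrip("\n")
        trimmed = trim_adapter(seq)
        if min_len <= len(trimmed) <= max_len:
            # Quality string is cut to the same length as the sequence.
            yield f"{header.rstrip()}\n{trimmed}\n+\n{qual[:len(trimmed)]}\n"


# r1: 22-nt miR-21 followed by part of the adapter -> kept (22 nt).
# r2: 4-nt insert followed by the full adapter -> dropped (too short).
r1 = "TAGCTTATCAGACTGATGTTGA" + "TGGAATTCTCGG"
r2 = "ACGT" + ADAPTER + "AAAAAAAAA"
fastq = f"@r1\n{r1}\n+\n{'I' * len(r1)}\n@r2\n{r2}\n+\n{'I' * len(r2)}\n"
records = list(trim_fastq(io.StringIO(fastq)))
print(records)
```

For a real file you would pass an open file handle instead of the `StringIO` object and write the yielded records out; the generator streams one record at a time, so a 6 GB file is not loaded into memory.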
There are a few adapter trimming solutions out there: http://seqanswers.com/forums/showthread.php?t=1159
Thanks! I appreciate your help! I'm also checking fastx toolkit and it seems quite useful.
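For the second step in the original question (keeping only the "unique" sequences), the idea is to collapse identical reads into one record with a count. A minimal sketch of that idea, assuming well-formed 4-line FASTQ input; the `>rank-count` header layout is just a choice made here, not a required format. It keeps one counter entry per distinct sequence in memory, which is usually manageable for trimmed small RNA reads.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4) from well-formed FASTQ."""
    for seq in islice(lines, 1, None, 4):
        yield seq.rstrip("\n")


def collapse(sequences: Iterable[str]) -> Iterator[str]:
    """Yield one FASTA record per unique sequence, most abundant first.

    The header is '>rank-count'; ties keep first-seen order.
    """
    counts = Counter(sequences)
    for rank, (seq, n) in enumerate(counts.most_common(), start=1):
        yield f">{rank}-{n}\n{seq}\n"


seqs = ["AAA", "CCC", "AAA", "GGG", "AAA", "CCC"]
collapsed = list(collapse(seqs))
print("".join(collapsed), end="")
```

Chaining `collapse(fastq_sequences(fh))` over the trimmed FASTQ gives a much smaller FASTA to work with downstream.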