Hi everybody,
I am working with some Illumina sequences and I have some doubts. Could someone help me, please? These sequences are from a HiSeq 2000, but the run was done in 2010.
I am not sure whether they are single-end or paired-end. How can I check that?
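One rough way to check, assuming the old-style read names shown further down (ending in /1 or /2), is to count how many headers carry each mate suffix. This is only a sketch: the function name count_mates is made up for illustration, and a file that contains only /1 reads may still have a matching /2 file stored elsewhere, so the result cannot prove the run was single-end.

```shell
# Hypothetical helper: count FASTQ headers ending in /1, /2, or neither.
# Assumes plain 4-line FASTQ records (no wrapped sequence lines).
count_mates() {
  awk 'NR % 4 == 1 {
         if ($1 ~ /\/1$/) r1++
         else if ($1 ~ /\/2$/) r2++
         else other++
       }
       END { printf "%d %d %d\n", r1, r2, other }' "$1"
}

# Usage (prints: <headers ending /1> <headers ending /2> <other headers>):
# count_mates ../../rawdata/BRS_I24.fastq
```

If both counts are non-zero and equal, the file is probably interleaved paired-end data; if only /1 headers appear, it is worth looking for a second file from the same lane before treating the data as single-end.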
I am trimming them with Trimmomatic:
trimmomatic SE -threads 10 -phred64 \
../../rawdata/BRS_I24.fastq \
BRS_I24_trim.fastq \
ILLUMINACLIP:/data/apps/trimmomatic/0.36/adapters/TruSeq2-SE.fa:2:30:10 \
LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:36
- However, around that time Illumina changed its quality score encoding and later went back to the old one, so I don't know which option I should use: -phred64 or -phred33? A read from the file looks like this:
@HWI-ST365_0091:2:1:1192:1999#TGTCAT/1
CCTGCTCTAAATGCTTCTATTTGCCGCATGATTCCAGTCTTGACAGTTGCATCTGCCACCAAGGATATATACTCCTCCAAATTGTTGATATCAACAATT
+HWI-ST365_0091:2:1:1192:1999#TGTCAT/1
YXYY[[X[[cccccccccccccccccccc_____c__ccccc_c[cccZccccccZccccc\_c[\]]Z]YYYY[ZXXRYYY]V[[[[XSUSUUXXXRZ
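A rough way to tell the encodings apart is the ASCII range of the quality characters: Phred+33 files normally contain characters below ';' (code 59), while Phred+64 files contain nothing below '@' (code 64) and usually go above 'J' (code 74). The sketch below applies that heuristic. The function name guess_phred and the exact cut-offs are assumptions for illustration and are not part of Trimmomatic. In the single record above the characters run from 'R' (82) to 'c' (99), which would point to Phred+64, although one read is a very small sample.

```shell
# Hypothetical helper: print the lowest and highest ASCII codes found in the
# quality lines (every 4th line) of a FASTQ file, plus a rough encoding guess.
# Assumes plain 4-line FASTQ records.
guess_phred() {
  awk 'NR % 4 == 0' "$1" |
    LC_ALL=C od -An -v -tu1 |
    tr -s ' ' '\n' |
    awk 'NF {
           v = $1 + 0
           if (v == 10) next              # skip the newline byte
           if (n == 0 || v < min) min = v
           if (v > max) max = v
           n++
         }
         END {
           if (n == 0)        guess = "empty"
           else if (min < 59) guess = "phred33"    # below the Phred+64 range
           else if (max > 74) guess = "phred64"    # above the usual Phred+33 range
           else               guess = "ambiguous"
           print min, max, guess
         }'
}

# Usage (prints: <min code> <max code> <guess>):
# guess_phred ../../rawdata/BRS_I24.fastq
```

Running it on the whole file rather than on one read gives a more trustworthy range, since a few high-quality reads alone can leave the result ambiguous.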
- I also read that for GA II data it is better to use TruSeq2-SE.fa:2:30:10, while for HiSeq data it is better to use TruSeq3-SE.fa:2:30:10. Is that right? I know that the sequencing kit used was TruSeq(TM) SBS v5.
Thanks in advance
Thank you, genomax!
Do you also have any idea about my other questions?
The example you posted seems to be in Sanger FASTQ format. I generally use bbduk.sh from the BBMap suite for trimming. That software includes an adapters.fa file in the resources directory of the software bundle. It covers all commonly used adapter sequences, so you do not need to know which version of the adapters was used.