I am splitting SRF files into FASTQ using the staden io_library program srf2fastq:
srf2fastq -c -s ./fastq/name_split -a -n name.srf
But some of the SRF files contain four chunks of sequences instead of two (paired-end experiment), so I get four FASTQ files, _1, _2, _3 and _4, whose read names end in /1, /2, /3 and /4 respectively.
The problem is that the FASTQ files _2 and _4 are 'technical reads' that will be discarded, and only _1 and _3 should be used. This means that the read names for the reverse reads (the FASTQ file _3) end with /3 instead of the usual /2.
Questions:
Would leaving the read names ending in /3 instead of /2 confuse other people?
Should I rename the reads to end in /2 instead of /3?
Can I extract only the two wanted chunks from the SRF file instead of all four?
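If renaming turns out to be the way to go (question 2), it can be done after the split. Below is a minimal sketch in Python, not part of io_lib. It assumes strict four-line FASTQ records (no wrapped sequences); the function name rename_mate_suffix is made up for illustration, and the sequence and quality strings in the demo are invented.

```python
def rename_mate_suffix(lines, old="/3", new="/2"):
    """Yield FASTQ lines with the read-name suffix `old` replaced by `new`.

    Assumption: every record is exactly four lines (header, sequence,
    '+' line, qualities). Only the header and '+' lines are touched,
    so a quality string that happens to end in "/3" is left alone.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        # Position 0 in a record is the '@' header, position 2 the '+' line
        # (which may or may not repeat the read name).
        if i % 4 in (0, 2) and line.endswith(old):
            line = line[: -len(old)] + new
        yield line


# Demo on one in-memory record; the read name is taken from the
# srf_info listing, the bases and qualities are invented.
record = ["@name_456:8:1:404:759/3\n", "ACGT\n", "+\n", "IIII\n"]
fixed = list(rename_mate_suffix(record))
print("\n".join(fixed))
```

To run it on a real file, pass an open file handle for the _3 FASTQ as `lines` and write each yielded line, followed by a newline, to the output file.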
Looking at the SRF file with srf_info, I can see which chunks I want:
> srf_info -l255 name.srf
Reading archive name.srf.
trace_name: + name_456:8:1:404:759
...
name_456:8:1:381:649 x10
Reads: GOOD : 10
Reads: TOTAL : 10
Chunk: BASE : 10 238
Chunk: CNF1 : 10 409
Chunk: CNF4 : 10 2890
Mdata key: SCALE : 10
Chunk: SMP4 : 10 5780
Chunk: REGN : 10 130
Mdata key: NAME : 10
names=forward:P;skip1:T;reverse:P;skip2:T boundaries=35;36;71 x10
Bases: A: 306
Bases: C: 98
Bases: G: 123
Bases: T: 193
Bases: TOTAL: 720
The REGN chunk lists the two 'skip' regions (the 'technical reads') and the two wanted regions, forward and reverse (the 'application reads').
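To work out which of the split files to keep without reading the listing by eye, the names= value of the REGN chunk can be parsed. This is only a sketch: it assumes the letter after each colon is the region type, with T marking the technical reads as in the listing above, and that the order of the regions matches the _1 to _4 file suffixes. The function name application_read_indices is hypothetical.

```python
def application_read_indices(regn_names):
    """Return (position, name) for every region that is not technical.

    Assumptions: the input looks like the 'names=...' value printed by
    srf_info, entries are separated by ';', and a type of 'T' marks a
    technical read. Positions are 1-based to match the _1.._4 suffixes.
    """
    value = regn_names.split("=", 1)[-1]  # drop the leading "names="
    wanted = []
    for position, entry in enumerate(value.split(";"), start=1):
        name, _, region_type = entry.partition(":")
        if region_type != "T":
            wanted.append((position, name))
    return wanted


regions = application_read_indices("names=forward:P;skip1:T;reverse:P;skip2:T")
print(regions)  # [(1, 'forward'), (3, 'reverse')]
```

For this file that gives positions 1 and 3, which agrees with the _1 and _3 FASTQ files being the ones to keep.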
@Istvan, hmm, but -n only affects the filenames ([+1], it is probably better to leave it out). My problem is still the read names ending in /1 and /3, which come from -a.