Question

Issue When Splitting An SRF File: 4 Files Obtained

14.2 years ago

Pablo Marin-Garcia ★ 2.0k

I am splitting SRF files into FASTQ using the Staden io_lib program srf2fastq:

    srf2fastq  -c -s ./fastq/name_split -a -n   name.srf

But some of the SRF files contain four chunks of sequence instead of two (paired-end experiment), so srf2fastq creates four FASTQ files, _1, _2, _3 and _4, whose read names end in /1, /2, /3 and /4 respectively.

The problem is that FASTQ files _2 and _4 contain 'technical reads' that should be discarded, and only _1 and _3 should be used. But this means that the names of my reverse reads (FASTQ file _3) end with /3 instead of the usual /2.

Questions:

Would it confuse other people if I leave them ending in /3 instead of /2?
Should I rename the reads to end in /2 instead of /3?
Can I extract only the two wanted chunks from the SRF instead of all four?
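If you do rename, only the read-name lines need to change. A minimal sketch in Python, assuming plain four-line FASTQ records whose names end in the mate suffix (for example @name_456:8:1:404:759/3); rename_mate_suffix and rename_file are hypothetical helpers written for illustration, not part of io_lib:

```python
import re
from typing import Iterable, Iterator


def rename_mate_suffix(lines: Iterable[str], old: str = "3", new: str = "2") -> Iterator[str]:
    """Yield FASTQ lines, changing a trailing '/<old>' in read names to '/<new>'.

    Assumes plain 4-line records (no wrapped sequence or quality lines) and
    names that end in the mate suffix, e.g. '@name_456:8:1:404:759/3'.
    """
    suffix = re.compile("/" + re.escape(old) + "$")
    for i, line in enumerate(lines):
        text = line.rstrip("\n")
        if i % 4 == 0:
            # '@' header line: this is where the read name lives.
            if not text.startswith("@"):
                raise ValueError(f"line {i + 1}: expected '@' header, got {text!r}")
            text = suffix.sub("/" + new, text, count=1)
        elif i % 4 == 2:
            # '+' separator line: it may repeat the read name, so rename it too.
            text = suffix.sub("/" + new, text, count=1)
        # Sequence and quality lines (i % 4 == 1, 3) pass through unchanged.
        yield text + "\n"


def rename_file(src: str, dst: str, old: str = "3", new: str = "2") -> None:
    """Copy FASTQ file src to dst with the mate suffix rewritten."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(rename_mate_suffix(fin, old, new))
```

Quality lines are deliberately never touched, because a quality string can legitimately end in characters such as "/3".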

Looking at the SRF file with srf_info, I can see which chunks I want:

    > srf_info -l255 name.srf

    Reading archive name.srf.
    trace_name:  + name_456:8:1:404:759 ... name_456:8:1:381:649 x10
    Reads: GOOD : 10
    Reads: TOTAL : 10
    Chunk: BASE : 10 238
    Chunk: CNF1 : 10 409
    Chunk: CNF4 : 10 2890
      Mdata key: SCALE : 10
    Chunk: SMP4 : 10 5780
    Chunk: REGN : 10 130
      Mdata key: NAME : 10
        names=forward:P;skip1:T;reverse:P;skip2:T boundaries=35;36;71 x10
    Bases: A: 306
    Bases: C: 98
    Bases: G: 123
    Bases: T: 193
    Bases: TOTAL: 720

The REGN chunk lists the two 'skip' regions (the 'technical reads', marked T) and the two wanted regions, forward and reverse (the 'application reads', marked P).
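One way to read the REGN metadata: four region names but only three boundaries, so the boundaries are plausibly the 0-based start offsets of regions 2 to 4, with the first region starting at 0. That interpretation gives 35 + 1 + 35 + 1 = 72 bases per read, which matches Bases: TOTAL: 720 over 10 reads. A minimal sketch under that assumption (parse_regn and application_reads are hypothetical names, and the boundary semantics are inferred from this output, not taken from the SRF specification):

```python
from typing import Dict, List, Tuple

Region = Tuple[str, str, int, int]  # (name, type, start, end), end exclusive


def parse_regn(names: str, boundaries: str, read_length: int) -> List[Region]:
    """Turn a REGN 'names'/'boundaries' pair into (name, type, start, end) regions.

    Assumption: boundaries are 0-based start offsets of the 2nd..nth regions;
    the first region starts at 0 and the last runs to the end of the read.
    """
    entries = [item.split(":") for item in names.split(";")]
    starts = [0] + [int(b) for b in boundaries.split(";")]
    if len(starts) != len(entries):
        raise ValueError("expected exactly one boundary fewer than region names")
    ends = starts[1:] + [read_length]
    return [(name, rtype, s, e) for (name, rtype), s, e in zip(entries, starts, ends)]


def application_reads(seq: str, regions: List[Region]) -> Dict[str, str]:
    """Keep only the regions typed 'P' (forward/reverse here), dropping 'T' ones."""
    return {name: seq[s:e] for name, rtype, s, e in regions if rtype == "P"}


# Values copied from the srf_info output above; 72 = 720 bases / 10 reads.
regions = parse_regn("forward:P;skip1:T;reverse:P;skip2:T", "35;36;71", 72)
# regions == [('forward', 'P', 0, 35), ('skip1', 'T', 35, 36),
#             ('reverse', 'P', 36, 71), ('skip2', 'T', 71, 72)]
```

Under this reading each technical read is a single base, and the two application reads are 35 bases each.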

next-gen sequencing fastq • 2.5k views

score 1 · Answer 1 · 2011-06-20

14.2 years ago

Istvan Albert 103k

Would it confuse other people if I leave them ending in /3 instead of /2? Should I rename the reads to end in /2 instead of /3?

The manual states that if you do not pass the -n flag, the files are labeled by the region names (forward/reverse) rather than numerically. That might be the best way to avoid confusion.

Can I extract only the two wanted chunks from the SRF instead of all four?

Not sure; it does not seem likely.

@Istvan, hmm, but -n only affects the file names ([+1], it is probably better to leave it out). My problem is still the read names ending in /1 and /3, which come from -a.

Reply by Pablo Marin-Garcia ★ 2.0k, 14.2 years ago
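Whether or not the /3 suffix is renamed, it is worth checking that the _1 and _3 files stay in step, i.e. that record k in each file carries the same read name once the /N suffix added by -a is stripped. A minimal sketch, assuming plain four-line FASTQ records; read_names and check_pairs are hypothetical helpers:

```python
import itertools
import re
from typing import Iterable, Iterator

# Trailing mate suffix such as /1 or /3 (added by srf2fastq -a in this thread).
_MATE_SUFFIX = re.compile(r"/\d+$")


def read_names(lines: Iterable[str]) -> Iterator[str]:
    """Yield each record's read name without '@', any comment, or a trailing /N.

    Assumes plain 4-line FASTQ records.
    """
    for i, line in enumerate(lines):
        if i % 4 == 0:
            name = line[1:].split(maxsplit=1)[0]
            yield _MATE_SUFFIX.sub("", name)


def check_pairs(mate1: Iterable[str], mate2: Iterable[str]) -> int:
    """Return the number of pairs; raise ValueError if the files fall out of step."""
    count = 0
    # zip_longest pads the shorter file with None, so a length mismatch is caught too.
    for a, b in itertools.zip_longest(read_names(mate1), read_names(mate2)):
        if a != b:
            raise ValueError(f"record {count + 1}: {a!r} does not match {b!r}")
        count += 1
    return count
```

Usage would be along the lines of opening the two split files (the names name_split_1.fastq and name_split_3.fastq are hypothetical) and passing the open file objects to check_pairs.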