Help needed with fastq-dump (SRA toolkit)
7.6 years ago
supertech ▴ 180

Hi, I posted this question on SEQanswers but have not received a response yet, so I am trying here as well.

I am trying to split an .sra file into R1.fastq and R2.fastq. However, I am getting a single file, and I think the forward and reverse reads are joined. The accession number is SRR5439504 (file SRR5439504.sra).

The command I ran is:

    fastq-dump -I --split-files SRR5439504.sra

I got the following output:

    @SRR5439504.1.1 1 length=302
    CCATAACCCTAACCCTAACCCTAACCCTAACTCTATCCATAACCCTAACCCTTACCCTATCCCTAACCCTAACCCTAACCCTAACCCTAGCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAGCCTAAGCGTAGCCCTAAGCCTAAGCCTAAGCCAAAGCGTAAGCCTAAGCCTAAGCCACAGCATAAAAAAAAGCAAAAACATAAACCCAAGAAAAAG
    +SRR5439504.1.1 1 length=302
    F22F<2@2C?02GCFHF?FB0?0?02BB44B334?3B33/0B?20/0003@33BB33223B21E1G?2FG1BF2BB1BB2FA1BF1A112B2FAA3CBFE1FHFHGFAHGHHHHGHHGFBHHGFBAFFFGGGGFGEFEGFFBFFFFCCBBBCBCBCFFFCFFFGGGGGGGGGCFGHHHHGHHFFGHCFCGHCHFHHGHFCB1AA233333B3B0BA0133222333333333B3@3F322B321>>11@3BF@3333333B322BB/2333433/<</02<2@///2<<110////00000.
    @SRR5439504.2.1 2 length=302
    CTCTAACCCTAACTCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACTCTAACCCTAACCCTAACCCTAACCCTATCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCGTAACCCTAAGCCTAACCCTAACCCTAACCCAAAACATAAGCCAAAGCCTAACCCTAACCCCAAGCATAATCCTAAACATAATCACACA
    +SRR5439504.2.1 2 length=302
    ?1A?0GF0>2@@@10HGFFEG?00AF0/FBFB0HFB<00>0BF0B/0BF0HGBBFBFG@BB1CFBBB00>0GF>0>B0B0BA0BB01AB0F00/0B00FA0GF00F00B0FA0FF00A00G0G0A0G00AB1GGGGGGGFF>CFFFAAAA@BABBBFFFBFFFGGGGGGGE44AEAAFFEH2F2GF222A22222BB2A2B1FFC2BF1ABE10ABA131B2?3333B32??12F2B1B2F2111??1B133333300B3B0BFC00?B?F0B///C//01BB22?12@1111@@2>1111/

I would expect two files, R1.fastq and R2.fastq, and I am wondering whether I am doing something wrong. I used fastq-dump version 2.8.2.

Thank you for the help in advance.

fastq-dump sra-toolkit • 3.5k views

Looking at the SRA record, the sequences seem to have been submitted as single 302 bp reads (even though the layout is described as PAIRED) from a CIRCLE-seq experiment. So you are likely not going to get paired-end sequences from SRA, whatever options you give fastq-dump. I don't know what CIRCLE-seq is, but you can take a look at the Nature protocol paper mentioned and process the data accordingly. Perhaps every read represents a circular sequence of some sort?
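One quick way to check whether every read in the dumped file really has the same length is to tally the sequence lengths. The following is only a minimal sketch: the function name `read_length_counts` is made up for illustration, and it assumes a plain four-line-per-record FASTQ (no wrapped sequence lines).

```python
from collections import Counter
import io


def read_length_counts(handle):
    """Count sequence lengths in a 4-line-per-record FASTQ stream."""
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the sequence is the second line of each record
            counts[len(line.strip())] += 1
    return counts


# Tiny in-memory demo; for real data, pass an open file handle for the
# FASTQ that fastq-dump wrote instead of this StringIO object.
demo = io.StringIO("@r1\nACGTAC\n+\nFFFFFF\n@r2\nACGT\n+\nFFFF\n")
print(read_length_counts(demo))
```

If the tally shows a single length of 302 for all reads, that would be consistent with the record holding one long read per spot rather than two separate mates.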


Hi GenoMax, thank you for the answer on both sites. I checked the paper again and finally found the description of the reads: they did 150 bp paired-end sequencing. They must have prepared the files incorrectly. I have emailed the author; let's see if they fix it.
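Until the submission is corrected, one possible stopgap is to cut each joined read in half. This is a rough sketch under assumptions that are not confirmed anywhere in this thread: that each 302 bp read is simply mate 1 followed by mate 2 (302 is twice 151), and that the second half needs no reorientation. The function name `split_joined_fastq` and the `/1` and `/2` name suffixes are illustrative choices, not anything produced by the SRA toolkit.

```python
import io


def split_joined_fastq(src, out1, out2):
    """Write the first half of each read to out1 and the second half to out2.

    ASSUMPTION: each record is two equal-length mates joined end to end.
    Reading stops at end of input or at the first blank line.
    """
    while True:
        header = src.readline().strip()
        if not header:
            break
        seq = src.readline().strip()
        src.readline()  # '+' separator line; rewritten below as a bare '+'
        qual = src.readline().strip()
        if len(seq) % 2 or len(seq) != len(qual):
            raise ValueError(f"cannot halve record {header!r}")
        mid = len(seq) // 2
        name = header.split()[0]  # keep only the read ID, drop the description
        out1.write(f"{name}/1\n{seq[:mid]}\n+\n{qual[:mid]}\n")
        out2.write(f"{name}/2\n{seq[mid:]}\n+\n{qual[mid:]}\n")


# In-memory demo with a made-up 6 bp record; real use would pass open files.
src = io.StringIO("@r1 1 length=6\nACGTTT\n+r1 1 length=6\nABCDEF\n")
r1, r2 = io.StringIO(), io.StringIO()
split_joined_fastq(src, r1, r2)
print(r1.getvalue(), r2.getvalue(), sep="")
```

Whether the halves produced this way match the original mates can only be confirmed by the submitter, so a corrected SRA record remains the reliable fix.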
