Question

Number of reads in the downloaded fastq file

4.3 years ago

Kash ▴ 110

Hi,

I am trying to download some data from SRA using fasterq-dump. This is the command I used:

    fasterq-dump --split-files --split-spot -O /path/fastq SRR3045676

I wanted to check whether I had downloaded all the reads for the accession. vdb-dump shows 166,306,903 sequence reads under this accession:

    vdb-dump --info SRR3045676
    SEQ:166,306,903

The fasterq-dump output says it read 332,613,806 reads (166,306,903 x 2) but wrote only 331,487,754 (165,743,877 x 2):

    spots read : 166,306,903
    reads read : 332,613,806
    reads written : 331,487,754

But when I counted the reads in the downloaded R1 file with the following command, I got 165,180,851, which is even lower than 165,743,877:

    echo $(zcat SRR2102500_R1.fastq.gz | wc -l)/4 | bc >> /path/readCount.txt
    165,180,851
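
The same count can be wrapped in a small shell function so that it is easy to run on both mate files and compare them. This is only a sketch: the function name count_fastq_reads is hypothetical, and it assumes each record spans exactly four lines (no wrapped sequence or quality lines).

```shell
# Count FASTQ records in a gzip-compressed file.
# Assumption: standard 4-lines-per-record FASTQ (no line wrapping).
# count_fastq_reads is a hypothetical helper name, not part of any toolkit.
count_fastq_reads() {
    local lines
    # gzip -cd decompresses to stdout; wc -l counts the lines.
    lines=$(gzip -cd -- "$1" | wc -l)
    # Integer division by 4 gives the number of records.
    echo $((lines / 4))
}
```

It could be called once per file, for example `count_fastq_reads sample_R1.fastq.gz` and `count_fastq_reads sample_R2.fastq.gz` (placeholder file names), to see whether the two mate files hold the same number of reads.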

Can someone please explain why the output says fewer reads were written than were read, and why even fewer reads are found in the downloaded fastq file? I downloaded this accession twice and got the same results both times. I also downloaded a few other accessions, and for those the final fastq file contained exactly the number of sequences reported by vdb-dump --info.
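
The numbers quoted above can be cross-checked with plain shell arithmetic. This is a sketch that only rearranges the reported counts; it assumes every written read ended up in either the R1 or the R2 file, which has not been verified here. The variable names are made up for illustration.

```shell
# Counts copied from the post (thousands separators removed).
spots_read=166306903
reads_read=332613806
reads_written=331487754
r1_counted=165180851

# Reads that fasterq-dump read but did not write.
reads_dropped=$((reads_read - reads_written))

# Assumption: every written read went to either the R1 or the R2 file,
# so the R2 count is whatever remains after subtracting the R1 count.
r2_implied=$((reads_written - r1_counted))

# Shortfall of the R1 file relative to the number of spots.
r1_shortfall=$((spots_read - r1_counted))

echo "reads dropped:    $reads_dropped"
echo "implied R2 reads: $r2_implied"
echo "R1 shortfall:     $r1_shortfall"
```

With these inputs the dropped reads (1126052) equal the R1 shortfall (1126052), and the implied R2 count (166306903) equals the number of spots. Under the stated assumption, that would be consistent with all of the unwritten reads being first mates, but counting the R2 file directly is needed to confirm it.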

SRA read count fasterq-dump vdb-dump --info