SRA: fastq-dump gives different number of sequences
6.1 years ago
jeetsahu ▴ 10

I downloaded paired-end reads using fastq-dump with the --split-files option and an SRR ID, but the split files appear to contain different numbers of reads. As I understand it, since these are paired-end reads, both files should contain the same number of sequences.

$fastq-dump -I --split-files SRR390728

$grep -c '>' SRR7716545_1.fastq

694067

$grep -c '>' SRR7716545_2.fastq

1026976

Please correct me if I am wrong.

sra sequence • 1.8k views
6.1 years ago
ATpoint 86k

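Both files have the same number of reads. You have to grep for '^@', because @ is the FASTQ header prefix; > is the FASTA header prefix. The '>' hits you counted most likely come from quality strings, where '>' is a valid quality character.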

ls *.fastq | parallel "echo {} && grep -c '^@' {}"
SRR7716545_1.fastq
5644111
SRR7716545_2.fastq
5644111
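One caveat with `grep -c '^@'`: '@' is itself a valid quality character (Phred 31 in the usual +33 encoding), so a quality line can start with it and get counted as a header. Here is a minimal sketch of a count that avoids this, assuming plain four-line FASTQ records with no wrapped sequence or quality lines (worth checking for your own files). The function name `count_fastq_reads` is just made up for illustration.

```shell
# Count FASTQ records as (number of lines) / 4.
# Assumption: every record is exactly 4 lines (header, sequence, '+', quality).
count_fastq_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Report the read count for every FASTQ file in the current directory.
for f in *.fastq; do
    [ -e "$f" ] || continue   # skip if the glob matched nothing
    printf '%s\t%s\n' "$f" "$(count_fastq_reads "$f")"
done
```

If the two mates of a paired-end run report the same number here, the files are in sync as far as read count goes.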

Thanks, I was grepping for the wrong symbol. One quick question: does fastq-dump give the latest dataset used for the assembly? If yes, how can I get older datasets?


fastq-dump produces the FASTQ for whatever SRR accession you give it. I have no detailed knowledge of your SRR.


Hello jeetsahu,

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.
