Fastq-dump warning: too many reads at spot id


2.7 years ago

melissachua90 ▴ 70

I want to download the dataset PRJNA281410 from SRA, along with the corresponding reference genome (FASTA format). My code:

esearch -db sra -query PRJNA281410 \
| elink -target assembly \
| efetch -format docsum \
| xtract -pattern DocumentSummary -element FtpPath GenBank \
| cut -d ',' -f 1 \
| grep SRR \
| xargs -n 1 -P 4 fastq-dump --split-files --gzip --skip-technical SRR18781516

The command prints many lines of the following warning about skipped spots: fastq-dump warn: too many reads at spot id XXX, maximum YY supported, skipped


ADD COMMENT • link updated 2.7 years ago by Pappu ★ 2.1k • written 2.7 years ago by melissachua90 ▴ 70


If you are using xargs to pass values, why do you have a fixed SRR18781516 at the end of your command? Additionally, your command as posted does not work past the first search step. Many of these datasets are PacBio, so I don't think that blanket fastq-dump command will work.

Finally, I am not sure what you mean by "corresponding reference genome (fasta format)". Are you looking to get the reference genomes for the two bacteria that are part of the data?
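
The xargs point can be illustrated with a dry run. xargs appends each incoming value to the end of the command it runs, so the accession should come from the pipe rather than being hard-coded as well. In this minimal sketch the accessions are made-up placeholders, the function name `build_commands` is hypothetical, and `echo` stands in for actually running fastq-dump, so nothing is downloaded.

```shell
# Dry run: xargs appends one accession per invocation (-n 1) to the end
# of the command, so no fixed SRR accession should follow the flags.
# SRR0000001 and SRR0000002 are placeholder accessions, not real runs.
# 'echo' prints each command instead of running fastq-dump.
build_commands() {
  printf '%s\n' SRR0000001 SRR0000002 \
    | xargs -n 1 echo fastq-dump --split-files --gzip --skip-technical
}

build_commands
```

Each printed line is one fastq-dump command ending in a different accession. Dropping `echo` would run the commands for real, assuming the upstream pipe actually yields run accessions.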

ADD REPLY • link 2.7 years ago by GenoMax 148k


fastq-dump threw an error when I ran it without specifying the SRR accession.

ADD REPLY • link 2.7 years ago by melissachua90 ▴ 70


You can check out ENA, which also provides FASTQ files.
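
As a rough sketch of the ENA route: the ENA Portal API has a `filereport` endpoint that lists the runs of a study together with their FASTQ locations. The helper name `ena_filereport_url` and the choice of fields are illustrative assumptions, and the curl call is left commented out so the snippet runs without network access.

```shell
# Build an ENA Portal API "filereport" URL for a project accession.
# ena_filereport_url is a hypothetical helper name; the field list is
# one reasonable choice, not the only one.
ena_filereport_url() {
  accession="$1"
  fields="run_accession,instrument_platform,fastq_ftp"
  printf 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=%s&result=read_run&fields=%s&format=tsv\n' \
    "$accession" "$fields"
}

ena_filereport_url PRJNA281410

# To fetch the table (requires network access):
# curl -s "$(ena_filereport_url PRJNA281410)"
```

The `instrument_platform` column should make it possible to tell PacBio runs apart from the rest before downloading, which may help given the concern that a blanket fastq-dump command will not suit every run in this project.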

ADD REPLY • link 2.7 years ago by Pappu ★ 2.1k
