Question

SRA Toolkit for PacBio

3

12.0 years ago

Lee Katz ★ 3.2k

Hi, what commands are you all using for extracting PacBio data from the Sequence Read Archive? I don't think I want to simply extract a fastq file with fastq-dump, because I'd like to view the long reads and the CCS reads separately. Is there a special hdf5-dump executable that I am not seeing?

sra extraction • 7.1k views

updated 12.0 years ago by osullivanchristopher ▴ 210 • written 12.0 years ago by Lee Katz ★ 3.2k

score 4 · Answer 1 · 2012-12-06

4

12.0 years ago

osullivanchristopher ▴ 210

The SRA Toolkit does not contain a utility for reconstituting HDF5 from PacBio SRA archives; PacBio was only concerned with fastq-dump when they contacted SRA about processing their HDF5. No information is lost, so you could rebuild HDF5 from the output of vdb-dump. If you want the consensus reads, they are stored in an SRA table called CONSENSUS, and you can get them this way:

    fastq-dump --table CONSENSUS SRR515631

If you have questions about SRA Toolkit functionality, don't hesitate to email sra@ncbi.nlm.nih.gov

-Chris
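
A minimal sketch of how the two read types could be kept in separate files, building on the command in the answer above. It assumes that the run's default table holds the long reads and that the run has a CONSENSUS table, as SRR515631 does according to the answer. The `run_cmd` and `dump_pacbio` helpers and the `ccs` and `long_reads` directory names are hypothetical; `--table` and `--outdir` are fastq-dump options. With `DRY_RUN=1` the commands are only printed, which is a cheap way to check them before downloading anything.

```sh
#!/bin/sh
# Sketch only: write CCS reads and long reads to separate directories
# so the two read types never end up in the same fastq file.

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: dump one accession into ccs/ and long_reads/.
dump_pacbio() {
    acc="$1"
    # CCS reads come from the CONSENSUS table named in the answer.
    run_cmd fastq-dump --table CONSENSUS --outdir ccs "$acc"
    # Assumption: fastq-dump's default table holds the long reads.
    run_cmd fastq-dump --outdir long_reads "$acc"
}

# Example (prints the two commands without running them):
#   DRY_RUN=1; dump_pacbio SRR515631
```

Keeping the two dumps in different directories avoids any reliance on read names to tell CCS reads from long reads afterwards.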

0

Is there a way to convert SRA directly to CCS and long reads, then? Dumping an SRA file as a single fastq would be confusing because the short CCS reads would be mixed in with the long reads, so artificial linkers could end up in the read set.

12.0 years ago by Lee Katz ★ 3.2k

0

A lot of information is lost by not having the HDF5 file: many details on the per-base quality estimates for deletions and insertions, and, AFAIK, information needed for base-modification detection...

12.0 years ago by lexnederbragt ★ 1.3k