Hi, what commands are you all using for extracting PacBio data from the Sequence Read Archive? I don't think I want to extract just a single fastq file with fastq-dump, because I'd like to view both the long reads and the CCS reads in different ways. Is there a special hdf5-dump executable that I am not seeing?
The SRA Toolkit does not contain a utility for reconstituting HDF5 from PacBio SRA archives. PacBio was only concerned with fastq-dump when they contacted SRA about processing their HDF5. There is no information lost, so you could make HDF5 from the output of vdb-dump.
But if you want the consensus reads, they are stored in an SRA table called CONSENSUS. You can get them this way:
fastq-dump --table CONSENSUS SRR515631
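To keep the CCS reads apart from the other reads, the command above can be paired with a second dump into a different directory. This is only a rough sketch: `dump_pacbio`, the `RUN` switch and the `ccs`/`long_reads` directory names are made up for illustration, and it assumes that running fastq-dump without `--table` returns the long reads from the run's default table, which is worth checking on your own accession.

```shell
#!/usr/bin/env bash
set -euo pipefail

# RUN defaults to "echo", so the commands are only printed.
# Call the script with RUN= (empty) to actually execute them.
RUN="${RUN-echo}"

# dump_pacbio is a hypothetical helper name, not part of the SRA Toolkit.
dump_pacbio() {
    local acc="$1"
    # CCS reads from the CONSENSUS table (same command as above, plus --outdir).
    $RUN fastq-dump --table CONSENSUS --outdir ccs "$acc"
    # Default table; ASSUMED here to hold the long reads.
    $RUN fastq-dump --outdir long_reads "$acc"
}

dump_pacbio "${1:-SRR515631}"
```

Both dumps are named after the accession, so writing them to separate directories keeps one from overwriting the other.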
If you have questions about SRA Toolkit functionality, don't hesitate to email sra@ncbi.nlm.nih.gov
Is there a way to convert SRA directly to CCS and long reads then? Dumping an SRA file as a single fastq would be confusing because the short CCS reads will be mixed in with long reads. Therefore artificial linkers could be part of the read set.
A lot of information is lost by not having the HDF5 file: many details of the per-base quality estimates for deletions and insertions and, AFAIK, the information needed for base-modification detection...