7.6 years ago
José
Hello, last week I received PacBio genome data from a grass species, and I have a few questions about it.
- The provider gave me only the filtered subreads (Filter = single pass, adapters removed) in FASTQ format. Is that enough, or should they also give me other data, e.g. the .h5 files?
- Every base in the FASTQ subreads has an exclamation mark (the lowest possible value) as its quality score. I don't know why; is that OK? I ran FastQC and all the reads have the same value.
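For context on the quality question: FASTQ quality characters are normally Phred+33 encoded, so `!` (ASCII 33) decodes to Q0, an error probability of 1, which carries no usable quality information. A file where every base is `!` most plausibly means the quality values were never filled in, not that every base is wrong; that is worth confirming with the provider. A minimal sketch of the decoding, assuming Phred+33 encoding (the function names are illustrative):

```python
def phred33_scores(quality_line: str) -> list[int]:
    """Decode a Phred+33 (Sanger-style) FASTQ quality string into integer Q values."""
    return [ord(ch) - 33 for ch in quality_line]


def error_probability(q: int) -> float:
    """Convert a Phred quality Q into the probability that the base call is wrong."""
    return 10 ** (-q / 10)


# A toy 4-line FASTQ record whose qualities are all '!'
record = ["@read1", "ACGT", "+", "!!!!"]
scores = phred33_scores(record[3])
print(scores)                         # [0, 0, 0, 0]
print(error_probability(scores[0]))   # 1.0 -> Q0 means "no information"
```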
The provider told me the data have 85% accuracy. That seems normal for PacBio data, but I can't measure it myself.
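The quoted 85% accuracy can be put on the same Phred scale as FASTQ qualities via Q = -10 * log10(1 - accuracy). Measuring it yourself generally means aligning the reads to a reference and counting matches, mismatches and indels. A minimal sketch of both calculations; the identity formula is one common definition (matches over all alignment columns), and the counts in the usage line are made up for illustration:

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Convert a per-base accuracy (strictly between 0 and 1) to a Phred-scaled quality."""
    if not 0 < accuracy < 1:
        raise ValueError("accuracy must be strictly between 0 and 1")
    return -10 * math.log10(1 - accuracy)


def identity(matches: int, mismatches: int, insertions: int, deletions: int) -> float:
    """One common alignment identity: matches / (matches + mismatches + insertions + deletions)."""
    columns = matches + mismatches + insertions + deletions
    if columns == 0:
        raise ValueError("alignment has no columns")
    return matches / columns


print(round(accuracy_to_phred(0.85), 2))  # 8.24, i.e. roughly Q8
print(identity(850, 50, 60, 40))          # 0.85 for these hypothetical counts
```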
Thanks,
José
What is it that you want to do with this data?
Grab a copy of the original (*.h5) data files; some software and analyses need them. Also ask your provider to run the RS_ReadsOfInsert workflow, which will give you consensus sequences for those subreads.
As genomax said, you'll want to dig deeper to make sure you are working with consensus sequences built from the subreads. There are plenty of resources for working from .bax.h5 files:
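Whether a consensus (Reads of Insert) is possible depends on how many subreads, i.e. passes, each ZMW produced. A minimal sketch that counts subreads per ZMW from a FASTQ file, assuming the usual PacBio subread naming `<movie>/<holeNumber>/<start>_<end>` and unwrapped 4-line FASTQ records; it only counts passes and is not a substitute for RS_ReadsOfInsert:

```python
from collections import defaultdict
from typing import Iterable


def zmw_key(read_name: str) -> str:
    """Return '<movie>/<holeNumber>' from a subread name '<movie>/<holeNumber>/<start>_<end>'."""
    # Drop any description after the first whitespace, then split on '/'.
    movie, hole, _span = read_name.split()[0].split("/")
    return f"{movie}/{hole}"


def subreads_per_zmw(fastq_lines: Iterable[str]) -> dict[str, int]:
    """Count subreads per ZMW, assuming strict 4-line FASTQ records."""
    counts: dict[str, int] = defaultdict(int)
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:  # header line of each record
            counts[zmw_key(line.strip().lstrip("@"))] += 1
    return dict(counts)
```

Usage, with a hypothetical file name: `with open("subreads.fastq") as fh: print(subreads_per_zmw(fh))`. ZMWs with a single subread (one pass) cannot contribute a multi-pass consensus.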
Pacbio: extract fastq from h5 file based on quality filtering
Brent Wilson, PhD | Project Scientist | Cofactor Genomics 4044 Clayton Ave. | St. Louis, MO 63110 | tel. 314.531.4647