Hi
I am working with PacBio Sequel data for a few bacterial strains. I received three files from the sequencing facility:
sample.bam
sample.bam.pbi
sample.subreadset.xml
Question#1
How do I assess the quality of the data? Since this is Sequel data, the Phred scores are arbitrarily set to an exclamation mark (Phred score = 0). There should be some way to assess the quality. PacBio suggests using the SMRTlink program for this. I can also see from the user guide (page 28) that the .subreadset.xml file contains information about Sequel sequence data. From page 110, I know that I have all the files required by SMRTlink; however, I am not sure how to import these files into SMRTlink and assess the quality. I have already installed it on a Windows machine and I am able to log in.
Page 109 of the same user guide says that another file, .sts.xml, contains summary statistics about the collection/cell and its post-processing. I haven't received that file. Is it required for QC?
Do I have the files required for QC using SMRTlink?
Any alternative way to perform QC?
Update: the Sequel machine is not with us. Can I still perform QC with just the three files mentioned above?
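Since the base qualities carry no information here, one QC angle that does not depend on them is the read length distribution. Below is a minimal sketch, not a replacement for the SMRTlink reports, that summarizes a list of subread lengths (count, total bases, mean, median, longest read, N50). The function name `read_length_summary` and the toy lengths are made up for illustration. In practice the lengths would have to be extracted from the BAM first, for example with a BAM-reading library; that step is assumed and not shown.

```python
from statistics import mean, median


def read_length_summary(lengths):
    """Return basic length statistics for a collection of read lengths."""
    if not lengths:
        raise ValueError("no read lengths given")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)

    # N50: walking from the longest read down, the length of the read at
    # which the running total first reaches half of all sequenced bases.
    running = 0
    n50 = 0
    for length in ordered:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    return {
        "reads": len(ordered),
        "total_bases": total,
        "mean": mean(ordered),
        "median": median(ordered),
        "longest": ordered[0],
        "n50": n50,
    }


# Toy lengths for illustration only, not real data.
toy_lengths = [2000, 4000, 6000, 8000, 10000]
summary = read_length_summary(toy_lengths)
print(summary)
```

Dividing `total_bases` by the expected genome size gives a rough estimate of raw coverage, which is usually the first number to check before attempting an assembly.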
Question#2
I am trying to understand the PacBio sequencing chemistry. From the image below, it is clear that what I have are subreads (sequenced inserts devoid of the green adapters) and not CCS (circular consensus sequence) reads. I am trying to assemble the bacterial genome with the Canu assembler.
Image source: PacBio
What should I use?
- the fastq file converted from the subreads.bam file? I think most of the blogs suggest that.
OR
- first generate CCS reads (with SMRTlink?) and then use those for de novo assembly? The reasoning is that the PacBio error rate is high and CCS would help compensate for that (?). I understand that CCS reads are the result of consensus base calling from subreads that all come from the same template.
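To make the "same template" idea concrete: PacBio subread names follow the pattern movie/zmw/qStart_qEnd, so subreads that share the movie and ZMW fields come from the same molecule. The sketch below counts subreads per ZMW from a list of names, which gives a rough feel for how many templates have enough passes for a consensus to be worthwhile. The function name `passes_per_zmw`, the toy movie name, and the cutoff of 3 passes are assumptions for illustration only, not PacBio defaults.

```python
from collections import Counter


def passes_per_zmw(subread_names):
    """Count subreads per ZMW from names of the form movie/zmw/qStart_qEnd."""
    counts = Counter()
    for name in subread_names:
        parts = name.split("/")
        if len(parts) != 3:
            raise ValueError(f"unexpected subread name: {name!r}")
        movie, zmw, _span = parts
        counts[(movie, zmw)] += 1
    return counts


# Toy names for illustration only; the movie name is hypothetical.
toy_names = [
    "m00001_000000_000000/101/0_5000",
    "m00001_000000_000000/101/5050_10020",
    "m00001_000000_000000/101/10070_15000",
    "m00001_000000_000000/202/0_18000",
]
counts = passes_per_zmw(toy_names)

# Illustrative cutoff (an assumption): ZMWs with at least 3 subreads.
multi_pass = {key for key, n in counts.items() if n >= 3}
print(counts)
print(multi_pass)
```

If most ZMWs contribute only one or two long subreads, as in the second toy ZMW, there is little for a consensus step to work with, and the subreads themselves are the natural input for assembly.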
What is the expected insert size in your libraries? One reason the Sequel is popular is that one can get much longer sequences (tens of kb), so CCS may not come into play (when you have the right kind of library). At least that is what I would think.
Thankfully I had bookmarked this tutorial, which you may find useful: Polish PacBio assembly with latest PacBio tools : an affordable solution for everyone
While good, that tutorial is for the RS II, which had different data formats than the Sequel. It may only be partially useful.
Just wanted to update that you are correct. That tutorial is not helping much in my case.