Question

PacBio subreads.fastq files?

11 months ago

majeedaasim ▴ 60

I have downloaded PacBio Iso-Seq data in subreads.fastq format from NCBI. Most Iso-Seq analysis tools require a PacBio .bam file as input, which is unavailable from NCBI. I want to perform differential gene expression analysis and alternative splicing analysis, but I am confused about the nature of the data.

Have the sequences in the subreads.fastq file been processed for barcode and primer removal or not?
I have read the PacBio documentation, which says that PacBio .bam files are converted to fastq through the bam2fastq module, which includes demultiplexing and barcode removal.
Were the subreads fastq files in NCBI generated after CCS calling, or through bam2fastq without CCS calling?

ncbi PacBio • 810 views

updated 11 months ago by Ram 44k • written 11 months ago by majeedaasim ▴ 60

score 0 · Answer 1 · 2023-12-12

Subreads are the raw sequences without the adapters (SMRTbell adapters). If you have inline barcodes or primers as part of the sequence, they will be present in the subreads; barcodes within the SMRTbell adapter will not be in the subreads. PacBio reads used to be fastq before they moved to bam, IIRC, so old data may never have been in bam format in the first place. The subreads files are NOT error-corrected. Sometimes the read headers are useful to look at; the subreads file should have multiple consecutive reads that come from the same ZMW, while the CCS file should only have a single read per ZMW (with very high quality scores).
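If you want to run that header check yourself, here is a rough sketch in Python. It assumes the usual PacBio read-name layout, `<movie>/<holeNumber>/<qStart>_<qEnd>` for subreads and `<movie>/<holeNumber>/ccs` for CCS reads, and that the FASTQ uses plain 4-line records. The function names and the demo read names are made up for illustration; headers coming out of SRA may be prefixed with an accession, so the pattern searches anywhere in the header line rather than anchoring at the start.

```python
import re
from collections import Counter

# Assumed PacBio read-name layout (not verified against your files):
#   subreads: <movie>/<holeNumber>/<qStart>_<qEnd>
#   CCS:      <movie>/<holeNumber>/ccs
PACBIO_NAME = re.compile(
    r"(?P<movie>[^\s/@]+)/(?P<zmw>\d+)/(?P<suffix>ccs|\d+_\d+)"
)


def reads_per_zmw(fastq_lines):
    """Count reads per (movie, ZMW) from an iterable of FASTQ lines.

    Assumes 4-line FASTQ records, so only every fourth line is a header.
    """
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 != 0:
            continue  # skip sequence, '+' and quality lines
        match = PACBIO_NAME.search(line)
        if match:
            counts[(match.group("movie"), match.group("zmw"))] += 1
    return counts


def looks_like_subreads(counts):
    """True if any ZMW contributes more than one read."""
    return any(n > 1 for n in counts.values())


# Hypothetical in-memory records; with a real file you would pass an open
# file handle instead, e.g. open("subreads.fastq").
demo = [
    "@m54006_170101_000000/4194370/0_1200", "ACGT", "+", "!!!!",
    "@m54006_170101_000000/4194370/1245_2400", "ACGT", "+", "!!!!",
    "@m54006_170101_000000/4194371/0_900", "ACGT", "+", "!!!!",
]
counts = reads_per_zmw(demo)
print(counts.most_common(3))
print("looks like subreads:", looks_like_subreads(counts))
```

If many ZMWs show more than one read, the file is most likely raw subreads; if every ZMW appears exactly once and the names end in `/ccs`, it is more likely CCS output. If nothing matches at all, the original read names were probably not preserved in the download, and this check cannot tell you either way.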