Question

How to demultiplex PacBio from CCS.h5 or fastq

9.8 years ago

apt.university ▴ 70

I have PacBio CCS.h5 files and the corresponding fasta and fastq files, and I would like to demultiplex them. Does anyone know how this can be done in the absence of bas.h5 files?

Thanks for your help!

Mandy

sequence • 4.2k views

updated 2.5 years ago by Ram 44k • written 9.8 years ago by apt.university ▴ 70

Ram · Answer 1 · 2015-03-17

You can use the HMMER package to identify the barcodes. The start and end barcode HMMs can each be pinned probabilistically, and independently of one another, to the positions in the read (start_pos and end_pos) where the barcodes are expected to occur.

The two ends can then be considered together: for each barcode combination that was used for multiplexing (your hypotheses, i.e. the combinations that were actually used), add the log-likelihood scores of its start_pos and end_pos HMM hits. The combination with the highest summed score is then the most likely assignment for that read.
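The score-combining step could be sketched roughly as below. This is only an illustration, not HMMER functionality: the function name `best_barcode_pair` and both tab-separated input files are hypothetical, and you would have to produce the per-read scores yourself from whatever HMM search output you have. It also assumes the two ends are independent, so that their log-likelihoods can simply be added.

```shell
# Hypothetical inputs (tab-separated, NOT written by HMMER itself):
#   $1  pairs file  : start_barcode <TAB> end_barcode   (combinations actually used)
#   $2  scores file : read_id <TAB> start|end <TAB> barcode <TAB> log-likelihood
# Output: read_id <TAB> best start+end combination <TAB> summed log-likelihood
best_barcode_pair() {
  awk -F'\t' '
    # First file: remember the barcode combinations used for multiplexing.
    FILENAME == ARGV[1] { pairs[$1 SUBSEP $2] = 1; next }
    # Second file: store the score of each barcode hit at each read end.
    $2 == "start" { s[$1 SUBSEP $3] = $4; reads[$1] = 1 }
    $2 == "end"   { e[$1 SUBSEP $3] = $4; reads[$1] = 1 }
    END {
      for (r in reads) {
        best = ""
        for (p in pairs) {
          split(p, bc, SUBSEP)
          ks = r SUBSEP bc[1]
          ke = r SUBSEP bc[2]
          # Only score combinations with a hit at both ends of the read.
          if ((ks in s) && (ke in e)) {
            total = s[ks] + e[ke]
            if (best == "" || total > bestscore) {
              best = bc[1] "+" bc[2]
              bestscore = total
            }
          }
        }
        if (best != "") print r "\t" best "\t" bestscore
      }
    }
  ' "$1" "$2" | sort
}
```

Reads with no combination scored at both ends are simply left out of the output; in practice you would probably also want a minimum score threshold before trusting an assignment.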

Ram · Answer 2 · 2020-05-05

You can easily extract reads containing the barcode sequences from BAM files with the commands below. Note that this only works for exact barcode matches; it is not suitable when there are base errors in the barcode sequences.

example:

forward barcode = "CAAGCTCACT"

sequence between barcodes = ".*"

reverse complementary barcode = "GCACGACTTG"

or = "|"

reverse barcode = "CAAGTCGTGC"

sequence between barcodes = ".*"

forward complementary barcode = "AGTGAGCTTG"

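# Save the BAM header so it can be re-attached to the filtered reads.
samtools view -H pacbio_reads.ccs.bam > pacbio_reads.ccs-header.sam
# Keep reads matching either barcode orientation, then convert back to BAM.
samtools view pacbio_reads.ccs.bam | grep -E 'CAAGCTCACT.*GCACGACTTG|CAAGTCGTGC.*AGTGAGCTTG' | cat pacbio_reads.ccs-header.sam - | samtools view -Sb - > pacbio_reads.ccs.demultiplex.bam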
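Since the question mentions having only fastq files, the same exact-match idea could be applied to FASTQ directly. The following is a minimal sketch, assuming a standard four-line-per-record FASTQ (no wrapped sequences); the function name `filter_fastq_by_barcodes` and the file names in the usage comment are made up for illustration. Unlike grepping whole SAM lines, it tests only the sequence line, so a quality string can never produce a match by accident.

```shell
# Keep only FASTQ records whose sequence line matches the barcode pattern.
# $1 is an extended regular expression; FASTQ is read from stdin.
# Assumes exactly four lines per record.
filter_fastq_by_barcodes() {
  awk -v pat="$1" '
    NR % 4 == 1 { header = $0 }                   # @read_id line
    NR % 4 == 2 { seq = $0; keep = (seq ~ pat) }  # test the sequence only
    NR % 4 == 3 { plus = $0 }                     # "+" separator line
    NR % 4 == 0 {                                 # quality line ends the record
      if (keep) { print header; print seq; print plus; print $0 }
    }
  '
}

# Example usage (hypothetical file names):
#   filter_fastq_by_barcodes 'CAAGCTCACT.*GCACGACTTG|CAAGTCGTGC.*AGTGAGCTTG' \
#     < pacbio_reads.ccs.fastq > pacbio_reads.ccs.demultiplex.fastq
```

As with the BAM commands, this only catches exact barcode matches, so reads with errors in the barcode will be dropped.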