Demultiplex fastq sequences based on sequence header
3 months ago
ja569116 • 0

I sequenced some samples with the Nanopore Barcoding Kit and used dorado to basecall and demultiplex them. Since Dorado's main output is useless BAM files, I used the --emit-fastq parameter to get a FASTQ file for easy data manipulation and analysis. However, dorado wrote only a single FASTQ file and did not demultiplex the reads as expected.

I checked the FASTQ file: instead of splitting it, dorado had added the barcode information to each read's header:

7b751f1902912_dna_r10.4.1_e8.2_400bps_sup@v4.3.0_SQK-RBK114-24_barcode01
75e37ed8-7d99-4442-bd8d-64a06f70da84    st:Z:2024-08-11T05:19:10.972+00:00  RG:Z:16b7818be41d06b1e531609ca967b751f1902912_dna_r10.4.1_e8.2_400bps_sup@v4.3.0_SQK-RBK114-24_barcode01
45724d14-85f2-41b6-aea4-2d6a62355aa6    st:Z:2024-08-11T05:19:22.944+00:00  RG:Z:16b7818be41d06b1e531609ca967b751f1902912_dna_r10.4.1_e8.2_400bps_sup@v4.3.0_SQK-RBK114-24_barcode21
d2721e13-f430-4005-925b-4d068dad5197    st:Z:2024-08-11T05:19:16.405+00:00  RG:Z:16b7818be41d06b1e531609ca967b751f1902912_dna_r10.4.1_e8.2_400bps_sup@v4.3.0_SQK-RBK114-24_barcode07
2f9236ed-8cae-4704-9c16-892c0b00e3fc    st:Z:2024-08-11T05:19:10.533+00:00  RG:Z:16b7818be41d06b1e531609ca967b751f1902912_dna_r10.4.1_e8.2_400bps_sup@v4.3.0
aeaed141-84c9-4636-bbdb-7256684bda35    st:Z:2024-08-11T05:19:03.441+00:00  RG:Z:16b7818be41d06b1e531609ca967b751f1902912_dna_r10.4.1_e8.2_400bps_sup@v4.3.0_SQK-RBK114-24_barcode01
1c3546fa-99cf-4eb9-b9d9-83486707ea58    st:Z:2024-08-11T05:18:17.934+00:00  RG:Z:16b7818be41d06b1e531609ca967b751f1902912_dna_r10.4.1_e8.2_400bps_sup@v4.3.0_SQK-RBK114-24_barcode07
a3e1e263-b48c-4738-8eb3-e3d1505f1c51    st:Z:2024-08-11T05:19:11.925+00:00  RG:Z:16b7818be41d06b1e531609ca967b751f1902912_dna_r10.4.1_e8.2_400bps_sup@v4.3.0_SQK-RBK114-24_barcode06
cd6220ba-5635-4a8f-b133-cd5264066dc8    st:Z:2024-08-11T05:19:03.645+00:00  RG:Z:16b7818be41d06b1e531609ca967b751f1902912_dna_r10.4.1_e8.2_400bps_sup@v4.3.0_SQK-RBK114-24_barcode01

So the barcode information is in the sequence header. How can I split (bin) the FASTQ reads into per-barcode files based on that header?
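If you want to do the split directly on the FASTQ without rerunning dorado, a single awk pass over the file is enough. This is a minimal sketch, not a dorado feature. It assumes uncompressed 4-line FASTQ records (no wrapped sequence lines), whitespace-separated header fields as in the headers above, and an RG:Z: value that ends in _barcodeNN when a barcode was assigned. Reads whose RG value has no barcode suffix, like the fourth header above, go to unclassified.fastq. The function name split_by_barcode and the output file names are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: bin a dorado FASTQ by the barcode suffix of its RG:Z: header tag.
# Assumptions (not from dorado docs): uncompressed 4-line FASTQ records,
# whitespace-separated header fields, RG value ending in "_barcodeNN".
set -euo pipefail

# split_by_barcode is a hypothetical helper name.
split_by_barcode() {
  local in_fastq="$1" out_dir="${2:-split_fastq}"
  mkdir -p "$out_dir"
  awk -v out="$out_dir" '
    NR % 4 == 1 {
      # Header line of a new record; default to "unclassified" when the
      # RG value carries no _barcodeNN suffix.
      bc = "unclassified"
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^RG:Z:/ && match($i, /_barcode[0-9]+$/)) {
          bc = substr($i, RSTART + 1, RLENGTH - 1)   # strip the leading "_"
        }
      }
      file = out "/" bc ".fastq"
    }
    { print > file }   # write all four lines of the record to the same file
  ' "$in_fastq"
}
```

For example, split_by_barcode calls.fastq split_fastq/ (calls.fastq being a placeholder name) should produce split_fastq/barcode01.fastq, split_fastq/barcode07.fastq and so on. For a gzipped file, bash process substitution should work: split_by_barcode <(zcat calls.fastq.gz) split_fastq/.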

Thanks!

3 months ago
GenoMax 147k

Use dorado demux as follows:

dorado demux --emit-fastq --output-dir demux_fastq/ --no-classify your.bam

Or, if you already have FASTQ files, probably the following (I have not personally tried this):

dorado demux --output-dir demux_fastq/ --no-classify your.fastq

The --no-classify option may seem odd, but it tells dorado not to reassign barcodes to the reads, since they were already classified into barcode classes during basecalling (you must have specified the kit when doing that). It simply takes the classified reads and splits them into barcode-specific files.
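As a quick sanity check after dorado demux, you could count the reads in each FASTQ file it wrote. This sketch assumes uncompressed 4-line records. Because the file naming and directory layout inside demux_fastq/ may differ between dorado versions, it searches the directory recursively rather than assuming specific file names. count_reads_per_file is a hypothetical helper, not a dorado command.

```shell
#!/usr/bin/env bash
# Minimal sketch: report the number of reads in every *.fastq under a
# demux output directory. Assumes uncompressed 4-line FASTQ records.
set -euo pipefail

# count_reads_per_file is a hypothetical helper name.
count_reads_per_file() {
  local dir="${1:-demux_fastq}"
  find "$dir" -type f -name '*.fastq' | sort | while IFS= read -r f; do
    # Each FASTQ record is 4 lines, so reads = lines / 4.
    printf '%s\t%d\n' "$f" $(( $(wc -l < "$f") / 4 ))
  done
}
```

Running count_reads_per_file demux_fastq/ prints one path and read count per line, which can be compared against the per-barcode counts in the original single FASTQ.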

Since Dorado's main output is useless BAM files

BAM files can keep the file sizes down. They are the only way to encode methylation calls.
