Question

Fastx-Toolkit: Fastx_Barcode_Splitter

11.8 years ago

k.nirmalraman ★ 1.1k

Dear All,

I have been using fastx_barcode_splitter to demultiplex my reads. Today I found that some of the reads did not match any of the barcodes we used in the experiment. Looking closer, I found that these reads were not sorted because there is at least one extra base at the beginning of the read, before the barcode.

Example FASTA sequence:

 >HWI-ST863:238:C20G3ACXX:4:1204:18858:57161 1:N:0:AAACAAAA
 TACTTACCTACTTCCGCTGGTCATCCTGCGCCAATTTGATGTGTGTGGTTTTTAATTGAGCTGTATAATCTGTTTATTTTGAGGCCAAAAAAAAAAAA

Barcode: ACTTACCTACTT

TACTTACCTACTTCCGCTGGTCATCCTGCGCCAATTTGATGTGTGTGGTTTTTAATTGAGCTGTATAATCTGTTTATTTTGAGGCCAAAAAAAAAAAA
_ACTTACCTACTT

Once the first base is skipped, this is an exact match, but the read is not sorted into the corresponding barcode file.
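To make the shift concrete, here is a minimal sketch that counts position-by-position mismatches with the barcode placed at offset 0 and at offset 1 of the example read. It assumes a simple anchored comparison at the start of the read; the function name `mismatches_at_offset` is made up for illustration and this is not the code of fastx_barcode_splitter.pl itself.

```python
# First bases of the example read and the barcode from the post.
READ = "TACTTACCTACTTCCGCTGGTCATCCTGCGCC"
BARCODE = "ACTTACCTACTT"


def mismatches_at_offset(read, barcode, offset):
    """Count position-by-position mismatches with the barcode placed at `offset`."""
    window = read[offset:offset + len(barcode)]
    # Barcode positions that fall past the end of the read count as mismatches.
    missing = len(barcode) - len(window)
    return sum(a != b for a, b in zip(window, barcode)) + missing


for offset in (0, 1):
    print(offset, mismatches_at_offset(READ, BARCODE, offset))
# prints:
# 0 9
# 1 0
```

With the barcode anchored at the very first base there are 9 mismatches, far more than the 3 allowed, while shifting by one base gives an exact match.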

The command I use is the following:

cat <file_name> | fastx_barcode_splitter.pl --bcfile mybarcodes.txt --bol --mismatches 3   --prefix code_ --suffix "_1" > code_1.stats

I tried the --partial option, but it was so slow that I almost had to kill the process, and it did not noticeably improve the splitting.

Can someone help me understand whether there is a better way to handle this? Is there any other splitter that is easy to use and can easily be integrated into an existing pipeline?

barcode split • 7.1k views

updated 9.7 years ago by pingEde ▴ 40 • written 11.8 years ago by k.nirmalraman ★ 1.1k

Is there any known explanation for that extra nucleotide at the beginning of your reads?

11.8 years ago by Manu Prestat 4.1k

I can't come up with any explanation other than barcode contamination during synthesis/purification.

11.8 years ago by k.nirmalraman ★ 1.1k

score 2 · Answer 1 · 2013-10-01

ngs-tools, a tool I wrote, supports this use case.

The easiest way to try ngs-tools is to install it using pip (preferably inside a virtualenv):

$ pip install ngs-tools

Please see the help for the split-by-barcode command:

$ ngs-tools split-by-barcode --help

I adapted the code in this gist for this command. It uses Levenshtein distance to look for partial matches; the default maximum distance is 3. BTW, <barcode_file> is a tab-delimited file with two columns, "barcode_id" and "barcode_seq". For example:

B01    ACTTACCTACTT
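To illustrate why an edit-distance match copes with the extra leading base, here is a rough sketch of the idea. It is not the ngs-tools implementation: the names `load_barcodes`, `prefix_edit_distance` and `assign_read` are hypothetical, and comparing the barcode against prefixes of the read start is an assumption made for this example.

```python
def load_barcodes(lines):
    """Parse two-column 'barcode_id<TAB>barcode_seq' lines into a dict."""
    barcodes = {}
    for line in lines:
        if line.strip():
            # split() breaks on any whitespace, which also covers tabs.
            bc_id, bc_seq = line.split()[:2]
            barcodes[bc_id] = bc_seq.upper()
    return barcodes


def prefix_edit_distance(barcode, read, slack=3):
    """Smallest Levenshtein distance between the barcode and any prefix
    of the first len(barcode) + slack bases of the read."""
    window = read[:len(barcode) + slack]
    # prev[j] = edit distance between barcode[:i] and window[:j]
    prev = list(range(len(window) + 1))
    for i, b in enumerate(barcode, start=1):
        curr = [i]
        for j, w in enumerate(window, start=1):
            cost = 0 if b == w else 1
            curr.append(min(prev[j] + 1,          # barcode base has no partner
                            curr[j - 1] + 1,      # extra base in the read
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return min(prev)


def assign_read(read, barcodes, max_dist=3):
    """Return the id of the unique closest barcode within max_dist, else None."""
    best_id, best_dist, tie = None, max_dist + 1, False
    for bc_id, bc_seq in barcodes.items():
        dist = prefix_edit_distance(bc_seq, read, slack=max_dist)
        if dist < best_dist:
            best_id, best_dist, tie = bc_id, dist, False
        elif dist == best_dist and best_id is not None:
            tie = True  # ambiguous between two barcodes
    return None if tie else best_id


barcodes = load_barcodes(["B01\tACTTACCTACTT\n"])
print(assign_read("TACTTACCTACTTCCGCTGGTCATCC", barcodes))
```

For the read from the question, the extra leading T costs a single edit, so the read ends up within the default distance of 3 from B01.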

This tool also has a wrapper for Galaxy.