How To Split Barcoded SOLiD NGS Reads?
14.2 years ago
Ian 6.1k

Has anyone used Bioscope 1.2 to split barcoded and pooled SOLiD sequence reads? For example, if I start with one .csfasta file containing reads from six different barcodes, I want to split it into six .csfasta files, each containing reads from only one barcode.

I realise I can probably script this splitting process myself, but I would like to know how it can be achieved using Bioscope.

Thanks!

solid barcode • 4.4k views

Thank you everyone for your comments. In the end I wrote my own script to split the barcoded reads. Basically, it uses the core read id (number_number_number) to pair each sequence and its quality scores with the barcode id.

I decided to allow missed colours (designated by '.' in the sequence and '-1' in the quality scores) and one wrong colour call. In both cases there has to be exactly one unambiguous match to one of the sixteen 5-mer barcodes.

I am testing it at the moment, but if anyone is interested in taking a look I am happy to make it available.

7
14.2 years ago

Members of the Pugh Lab have developed a barcode splitter for the SOLiD platform. It is a Perl script that you can get from:

http://bcc.bx.psu.edu/static/sep_tag_by_color_barcode-v1.1.pl

One neat feature is that it allows mismatches.


Unfortunately our reads have the barcodes already removed, but this is a very interesting script nonetheless.


You may be able to generate a new file with the barcodes added to the beginning of each read; the script would then work.


OK that is probably worth trying.


One feature that I am exploring for the new version of BioStar is to integrate it with a source code version control system to allow people to share code that may be longer than a small snippet.

5
14.2 years ago
Michael 55k

I hadn't heard of Bioscope. It seems to be a complex commercial suite from Applied Biosystems, but given that the FASTX-Toolkit barcode splitter is available on Galaxy, it may not be necessary to use such a complex tool anyway.

Edit, with a caveat: FASTX might not be able to process colorspace reads. Does anyone know?

My recommendation: keep it simple. For a trivial task like this you probably don't need such a complex (some people might call it bloated) tool, which might also cost you license fees.

As an aside: in my opinion, vendors should focus on their strength in producing rock-'solid' instruments rather than on building software suites that promise lofty "analysis without bioinformatics knowledge". They cannot really deliver that, as they largely lack the programming and bioinformatics expertise themselves.


In case it helps, Galaxy URL: http://main.g2.bx.psu.edu/


Thanks, I have noticed FASTX before, but I don't think it handles SOLiD colour space reads...


Doesn't it? I haven't tried it for that; the documentation just states that it handles FASTA and FASTQ. As a last resort you could convert to letter-space before splitting, but of course you might want to align in color-space...
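For anyone wondering what the letter-space conversion involves, here is a minimal sketch. It assumes each csfasta read is a primer base followed by colour digits 0-3 (e.g. `T0123`), and it uses the standard two-base colour code, where each colour encodes the transition between adjacent bases. The function name `decode_colourspace` is made up for illustration and is not part of any toolkit mentioned in this thread.

```python
BASES = "ACGT"  # indices 0..3; colour = index(prev base) XOR index(next base)


def decode_colourspace(read):
    """Decode a colour-space read such as 'T0123' into bases.

    The leading primer base is used to start decoding but is not
    included in the output. A missed colour call ('.') makes every
    later base undeterminable, so the rest is returned as 'N'.
    """
    prev = read[0]
    out = []
    for colour in read[1:]:
        if colour not in "0123":
            # Missed call: pad the remainder of the read with N and stop.
            out.append("N" * (len(read) - 1 - len(out)))
            break
        prev = BASES[BASES.index(prev) ^ int(colour)]
        out.append(prev)
    return "".join(out)
```

Because every base depends on the one before it, a single wrong colour call changes all downstream bases, which is one reason to keep the reads in colour space for alignment and only split them by barcode.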


I'm currently setting up a local installation of Galaxy on our cluster, and I haven't found a way of splitting by barcode using Galaxy alone. As far as I understand the issue, we will have to pre-process our csfasta files with a custom barcode splitter such as the ones described here, and then feed the split csfasta files to the analysis software (Galaxy or any other).

@michael: BioScope is the free (so far) tool from Applied Biosystems that claims to be very easy to install and use. I can't really comment on its usability yet, because we've been waiting two months to have it installed ;)


Sorry, I just came across the FASTX instructions on how to integrate the toolkit into Galaxy. FYI: http://hannonlab.cshl.edu/fastx_toolkit/galaxy.html

2
14.2 years ago
Ian 6.1k

In the end I wrote a script to answer my own question, but thanks for the previous answers!

Basically, my script uses the core read id (number_number_number) to pair each sequence and its quality scores with the barcode id. I decided to allow missed colours (designated by '.' in the sequence and '-1' in the quality scores) and one wrong colour call. In both cases there has to be exactly one unambiguous match to one of the sixteen 5-mer barcodes.
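To make the matching rule concrete, here is a rough sketch of the idea in Python. It is not the actual script. It assumes the barcode colour string has already had its leading primer base stripped, that headers look like `>1_23_456_F3`, and that missed calls are simply skipped when counting wrong calls; the function names and example barcodes are placeholders.

```python
def wrong_calls(read, barcode):
    """Count wrong colour calls, skipping missed calls ('.')."""
    return sum(1 for r, b in zip(read, barcode) if r != "." and r != b)


def assign_barcode(read, barcodes, max_wrong=1):
    """Return the name of the only barcode matching `read`, else None.

    `barcodes` maps a name to its colour string, e.g. {"bc1": "00000"}.
    A read that matches zero or several barcodes is left unassigned.
    """
    hits = [name for name, seq in barcodes.items()
            if len(seq) == len(read) and wrong_calls(read, seq) <= max_wrong]
    return hits[0] if len(hits) == 1 else None


def core_id(header):
    """Reduce a header such as '>1_23_456_F3' to '1_23_456'."""
    return "_".join(header.lstrip(">").split("_")[:3])


def split_reads(records, id_to_barcode):
    """Group (header, sequence) records by the barcode assigned to their core id.

    Records whose core id has no assigned barcode are dropped.
    """
    bins = {}
    for header, seq in records:
        name = id_to_barcode.get(core_id(header))
        if name is not None:
            bins.setdefault(name, []).append((header, seq))
    return bins
```

In use, `assign_barcode` would be run over the barcode reads to build a core-id to barcode mapping, and `split_reads` would then be run over the sequence file (and, in the same way, the quality file) before writing one output file per barcode.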

I am testing it at the moment, but if anyone is interested in taking a look I am happy to make it available.


Hi Ian, could you share the script with me please? The email address is my username here @gmail.com.
