Processing Barcoded HiSeq Data
12.5 years ago
bbio ▴ 90

I recently received some data from an Illumina HiSeq lane, in which the reads are all supposed to start with one of ten different 6bp barcodes. But when I try to split up the data by barcode, more than half of the reads don't actually match any of the barcodes. I imagine that this is due to the relatively high error rate of HiSeq during the first few cycles.

How do you usually deal with this? I assume one option would be some sort of fuzzy matching from the real sequence to the barcode (e.g. allowing for one different nt), although I'm afraid this might introduce a whole new kind of bias. Is there any software for this out there?

illumina hiseq barcode • 3.8k views
12.5 years ago
Arun 2.4k

Just to be clear, the HiSeq 2000 puts the barcode on each FASTQ read's header line rather than in the sequence itself, as it used to be with the GAII. Are you extracting barcodes from the header? This document might help: http://biowulf.nih.gov/apps/CASAVA1_8_Changes.pdf
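
For reference, a CASAVA 1.8-style header has two space-separated parts, and the second part ends with the index sequence (read number:filter flag:control number:index). A minimal Python sketch of pulling that field out, assuming that layout; the function name and the example headers are made up for illustration:

```python
def index_from_header(header):
    """Return the index (barcode) field of a CASAVA 1.8-style FASTQ header, or None."""
    parts = header.rstrip().split(" ", 1)
    if len(parts) != 2:
        # No space-separated second part, so this is not the 1.8 layout
        return None
    fields = parts[1].split(":")
    # Expected fields: read number, filter flag, control number, index sequence
    if len(fields) != 4 or not fields[3]:
        return None
    return fields[3]


# Illustrative header lines, not taken from the poster's data
print(index_from_header("@EAS139:136:FC706VJ:2:2104:15343:197393 1:N:0:ATCACG"))  # ATCACG
print(index_from_header("@HWUSI-EAS100R:6:73:941:1973#0/1"))  # None (older layout)
```

If this returns None for every read, the barcode is most likely still part of the sequence.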

If you are doing this right, did you try searching for barcodes with 1 mismatch? Barcodes are supposed to differ from each other by at least 2 mismatches, so you might be able to recover some reads.
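
To see how far apart a given barcode set actually is, a small Python sketch along these lines might help. The barcode list is only an example; substitute your own ten barcodes:

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


# Example barcodes for illustration only
barcodes = ["ATCACG", "CGATGT", "TTAGGC", "TGACCA"]
print(min_pairwise_distance(barcodes))
```

If the minimum is only 2, a read with a single error can sit one mismatch away from two different barcodes, so ties are worth checking for. With a minimum of 3 or more, a one-mismatch match cannot be ambiguous.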


None of the headers appear to contain a barcode, so I don't think this is the problem. I haven't tried allowing for mismatches yet; this is what I meant by fuzzy matching above.


Which version of CASAVA was used for your FASTQ? Rather, what is your sequencer? Even better, could you paste 1 whole read including header, sequence and quality here?

You don't need general fuzzy matching, since the maximum number of mismatches with which you can safely identify the barcode is just 1. For example, in Perl you could write a subroutine:

# Hamming distance of two equal-length strings: XOR gives "\0" wherever they match
sub hd {
    length( $_[ 0 ] ) - ( ( $_[ 0 ] ^ $_[ 1 ] ) =~ tr[\0][\0] )
}

and then call hd($str1, $str2). The value it returns is the number of mismatches, and I believe you can safely accept up to 1 mismatch. If not, I guess someone will correct me.
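
The same idea applied to whole reads could look like the Python sketch below. It assumes the barcode sits at the very start of the read, as in the original question; the function names and the example reads are made up, and reads that match no barcode, or more than one, are left unassigned:

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(read, barcodes, max_mismatches=1):
    """Return the one barcode within max_mismatches of the read's prefix, else None."""
    hits = [
        bc
        for bc in barcodes
        if len(read) >= len(bc) and hamming(read[: len(bc)], bc) <= max_mismatches
    ]
    # Refuse to guess when nothing matches or when the match is ambiguous
    return hits[0] if len(hits) == 1 else None


# Example barcodes and reads for illustration only
barcodes = ["ATCACG", "CGATGT"]
print(assign_barcode("ATCTCGGGTACC", barcodes))  # ATCACG (one mismatch at position 4)
print(assign_barcode("GGGGGGAAAATT", barcodes))  # None
```

Returning None on ties is a deliberate choice here: it keeps ambiguous reads out of every bin rather than assigning them arbitrarily.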


Thank you for the suggestions. I actually just ended up using the fastx barcode splitter with the --mismatches flag.

12.5 years ago

There's the fastx barcode splitter (fastx_barcode_splitter.pl in the FASTX-Toolkit).


I had an unsuccessful attempt at using this before, but after your post I built the latest version from source instead of trying to use the precompiled binaries and that finally worked. Thanks!
