Hi All,
I have a FASTQ file left by a student, which I suspect is a concatenation of a HiSeq run and a smaller MiSeq run. How do I determine if this is the case? The read length distribution is uniform at 101 bp.
Since you suspect the file contains data from two different sequencer types, the following should work. Every flowcell carries a unique barcode (the flowcell ID), so reads from different runs will show different barcodes in their headers. This assumes that the FASTQ headers have not been modified in any way.
Put the following code in a file (barcode.awk):
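# Count reads per flowcell barcode (third ":"-separated field of each header)
BEGIN { FS = ":" }
# In a 4-line FASTQ record the header is line 1, 5, 9, ...
(NR % 4) == 1 { barcodes[$3]++ }
END {
    for (bc in barcodes) {
        print bc ": " barcodes[bc]
    }
}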
Then run it like this:
zcat your.fastq.gz | awk -f barcode.awk
It should tell you whether one or more barcodes are represented, along with the number of reads for each. If your data is not compressed, use this instead:
cat your.fastq | awk -f barcode.awk
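If it helps, here is a rough variant of the same idea that also reports the instrument name (the first header field) next to each flowcell barcode. Seeing two different instrument names would support the two-sequencer suspicion. The function name count_sources is just something I made up for illustration, and the sketch assumes unmodified Illumina headers of the form @instrument:run:flowcell:lane:... and strict 4-line records.

```shell
# Hypothetical helper: tally reads per (instrument, flowcell) pair.
# Reads FASTQ on stdin; assumes 4-line records and unmodified headers
# shaped like @instrument:run:flowcell:lane:tile:x:y
count_sources() {
    awk -F: '
        NR % 4 == 1 {
            instrument = substr($1, 2)        # drop the leading "@"
            counts[instrument "\t" $3]++      # key: instrument + flowcell
        }
        END {
            # One line per pair: instrument, flowcell, read count
            for (key in counts) print key "\t" counts[key]
        }
    '
}
```

After defining the function in your shell, something like zcat your.fastq.gz | count_sources should print one tab-separated line per instrument/flowcell pair. The output order is not guaranteed, so pipe it through sort if you want it stable.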
Were the headers edited, or did the student retain the original ones?
Can you run "head my.fastq" and "tail my.fastq" on your file and paste the results here? One should be able to make a fair guess based on that. Strictly speaking, however, it is not possible to determine this in all situations. FASTQ is a terrible file format for metadata.
Fixed that for you.
Hahah, hey man :)