Is it possible to add additional sample IDs to the reads in a fastq file whilst demultiplexing?
I have pooled sequences from 95 wells for each plate, and the primer sequence determines the well ID. Currently, after demultiplexing, I have a script that takes the input fastq file, reads the first 30 base pairs of each read whilst looking for "NN" and then for "Y", and converts the primer sequences to degenerate bases to get the right primer set. This primer set then helps assign the well ID. However, to make the workflow simpler, I would like this to happen right at the demultiplexing stage. Any insight would be most helpful.
For example, this read in the fastq file
@M04012:86:000000000-BCB57:1:1101:17394:1866 CACGGTTGACTCAGCCCTTGACCAGGCACCTCGAATTCCACAGGGC
converts to
>C04 12:86:000000000-BCB57:1:1101:17394:1866 CACGGTTGACTCAGCCCTTGACCAGGCACCTCGAATTCCACAGGGC
Here C04 is my well ID. I have a primer set sequence file with the columns name, type, chain, index and sequence. The entry for well C04 looks like this:
Col_VK_C04,Col,VK,C04,NNTCTGTCATGAYATTGTG,,,,,
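As a rough illustration of the primer lookup described above, here is a minimal Python sketch. It assumes the primer file is plain comma-separated text with the well ID in the fourth column and the primer in the fifth, that degenerate bases follow the standard IUPAC codes, and that the primer sits somewhere in the first 30 bases of the read. The function names (`primer_to_regex`, `load_primers`, `assign_well`) are made up for this example and are not from any existing tool.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGTN]",
}

def primer_to_regex(primer):
    """Turn a degenerate primer such as NNTCTGTCATGAYATTGTG into a regex."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))

def load_primers(csv_lines):
    """Map well ID -> compiled primer regex.

    Assumed column order: name, type, chain, index (well), sequence.
    Lines whose fifth field is not a valid IUPAC string (for example a
    header row) are skipped.
    """
    primers = {}
    for line in csv_lines:
        fields = line.strip().split(",")
        if len(fields) < 5:
            continue
        well, primer = fields[3], fields[4].upper()
        if not primer or not all(base in IUPAC for base in primer):
            continue
        primers[well] = primer_to_regex(primer)
    return primers

def assign_well(seq, primers, window=30):
    """Return the first well whose primer matches within the first `window` bases."""
    head = seq[:window].upper()
    for well, pattern in primers.items():
        if pattern.search(head):
            return well
    return None

# Usage with the C04 line from the post and a made-up read that carries the primer.
primers = load_primers(["Col_VK_C04,Col,VK,C04,NNTCTGTCATGAYATTGTG,,,,,"])
well = assign_well("ACTCTGTCATGACATTGTGAAAA", primers)
```

If several primers could match the same read, returning the first hit is arbitrary; a real script would probably want to report such reads as ambiguous instead.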
I think you should QC your fastq reads, and then merge them. Convert the merged file to fasta, and then look for left primers based on well position, after which you can trim and translate to your V-region sequences.
I edited your post because it seemed to me you want to convert from fastq (@M04012:86:000000000-BCB57:1:1101:17394:1866) to fasta (>C04 12:86:000000000-BCB57:1:1101:17394:1866), is that right?
I have a feeling OP wants to add the sample name in the fastq header (the original post is worded badly, so it is hard to be sure). Sounds like something needed for a QIIME-like pipeline.
The well location can be added regardless. I think, however, that it would be easier to demultiplex after merging read pairs, but I don't know the OP's downstream process or goal. Judging by the information above, this sounds similar to something I've done in the past, for which I wrote a Python demultiplexing script to assign sequences to wells.
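For what it's worth, tagging the fastq header itself (rather than converting to fasta) could look roughly like the sketch below. This is not the script mentioned above, just an illustration under some assumptions: records are strictly four lines each, the well lookup is passed in as a function (`assign_well` here is a placeholder for whatever primer matching you use), and the `well|read_id` header layout and the `NA` label for unassigned reads are arbitrary choices.

```python
import io
from itertools import islice

def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return  # end of file (or a truncated final record)
        header, seq, _plus, qual = record
        yield header, seq, qual

def tag_fastq(in_handle, out_handle, assign_well, unassigned="NA"):
    """Write each read back out with the well ID prefixed to its read ID.

    `assign_well` is any callable taking the read sequence and returning a
    well ID or None. Returns a dict of read counts per well.
    """
    counts = {}
    for header, seq, qual in read_fastq(in_handle):
        well = assign_well(seq) or unassigned
        counts[well] = counts.get(well, 0) + 1
        # header[1:] drops the leading "@" so it can be re-added after the tag.
        out_handle.write(f"@{well}|{header[1:]}\n{seq}\n+\n{qual}\n")
    return counts

# Toy usage on in-memory data with a stand-in well lookup.
demo_in = io.StringIO("@r1 x\nACGT\n+\nIIII\n@r2\nTTTT\n+\nIIII\n")
demo_out = io.StringIO()
demo_counts = tag_fastq(demo_in, demo_out,
                        lambda seq: "C04" if seq.startswith("AC") else None)
```

Keeping the quality line means the output stays valid fastq, so the same step could run before or after merging read pairs.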