The sequencing core at my university performed paired-end RNA-seq on some of our lab's samples using Illumina sequencing technology. My understanding is that the forward and reverse read names are generally designated with trailing /1 and /2, e.g.
D5KHLFN1_0181:1:1101:1209:2028#0/1
D5KHLFN1_0181:1:1101:1209:2028#0/2
However, our results came back with /1 and /3 suffixes instead. The sequencing core claims this is an artifact of "tru-seq" sequencing. I was wondering if anyone could confirm this and/or elaborate.
(The main reason for my interest here is that certain de novo transcriptome assemblers actually require the "/2" rather than "/3", forcing me to do a sed search/replace)
On request, I have expanded my previous answer, and I also want to push illumina2bam.
Illumina allows combining multiple libraries in one lane by multiplexing. It does this with an additional read: after the first read is finished, a short index sequence within the adapter is read. The reads therefore come off the machine in the order read1:index-read for single-end runs and read1:index-read:read2 for paired-end runs, so the second read of a pair is read 3 and gets the /3 suffix. It seems that in your case an indexed run was specified; maybe it was necessary for other lanes, or the wrong program was chosen.
This approach is superior to simply adding a short barcode at the beginning of the product, because you have fewer problems with basecalling (the start of the reads keeps its normal complexity).
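For anyone stuck with /3 files in the meantime, the renaming can be sketched in a few lines of Python instead of sed. This is only a minimal sketch: the function name `renumber_read3` is made up for illustration, and it assumes strict four-line FASTQ records with the old-style `/N` suffix at the very end of the header line.

```python
import re

# A trailing "/3", optionally followed by whitespace such as the newline.
_READ3_SUFFIX = re.compile(r"/3(\s*)$")


def renumber_read3(lines):
    """Yield FASTQ lines with a trailing /3 on header lines rewritten to /2.

    Assumes four lines per record, so only every fourth line (the '@'
    header) is touched. Quality strings that happen to end in '/3' and
    '+' lines that repeat the read name are left unchanged.
    """
    for i, line in enumerate(lines):
        if i % 4 == 0 and line.startswith("@"):
            line = _READ3_SUFFIX.sub(r"/2\1", line)
        yield line
```

Applied to a file, this would be something like `out.writelines(renumber_read3(inp))` with both files opened in text mode. Restricting the substitution to header lines is the main advantage over a blind search/replace, since `/` and `3` are both valid quality characters.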
I suggest that everyone involved in collecting data from the machine have a look at BAM instead of FASTQ as the primary output format, and maybe push for it:
* You have fewer problems with the scale of the quality values, which has changed four times by now.
* More importantly, all the provenance information is saved within the file. If you have a correctly working pipeline set up (I am far from that :-( ), all programs record their transformations of the data in the file, so you know exactly what happened (which parameters, which version, etc.).
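To illustrate the quality-scale problem with FASTQ, here is a rough sketch of what downstream code has to do. The two offsets are the commonly documented ones (33 for Sanger and Illumina 1.8+, 64 for Illumina 1.3 to 1.7); the function names and the guessing heuristic are my own assumptions, and the older Solexa scale, which is not a plain Phred score, is not handled.

```python
def decode_phred(quality, offset):
    """Convert a FASTQ quality string to a list of integer scores.

    offset is 33 for Sanger / Illumina 1.8+ and 64 for Illumina 1.3-1.7.
    """
    return [ord(ch) - offset for ch in quality]


def guess_offset(quality):
    """Heuristically guess the offset from one quality string.

    Returns 33, 64, or None when the string is ambiguous. Characters
    below ';' (ASCII 59) cannot occur in a +64 encoding, and characters
    above 'J' (ASCII 74) are unusual for Illumina +33 output.
    """
    codes = [ord(ch) for ch in quality]
    if min(codes) < 59:
        return 33
    if max(codes) > 74:
        return 64
    return None
```

A single read is often ambiguous, so in practice one would look at many reads before trusting the guess. None of this guessing is needed when the scale is fixed by the file format.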
To my knowledge, two options exist:
* [illumina2bam], which reads directly from the saved bcl files and is **easy** to use!
* Picard's [IlluminaBasecallsToSam], which I think starts from the qseq files.
In the case of illumina2bam there is a great pipeline that takes the basecalls and puts the index read into the tags of the read in the bam file. Easy to parse, easy to split, merge etc.
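As a rough sketch of what "easy to parse, easy to split" can look like, the following works on SAM text records (for example the output of `samtools view`). It assumes the index read is stored in the standard SAM `BC` tag; whether a given illumina2bam setup uses exactly that tag is an assumption here, and both function names are hypothetical.

```python
from collections import defaultdict


def barcode_of(sam_line, tag="BC"):
    """Return the value of a string-typed optional tag, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; TAG:TYPE:VALUE fields follow.
    for field in fields[11:]:
        name, value_type, value = field.split(":", 2)
        if name == tag and value_type == "Z":
            return value
    return None


def split_by_barcode(sam_lines, tag="BC"):
    """Group SAM records by barcode; records without the tag go under None."""
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        groups[barcode_of(line, tag)].append(line)
    return dict(groups)
```

Because the barcode travels with each record as a tag, demultiplexing becomes a grouping step rather than a separate file that has to be kept in sync with the reads.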
Thank you very much for that ... I learned something really new and valuable today. Although I am not sure I like the BAM idea. FASTQ is nice because one can do a lot of tricks already on the command line with head, tail, sed, etc., and that is not possible anymore with BAM.
I had the same issue; I simply used sed to convert my reads to /2 and moved on. Calling it a sequencing artifact is absurd. This is explained better in the previous answer.
I rewrote the answer. I started with my agenda in promoting bam.
Errrrm ... I did not get that. Care to explain a bit more in-depth?