Recently I started working with an in-house, unpublished RNA-seq dataset, with the objective of assembling a new transcriptome for my species.
One of the experiments I have was sequenced on an Illumina HiSeq 1500 instrument, with the library prepared using the TruSeq protocol. All the samples sequenced using this method have 3 output sequencing files (with the suffix R1, R2, R3).
All the files have the same number of reads, with R2 containing very short reads of 8 bp, while the R1 and R3 reads are 125 bp long.
Previous work in my lab used the R1 and R3 files after processing, but I'm curious what the R2 file is about. My hypothesis is that it may be related to the demultiplexing process. I have already searched the internet and asked my supervisors, but since this data is a bit old no one remembers.
Does anyone here have experience with this type of data and any idea what this is about?
"All the samples sequenced using this method have 3 output sequencing files (with the suffix R1, R2, R3)."
Your sequencer was set up to produce a separate file for the index sequences (this is not the standard setup). You are correct: R2 is indeed the Illumina index sequence for each sample. If you have access to the original data folder, you should be able to reprocess the data to generate just 2 files per sample (in the normal Illumina format, with the index sequences in the FASTQ read headers).
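If you want to convince yourself that R2 really holds the index reads, a quick tally of its sequences should be enough. The sketch below is only an illustration (the function name `count_index_sequences` is made up here); it assumes a standard 4-line-per-record FASTQ file, optionally gzip-compressed. In an already demultiplexed sample you would expect one 8 bp sequence to dominate, with a few near-matches from sequencing errors.

```python
import gzip
from collections import Counter


def count_index_sequences(index_fastq_path, top=5):
    """Return the `top` most frequent sequences in a FASTQ file.

    Assumes 4 lines per record, so the sequence is the second line
    of every record (line_number % 4 == 1, counting from 0).
    """
    opener = gzip.open if index_fastq_path.endswith(".gz") else open
    counts = Counter()
    with opener(index_fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:
                counts[line.strip()] += 1
    return counts.most_common(top)
```

Calling it on one of your R2 files (for example `count_index_sequences("sample_R2.fastq.gz")`, where the file name is a placeholder) returns a list of `(sequence, count)` pairs.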
If you don't have the original data folder to reprocess, you can use the solution posted in post #5 of this thread over at SeqAnswers. You will need to do this with the R1 and R3 files (but rename R3 to R2).
If any downstream software objects to the R3 nomenclature and/or if the Illumina index needs to be in the FASTQ headers, then you can use the code referenced above to fix your files.
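For illustration, here is a rough Python sketch of that kind of fix. It is not the SeqAnswers code, just one way the idea could look. It walks the three files in parallel, copies the index read from R2 into the header of each R1 and R3 record, and writes R3 out as read 2. The names `merge_index` and `tag_header` are made up for this example. It assumes headers in the newer Illumina style (`@read_id read:filtered:control:index`, with a space before the comment) and, as a further assumption, falls back to `N` and `0` for the filter and control fields when a header has no comment.

```python
import gzip
from contextlib import ExitStack
from itertools import zip_longest


def open_text(path, mode="rt"):
    """Open a plain or gzip-compressed file in text mode."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def tag_header(header, read_number, index_seq):
    """Rebuild the header comment as read:filtered:control:index."""
    read_id, _, comment = header.partition(" ")
    fields = comment.split(":") if comment else []
    # Assumption: default to 'N' (not filtered) and '0' when fields are missing.
    filtered = fields[1] if len(fields) > 1 else "N"
    control = fields[2] if len(fields) > 2 else "0"
    return f"{read_id} {read_number}:{filtered}:{control}:{index_seq}"


def merge_index(r1_path, index_path, r3_path, out_r1_path, out_r2_path):
    """Write R1 and R3 as a normal read pair with the index in the headers."""
    n = 0
    with ExitStack() as stack:
        r1 = stack.enter_context(open_text(r1_path))
        idx = stack.enter_context(open_text(index_path))
        r3 = stack.enter_context(open_text(r3_path))
        o1 = stack.enter_context(open_text(out_r1_path, "wt"))
        o2 = stack.enter_context(open_text(out_r2_path, "wt"))
        records = zip_longest(read_fastq(r1), read_fastq(idx), read_fastq(r3))
        for rec1, rec_idx, rec3 in records:
            if rec1 is None or rec_idx is None or rec3 is None:
                raise ValueError("input files have different numbers of reads")
            # The read name (text before the first space) must match in all files.
            ids = {rec[0].split(" ", 1)[0] for rec in (rec1, rec_idx, rec3)}
            if len(ids) != 1:
                raise ValueError(f"read names out of sync: {sorted(ids)}")
            index_seq = rec_idx[1]
            for out, number, (header, seq, qual) in ((o1, 1, rec1), (o2, 2, rec3)):
                out.write(f"{tag_header(header, number, index_seq)}\n{seq}\n+\n{qual}\n")
            n += 1
    return n
```

A call would look like `merge_index("s_R1.fastq.gz", "s_R2.fastq.gz", "s_R3.fastq.gz", "s_fixed_R1.fastq.gz", "s_fixed_R2.fastq.gz")`, with all file names being placeholders. The function returns the number of read pairs written. Older headers that end in `/1` and `/3` would fail the name check and need extra handling.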
Thanks for your answer! Since my reads were already divided into several samples, I think I can simply ignore the index file for now.
cheers