Question

TruSeq sequencing output


3.8 years ago

Rogerio Ribeiro ▴ 110

Good morning good people of Biostars

Recently I started working with an in-house, unpublished RNA-seq dataset, with the objective of assembling a new transcriptome for my species. One of the experiments I have was sequenced on an Illumina HiSeq 1500 instrument, with the library prepared using the TruSeq protocol. All the samples sequenced using this method have 3 output sequencing files (with the suffixes R1, R2, R3). All the files have the same number of reads; R2 contains very short reads of 8 bp, while the R1 and R3 reads are 125 bp.

Previous work in my lab used the R1 and R3 files after processing, but I'm curious what the R2 file is about. My hypothesis is that it may be related to the demultiplexing process. I have already searched the internet and asked my supervisors, but since this data is a bit old, no one remembers.

Does anyone here have experience with this type of data and any idea what this is about?

RNA-Seq True-seq2 • 1.1k views

updated 3.8 years ago by GenoMax 147k • written 3.8 years ago by Rogerio Ribeiro ▴ 110

score 1 · Answer 1 · 2021-01-28

All the samples sequenced using this method have 3 output sequencing files (with the suffix R1, R2, R3).

Your sequencer was set up to produce a separate file for the index reads (this is not the standard configuration). You are correct that R2 is the Illumina index sequence for each sample. If you have access to the original data folder, you should be able to reprocess the data to generate just 2 files per sample (in the normal Illumina format, with the index sequences in the FASTQ read headers).
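One quick way to confirm that R2 holds index reads is to tally its sequences: in a demultiplexed sample, one 8 bp barcode (plus a few near-misses from sequencing errors) would be expected to dominate. The Python sketch below is only an illustration under that assumption; the function names and the `max_reads` cutoff are hypothetical and not part of any Illumina tool.

```python
import gzip
from collections import Counter
from itertools import islice


def read_sequences(lines):
    """Yield the sequence line (the 2nd of every 4 lines) from FASTQ lines."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.rstrip("\n")


def tally_index_reads(lines, max_reads=100_000):
    """Count the sequences seen in the first max_reads FASTQ records."""
    return Counter(islice(read_sequences(lines), max_reads))


def tally_index_file(path, max_reads=100_000):
    """Same tally, reading from a gzipped FASTQ file (path is an assumption)."""
    with gzip.open(path, "rt") as handle:
        return tally_index_reads(handle, max_reads)
```

For example, `tally_index_file("R2.fq.gz").most_common(5)` would list the five most frequent sequences in the suspected index file along with their counts.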

If you don't have the original data folder to reprocess, you can use the solution posted in post #5 in this thread over at SeqAnswers. You will need to run it on both the R1 and R3 files (naming the output from R3 as R2).

paste -d '~' <(zcat R1.fq.gz) <(zcat R2.fq.gz) | perl -F'~' -lane 'push(@buffer, $F[0]); if($line == 1){$buffer[0] .= "$F[1]"}; if(($line == 3) && @buffer){print join("\n",@buffer); @buffer = ()}; $line = ($line+1) % 4;' | gzip - > WithBarcode_R1.fq.gz

and

paste -d '~' <(zcat R3.fq.gz) <(zcat R2.fq.gz) | perl -F'~' -lane 'push(@buffer, $F[0]); if($line == 1){$buffer[0] .= "$F[1]"}; if(($line == 3) && @buffer){print join("\n",@buffer); @buffer = ()}; $line = ($line+1) % 4;' | gzip - > WithBarcode_R2.fq.gz
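For anyone who finds the one-liners hard to read, here is a rough Python sketch of the same idea: walk the read file and the index file in step, and append each index read's sequence to the header of the matching read. Like the commands above, it assumes both files hold the same reads in the same order, and it adds the index with no separator. The function names are hypothetical, and this is not the SeqAnswers author's code.

```python
import gzip
from itertools import islice


def fastq_records(lines):
    """Yield FASTQ records as lists of four lines (newlines stripped)."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def add_index_to_headers(read_lines, index_lines):
    """Append each index read's sequence to the matching read's header line."""
    # Assumption: both inputs list the same reads in the same order.
    for read, index in zip(fastq_records(read_lines), fastq_records(index_lines)):
        read[0] += index[1]  # header + index sequence, no separator
        yield from read


def write_with_barcode(read_path, index_path, out_path):
    """Gzipped-file wrapper around add_index_to_headers."""
    with gzip.open(read_path, "rt") as reads, \
         gzip.open(index_path, "rt") as index, \
         gzip.open(out_path, "wt") as out:
        for line in add_index_to_headers(reads, index):
            out.write(line + "\n")
```

With the file names used above, this would be called as `write_with_barcode("R1.fq.gz", "R2.fq.gz", "WithBarcode_R1.fq.gz")` and `write_with_barcode("R3.fq.gz", "R2.fq.gz", "WithBarcode_R2.fq.gz")`.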