Question

R1 R2 orientation (RNA-seq)

11.4 years ago

biotech ▴ 570

Hi,

Will the R2 file always contain the sense transcript and the R1 file the reverse complement of the transcript?

If the R2 file contains the sense transcript, I will have to run HTSeq.scripts.count with the "reverse" setting to quantify sense transcription. Quite confusing, isn't it?

Library creation kit -> E7420S NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina®

Thanks, Bernardo

P.S. Here is the pipeline:

set -ue
#################################################
#BWA test aligner with paired-end data
#################################################

# Reference genome (FASTA), first command-line argument
genome_file=$1
# Sample P (plate): R1 and R2 FASTQ files
fastq_file_R1=$2
fastq_file_R2=$3
# Sample V (in vivo): R1 and R2 FASTQ files
fastq_file_R3=$4
fastq_file_R4=$5
# Annotation file (GFF) used by htseq-count
GFF=$6

# Index the genome with BWA default settings
bwa index "$genome_file"
# Align each paired-end sample with bwa mem (8 threads) and gzip the SAM output
bwa mem -t 8 "$genome_file" "$fastq_file_R1" "$fastq_file_R2" | gzip -3 > P_S1_L001_aln-pe.sam.gz
bwa mem -t 8 "$genome_file" "$fastq_file_R3" "$fastq_file_R4" | gzip -3 > V_S1_L001_aln-pe.sam.gz

################################################################
#Flagstat
################################################################
# Convert the gzipped SAM to BAM as input for flagstat
samtools view -b -S -o P_S1_L001_aln-pe.bam P_S1_L001_aln-pe.sam.gz
samtools flagstat P_S1_L001_aln-pe.bam

samtools view -b -S -o V_S1_L001_aln-pe.bam V_S1_L001_aln-pe.sam.gz
samtools flagstat V_S1_L001_aln-pe.bam

################################################################
#Count reads mapped with htseq-count
################################################################

# Name-sort each BAM for htseq-count (-r name). This is the samtools >= 1.3 syntax;
# older releases took an output prefix instead of -o.
# The awk filter drops the last 5 lines of output (htseq-count's summary counters).
samtools sort -n -o invivo.sorted.bam V_S1_L001_aln-pe.bam
python -m HTSeq.scripts.count -m intersection-nonempty -f bam -a 10 -t mRNA -i Parent -r name -s reverse invivo.sorted.bam "$GFF" | awk 'n>=5 { print a[n%5] } { a[n++%5]=$0 }' > invivo_R.counts

samtools sort -n -o plate.sorted.bam P_S1_L001_aln-pe.bam
python -m HTSeq.scripts.count -m intersection-nonempty -f bam -a 10 -t mRNA -i Parent -r name -s reverse plate.sorted.bam "$GFF" | awk 'n>=5 { print a[n%5] } { a[n++%5]=$0 }' > plate_R.counts

################################################################
#IGV
################################################################
# samtools index needs a coordinate-sorted BAM (sort -n sorts by read name instead).
# samtools >= 1.3 syntax; older releases took an output prefix instead of -o.
samtools sort -o V_S1_L001_aln-pe.bam.sorted.bam V_S1_L001_aln-pe.bam
samtools index V_S1_L001_aln-pe.bam.sorted.bam

samtools sort -o P_S1_L001_aln-pe.bam.sorted.bam P_S1_L001_aln-pe.bam
samtools index P_S1_L001_aln-pe.bam.sorted.bam

RNA-seq HTSeq pipeline bash • 8.2k views

ADD COMMENT • link updated 4.0 years ago by Ram 45k • written 11.4 years ago by biotech ▴ 570

A file should never contain 3'->5' nucleotide sequence. Whether read #1 or read #2 dictates the originating strand when a directional kit is used depends on the kit. When in doubt, just look at a couple of genes in IGV or another browser. It should be immediately obvious which read dictates the strand.
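As a command-line complement to the browser check, the SAM FLAG field already records which mate a read is and which strand it aligned to. The following is a minimal sketch, not part of the original pipeline: `describe_flag` is a hypothetical helper, and the region in the commented usage line is a placeholder for a gene of known orientation.

```shell
# Decode a SAM FLAG into mate number and the strand the read aligned to.
# Bits from the SAM specification:
#   64  (0x40) = first read in pair (R1)
#   128 (0x80) = second read in pair (R2)
#   16  (0x10) = read aligned to the reverse strand
describe_flag() {
    if [ $(( $1 & 64 )) -ne 0 ]; then
        mate=R1
    elif [ $(( $1 & 128 )) -ne 0 ]; then
        mate=R2
    else
        mate=unpaired
    fi
    if [ $(( $1 & 16 )) -ne 0 ]; then
        strand=-
    else
        strand=+
    fi
    echo "$mate $strand"
}

describe_flag 83    # prints: R1 -
describe_flag 163   # prints: R2 +

# Assumed usage on the indexed, coordinate-sorted BAM (region is a placeholder):
# samtools view V_S1_L001_aln-pe.bam.sorted.bam "chr:start-end" | cut -f2 | sort | uniq -c
```

If a gene annotated on the + strand is dominated by flags 83 and 163, R2 carries the sense orientation; a majority of 99 and 147 would point the other way.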

ADD REPLY • link updated 4.0 years ago by Ram 45k • written 11.4 years ago by Devon Ryan 105k

Thanks for your answer, Devon. The question has since been reformulated; you answered the first question of my post. I performed the IGV inspection and it's clear that R2 contains the sense transcript. I will have to find out how the library was prepared to confirm this IGV test.

ADD REPLY • link 11.4 years ago by biotech ▴ 570

Ram · Accepted Answer · 2014-06-26

11.4 years ago

Charles Warden 8.3k

I believe the "how" is answered on this website (1st Figure under product information tab):

https://www.neb.com/products/e7420-nebnext-ultra-directional-rna-library-prep-kit-for-illumina

Seems like it is a dUTP protocol, which would mean you should be using "--library-type fr-firststrand" in TopHat:

http://ccb.jhu.edu/software/tophat/manual.shtml#toph

http://ccb.jhu.edu/software/tophat/faq.shtml#library_type

ADD COMMENT • link updated 4.0 years ago by Ram 45k • written 11.4 years ago by Charles Warden 8.3k

I'm following a BWA, samtools, HTSeq pipeline. The R2 file would contain the sense strand, right?

ADD REPLY • link updated 4.0 years ago by Ram 45k • written 11.4 years ago by biotech ▴ 570

You can double-check with the vendor - in the picture, it appeared that dUTPs were being added and then degraded, meaning that only the original strand was sequenced. The notation is a bit confusing, but this means the reverse complement is the first Illumina sequence that you get from the sense template (meaning R1 is anti-sense and R2 is sense, matching what you expect).
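For the HTSeq route, the same reasoning translates into the `-s` setting. The sketch below maps TopHat-style library types to the corresponding htseq-count `-s` value; `htseq_strand_for` is a hypothetical helper introduced only for illustration, and it assumes the kit really is a dUTP (fr-firststrand) protocol.

```shell
# Map a TopHat-style library type to the matching htseq-count -s value.
htseq_strand_for() {
    case "$1" in
        fr-firststrand)  echo reverse ;;  # dUTP kits: R1 antisense, R2 sense
        fr-secondstrand) echo yes ;;      # R1 sense, R2 antisense
        fr-unstranded)   echo no ;;       # no strand information
        *) echo "unknown library type: $1" >&2; return 1 ;;
    esac
}

htseq_strand_for fr-firststrand   # prints: reverse
```

So a dUTP library goes with `-s reverse`, which is what the pipeline in the question already uses.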

ADD REPLY • link updated 4.0 years ago by Ram 45k • written 11.4 years ago by Charles Warden 8.3k

I will double-check with the vendor, but I'm 99% sure R2 is sense because of the larger number of reads counted. I added the pipeline I'm using. The funniest thing is that the technician who made the library told me: "Well, you only have to analyse the data with TopHat, Cufflinks, etc. and it will tell you if it is positive or not..." :-O
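The read-count comparison can be made explicit by totalling the counts from a `-s reverse` run and a `-s yes` run. This is only a sketch: `sum_counts` is a hypothetical helper, and `plate_F.counts` is an assumed file name for a `-s yes` run that the pipeline above does not produce.

```shell
# Sum the per-feature counts in an htseq-count output file (tab-separated: id, count).
# Lines whose id starts with "__" are skipped; newer HTSeq versions use that
# prefix for summary counters such as __no_feature.
sum_counts() {
    awk -F '\t' '$1 !~ /^__/ { total += $2 } END { print total + 0 }' "$1"
}

# Assumed usage (plate_F.counts is a hypothetical '-s yes' result):
# rev=$(sum_counts plate_R.counts)
# fwd=$(sum_counts plate_F.counts)
# echo "reverse=$rev yes=$fwd"
```

With a stranded library, the setting that matches the kit should give a much larger total than the opposite one.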

ADD REPLY • link updated 4.0 years ago by Ram 45k • written 11.4 years ago by biotech ▴ 570