Hi all,
I have recently started working on RNASeq analysis. Before running the TopHat pipeline for RNASeq, I need to carry out the two analyses below. I have already performed the demultiplexing step and generated the fastq files from the HiSeq base calls.
Can you explain to me why these analyses are important to do first, and how to proceed further?
A. Technical analysis of the sequencing reads: I have to perform a genome-wide alignment using the RNA-seq data sets from lane 1 to lane 6 and report technical information on the reads, such as:
1. The reads duplication analysis;
2. The contamination analysis of the Illumina adaptor sequences;
3. The GC content analysis.
B. Biological quality analysis: using the mapping results above, I also need to report biological quality information on the data sets, such as:
1. The percentage of the sequencing reads derived from the rRNA genes;
2. The percentage of the sequencing reads derived from the globin gene;
3. Because this is a strand specific RNA-seq, I have to include the sense and antisense information for the corresponding genes.
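As a rough illustration of item B.3, the sense/antisense call for a single-end read comes down to comparing the read's alignment strand with the annotated gene strand. The sketch below is only that, a sketch: the function names and the `library` labels ("reverse" for dUTP-style protocols where the read aligns opposite to the transcript, "forward" for protocols where it aligns to the same strand) are assumptions for illustration, since the thread does not say which strand-specific protocol was used.

```python
from collections import Counter

def read_orientation(read_strand, gene_strand, library="reverse"):
    """Classify one single-end read as 'sense' or 'antisense' to a gene.

    library="reverse": the read aligns to the strand opposite the
    transcript it came from (assumed dUTP-style protocol).
    library="forward": the read aligns to the transcript's own strand.
    """
    if read_strand not in ("+", "-") or gene_strand not in ("+", "-"):
        raise ValueError("strands must be '+' or '-'")
    if library not in ("forward", "reverse"):
        raise ValueError("library must be 'forward' or 'reverse'")
    if library == "forward":
        transcript_strand = read_strand
    else:
        transcript_strand = "-" if read_strand == "+" else "+"
    return "sense" if transcript_strand == gene_strand else "antisense"

def orientation_counts(read_strands, gene_strand, library="reverse"):
    """Tally sense and antisense reads overlapping one gene."""
    return Counter(
        read_orientation(s, gene_strand, library) for s in read_strands
    )
```

Which `library` setting applies has to be confirmed with whoever prepared the libraries; getting it backwards swaps the sense and antisense counts.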
Why can't the person tasking you also explain the rationale behind these orders?
The person tasking you with these really shouldn't. Aside from (A), which can be done entirely with FastQC, there are often nuances with how things should be implemented and you would need to be quite comfortable with RNAseq data before dealing with this.
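To make the three items under (A) concrete, here is a minimal Python sketch of what each metric measures on an uncompressed FASTQ file. It is not how FastQC computes them (FastQC is the tool to use in practice), and the function names are made up for illustration. The default adapter string is assumed to be the common Illumina universal adapter prefix; swap in whatever adapter your library prep actually used.

```python
from collections import Counter

def read_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()

def technical_summary(handle, adapter="AGATCGGAAGAGC"):
    """Return simple duplication, adapter and GC metrics for a FASTQ handle.

    The adapter default is an assumption; pass the sequence that matches
    your own library preparation kit.
    """
    counts = Counter(read_sequences(handle))  # sequence -> times seen
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads found")
    bases = sum(len(seq) * n for seq, n in counts.items())
    gc = sum((seq.count("G") + seq.count("C")) * n for seq, n in counts.items())
    with_adapter = sum(n for seq, n in counts.items() if adapter in seq)
    return {
        "reads": total,
        # fraction of reads that are exact copies of an earlier read
        "duplicate_fraction": 1 - len(counts) / total,
        # fraction of reads containing the adapter string anywhere
        "adapter_fraction": with_adapter / total,
        # G+C bases over all bases
        "gc_fraction": gc / bases,
    }
```

This holds every distinct sequence in memory, so it is only sensible for a subsample of a lane, not a full HiSeq run.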
Also, use a different sequencing facility next time. Needing to demultiplex things yourself is absolutely absurd.
We sometimes do our own demultiplexing because we use barcode setups that the core doesn't like, especially in development of new in-line barcoding products. They'll set up a new demultiplexing for us but we don't ask until the thing is done.
Sure, but it doesn't sound like nalandaatmi is working on a new method.
Dear Devon,
Can you explain the nuances of RNAseq to me, or point me to some links where I can read about them?
Why do you say that having to demultiplex things is absurd?
Making end users demultiplex standard data is absurd because that's a lot of extra work to get things set up when the sequencing facility could just do it as part of a standard pipeline. I've used a number of core facilities and companies over the years and have never needed to demultiplex things as a customer (I do now, but I'm not the customer any more :) ).
Regarding RNAseq, that's a long discussion. You'd be well advised to work together with someone locally the first time you do a new type of analysis like this (at least until you get a fair bit of experience under your belt).
Devon, I am learning NGS in a sequencing facility. The person in charge of sequencing gave me the files that come directly from the HiSeq machine. I am interested in learning from the very first step of handling NGS reads. That's why I mentioned that I performed the demultiplexing step and generated fastq files from the base call files. I am trying to understand all the steps involved before downstream analysis.
As you mentioned that you do NGS demultiplexing yourself now, I would like to ask you this. I used the bcltofastq program to convert the base call files to fastq files. The generated fastq files follow a naming convention in which WES01 is the sample name, AGTCCA is the barcode or index, L001 is lane 1, R1 is the forward reads and R2 is the reverse reads. What are 001, 002, 003 to 010 after R1 and R2?
@nalandaatmi in my experience, the '001.fastq, 002.fastq, 003.fastq, ...' you are referring to usually means that the fastq file was split into smaller parts. So if you merged the files together end-to-end, you would get all the reads.
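A minimal sketch of that end-to-end merge in Python. It assumes the files are gzipped and named like `WES01_AGTCCA_L001_R1_001.fastq.gz`; that exact layout is an assumption pieced together from the parts described above, and `group_chunks` and `merge_chunks` are illustrative names, so adjust the pattern to match what is actually on disk.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed layout: <sample>_<index>_L<lane>_R<read>_<chunk>.fastq.gz
NAME = re.compile(
    r"^(?P<sample>.+)_(?P<index>[ACGTN-]+)_L(?P<lane>\d{3})"
    r"_R(?P<read>[12])_(?P<chunk>\d{3})\.fastq\.gz$"
)

def group_chunks(filenames):
    """Group chunk files by (sample, index, lane, read), sorted by chunk number."""
    groups = defaultdict(list)
    for name in filenames:
        match = NAME.match(Path(name).name)
        if match is None:
            continue  # not a chunk file in the assumed layout
        key = (match["sample"], match["index"], match["lane"], match["read"])
        groups[key].append((int(match["chunk"]), name))
    return {key: [n for _, n in sorted(parts)] for key, parts in groups.items()}

def merge_chunks(parts, out_path):
    """Concatenate gzipped chunks byte-for-byte, in the order given.

    Back-to-back gzip members form a valid gzip file, so no
    decompression is needed.
    """
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
```

For paired-end data, merge the R1 and R2 chunks in the same chunk order so that read pairs stay in sync between the two merged files.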
Thanks Tamir and Ryan for your suggestions.
Dear Pierre,
Thanks for your explanation. Yes, I have run the FastQC analysis and am going through the relevant sections of the fastqc.html file for section A.
Section B: still working on it.
Hi All,
For section B, I am planning to take a list of ribosomal RNA genes and align my sample reads against them using bowtie2. I assume the overall alignment rate that bowtie2 reports will be the percentage of reads matching ribosomal RNA genes. Am I correct?
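For reference, bowtie2 prints its alignment summary to stderr, ending in a line such as "94.04% overall alignment rate". If the index contains only rRNA sequences, that figure is a reasonable first estimate of the rRNA fraction, with the caveat that reads from similar non-rRNA sequences may also align. A small sketch for pulling the number out of a saved summary follows; the function name is illustrative, and saving the summary to a text file beforehand is an assumption about the workflow.

```python
import re

def overall_alignment_rate(summary_text):
    """Return the 'overall alignment rate' percentage from a bowtie2 summary."""
    match = re.search(
        r"^\s*([\d.]+)% overall alignment rate", summary_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("no 'overall alignment rate' line found")
    return float(match.group(1))
```

Usage would be along the lines of `overall_alignment_rate(open("rrna_summary.txt").read())`, where `rrna_summary.txt` is a hypothetical file holding bowtie2's stderr output.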
Open a new question. This is not a discussion forum but a Q&A site: 1 Question + Answers, not 1 Question + Comments + Answers + more Questions.
Apologies, I thought I was still following up on section B of my first question. From now on, I will post it as a separate question. Thanks for letting me know.