Best practices for differential expression analysis
5.4 years ago
flogin ▴ 280

I'm looking for best practices for differential expression analysis, and I found this paper: https://www.ncbi.nlm.nih.gov/pubmed/24300110 (the paper most closely related to my questions).

But that paper discusses methods for differential expression analysis that start from expression data as input, like the input for the DESeq package, right?

I'm thinking about the whole project instead, for example:

  • If I have only a few sequences (e.g. genes) and not the whole assembled genome, can I run a differential expression analysis?

  • Which is the best tool for it (considering the situation above)?

  • Is some kind of normalization necessary before the gene expression analysis?

Based on what I know, I designed the following experiment:

  • Sequence reference: A fasta file with nucleotide information of 5 genes.
  • RNA-seq libraries: fastq files from RNA-seq experiments with the following conditions: Control, Treatment 1, Treatment 2, Treatment 3.
  • Mapping: Bowtie2
  • Output conversion using bam files information to make a table with a count of alignments of each mapping analysis.
  • DESeq analysis using as input the output created in the previous step.

Is that right? I have no idea whether a simple mapping analysis with Bowtie2, using just the gene sequences, can be used to infer differences in gene expression.

Best,

transcriptome RNA-Seq mapping bowtie best • 3.4k views

With only 5 genes of interest, why aren't you using qPCR?


Because I'm working with public data from a lot of different species.


So does that mean you're not actually going to perform the sequencing yourself, but you're going to download data that other people have sequenced and deposited in a public repo?


You're working with lots of public NGS data but only 5 genes?


A couple of points: 3 replicates is the bare minimum. Also, DESeq uses information from all the genes to estimate dispersion, so that step might behave strangely with only a handful of genes being measured.

5.4 years ago

Some comments:

Sequence reference: A fasta file with nucleotide information of 5 genes. RNA-seq libraries: fastq files from RNA-seq experiments with the following conditions: Control, Treatment 1, Treatment 2, Treatment 3.

As you are doing RNA-Seq, you should either align against the whole genome (e.g. hg38 if human) with STAR or HISAT2, not against your genes of interest alone, and then count the number of reads per gene (featureCounts, htseq-count), or use a pseudo-aligner directly on the transcriptome (kallisto, Salmon).

Also, you should have more than one control sample; otherwise it will be impossible to infer any statistical significance.

Mapping: Bowtie2

You can use Bowtie2, but only if you align to the transcriptome. For whole-genome alignment, use a splice-aware aligner such as STAR or HISAT2.

Output conversion using bam files information to make a table with a count of alignments of each mapping analysis.

Use featureCounts or htseq-count with the correct annotation file (GTF). The ENSEMBL ones are pretty good (check the "Gene sets" column: http://www.ensembl.org/info/data/ftp/index.html).
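To make the "table of counts" step concrete, here is a minimal sketch of merging per-sample htseq-count outputs into one count matrix. It assumes the usual two-column, tab-separated output (gene ID, count) with summary rows whose names start with `__`; the sample names and toy data are hypothetical stand-ins for real files.

```python
import csv
import io


def read_htseq_counts(handle):
    """Parse one htseq-count output: 'gene_id<TAB>count' per line."""
    counts = {}
    for gene_id, count in csv.reader(handle, delimiter="\t"):
        # htseq-count appends summary rows such as __no_feature; skip them.
        if gene_id.startswith("__"):
            continue
        counts[gene_id] = int(count)
    return counts


def build_count_matrix(per_sample):
    """Return (genes, samples, rows) with one row of counts per gene."""
    samples = list(per_sample)
    genes = sorted(set().union(*(c.keys() for c in per_sample.values())))
    # A gene missing from a sample is treated as zero counts.
    rows = [[per_sample[s].get(g, 0) for s in samples] for g in genes]
    return genes, samples, rows


# Hypothetical toy data standing in for real per-sample files.
control_1 = io.StringIO("geneA\t120\ngeneB\t30\n__no_feature\t500\n")
treat1_1 = io.StringIO("geneA\t80\ngeneB\t95\n__no_feature\t450\n")

per_sample = {
    "control_1": read_htseq_counts(control_1),
    "treat1_1": read_htseq_counts(treat1_1),
}
genes, samples, rows = build_count_matrix(per_sample)
for gene, row in zip(genes, rows):
    print(gene, *row, sep="\t")
```

The resulting genes-by-samples matrix of raw integer counts is the kind of input that count-based tools expect; it should not be normalized beforehand.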

DESeq analysis using as input the output created in the previous step.

DESeq2 to be precise ;)


So, are three samples of each treatment, including the control, a good choice?

With whole-genome mapping (with STAR, for example), can I then use a GTF file containing only my five genes to extract the information I want, or should I use a GTF file with all genes? (I'm not working with model organisms.)

5.4 years ago

AFAIK, for now, there are no "best practices". When I have time, I try running the same workflow with different software and compare the results.

If I have only a few sequences (e.g. genes) and not the whole assembled genome, can I run a differential expression analysis?

I assume you used cDNA capture or a related capture technique to extract your RNA of interest. If so, it is totally fine to do gene expression analysis with your data.

RNA-seq libraries: fastq files from RNA-seq experiments with the following conditions: Control, Treatment 1, Treatment 2, Treatment 3.

You only have n=1? The statistical power of your experiment will be very low, so be careful when interpreting the results.

First, check the quality of your reads with FastQC or fastp to get an overall view of your sequencing.
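To give an idea of what such a quality check measures, here is a minimal sketch that computes the mean Phred score per read. It assumes Phred+33 encoding and unwrapped 4-line FASTQ records; the read names and sequences are made up, and this is in no way a replacement for FastQC or fastp.

```python
import io


def mean_phred_per_read(handle, offset=33):
    """Yield (read_id, mean quality) for each 4-line FASTQ record."""
    while True:
        header = handle.readline().strip()
        if not header:
            break
        handle.readline()  # sequence line (not needed here)
        handle.readline()  # '+' separator line
        quality = handle.readline().strip()
        # Each quality character encodes one Phred score as ASCII minus offset.
        scores = [ord(ch) - offset for ch in quality]
        yield header[1:], sum(scores) / len(scores)


# Hypothetical toy records: 'I' encodes Q40 and '!' encodes Q0 in Phred+33.
fastq = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!II\n")
results = dict(mean_phred_per_read(fastq))
for read_id, mean_q in results.items():
    print(read_id, mean_q)
```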

You can align your reads to a complete reference genome to check whether they fall within your gene coordinates, which is a good validation of your capture.

If you are aligning to a genome, do not use a non-splice-aware aligner such as Bowtie2 without specific options. With default options, Bowtie2 is not aware of the splice events present in your genes; prefer HISAT2 or STAR. You can also look at pseudoalignment-based quantifiers such as kallisto or Salmon. If you are aligning to a transcriptome, Bowtie2 will be fine.
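To see what "splice-aware" means in the output: in SAM/BAM records, a splice-aware aligner reports a read spanning an intron with an `N` operation (skipped reference region) in the CIGAR string. Here is a small sketch for spotting such reads; the CIGAR strings are invented examples, not output from any particular aligner.

```python
import re

# One CIGAR operation is a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def intron_lengths(cigar):
    """Return the lengths of all N (skipped reference) operations."""
    return [int(length) for length, op in CIGAR_OP.findall(cigar) if op == "N"]


def is_spliced(cigar):
    """True if the alignment skips at least one reference region."""
    return bool(intron_lengths(cigar))


# Hypothetical alignments: unspliced, one junction, two junctions.
for cigar in ("100M", "40M1500N60M", "30M200N40M900N30M"):
    print(cigar, is_spliced(cigar), intron_lengths(cigar))
```

An aligner that cannot produce such gapped alignments will tend to leave junction-spanning reads unaligned or only partially aligned when the reference is a genome.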

For the counting step, you can use featureCounts or HTSeq, or get estimated counts directly from kallisto or Salmon.

If you want to compare expression between gene A and gene B within the same condition, TPM normalization will be enough.
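For reference, here is a minimal sketch of the TPM calculation: divide each count by the gene length, then rescale so that the values sum to one million. The gene names, counts, and lengths are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from raw counts and gene lengths in bp."""
    # Length-normalized rate per gene (reads per base).
    rates = {gene: counts[gene] / lengths_bp[gene] for gene in counts}
    total = sum(rates.values())
    # Rescale so that all values in the sample sum to one million.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Hypothetical toy sample: same read count, but geneB is three times longer.
counts = {"geneA": 100, "geneB": 100}
lengths_bp = {"geneA": 1000, "geneB": 3000}
values = tpm(counts, lengths_bp)
for gene, value in values.items():
    print(gene, round(value))
```

Note that the sum runs over every gene that was quantified, so the values are only meaningful as transcriptome-wide proportions when the whole transcriptome is quantified, not just a handful of genes.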

If you are looking at variation of gene A across conditions, tools like edgeR, DESeq2 or Sleuth will help you.

See also, on normalization: RNA-seq, why normalize for library size?
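To illustrate library-size normalization, here is a simplified sketch of the median-of-ratios idea described for DESeq: each sample's size factor is the median, over genes, of the ratio between that sample's count and the gene's geometric mean across samples. This is an illustration with made-up counts, not the package's actual code.

```python
import math
from statistics import median


def size_factors(matrix):
    """matrix maps gene -> list of raw counts (one per sample)."""
    n_samples = len(next(iter(matrix.values())))
    # Log geometric mean per gene; genes with a zero count are skipped
    # because their logarithm is undefined.
    log_geo_means = {
        gene: sum(math.log(c) for c in row) / n_samples
        for gene, row in matrix.items()
        if all(c > 0 for c in row)
    }
    factors = []
    for j in range(n_samples):
        log_ratios = [
            math.log(matrix[gene][j]) - lgm
            for gene, lgm in log_geo_means.items()
        ]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical toy data: sample 2 was sequenced exactly twice as deeply.
matrix = {"geneA": [10, 20], "geneB": [100, 200], "geneC": [50, 100]}
factors = size_factors(matrix)
normalized = {
    gene: [c / f for c, f in zip(row, factors)] for gene, row in matrix.items()
}
print(factors)
```

The median is taken over all usable genes, which is one reason this kind of normalization becomes fragile when only a handful of genes are counted.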


Thanks for the advice. I'm going to use at least 3 samples of each treatment (including the control). So, according to your response, I can map against the whole genome and check the results against the coordinates of my genes? Can I use a BED file to do this?


You can use IGV to see whether your reads align to your genes of interest. If everything looks good, you can use your BED file with samtools to extract all the reads that fall within the BED intervals.
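On real BAM files, `samtools view` with the `-L` option takes a BED file for this kind of filtering. To show the underlying logic, here is a minimal sketch of the overlap test in plain Python. The coordinates and gene names are hypothetical; remember that BED intervals are 0-based and half-open, whereas the SAM POS field is 1-based.

```python
import io


def read_bed(handle):
    """Parse the first three BED columns into (chrom, start, end) tuples."""
    regions = []
    for line in handle:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def overlaps_any(chrom, start, end, regions):
    """Test a 0-based, half-open interval against all BED regions."""
    return any(
        c == chrom and start < r_end and r_start < end
        for c, r_start, r_end in regions
    )


# Hypothetical BED content for two genes of interest.
bed = io.StringIO("chr1\t1000\t2000\tgeneA\nchr2\t500\t800\tgeneB\n")
regions = read_bed(bed)

print(overlaps_any("chr1", 1990, 2090, regions))  # read overlapping geneA's end
print(overlaps_any("chr1", 2000, 2100, regions))  # read starting just past geneA
```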


Right, thanks for the support Bastien.
