Question

How to find differential gene and expression analysis and novel transcripts?

0

Entering edit mode

4.5 years ago

Kumar ▴ 170

Hi,

I am trying to setup pipelines for identifying differential gene expression analysis and novel transcripts.

Here is my procedures:

Downloaded data (GSE121980) from NCBI-SRA one control and one treatment .fasta files and a reference genome (S.scrofa 10.2).

After quality assessment using fastQC, aligned these files (control and treatment .fasta) with the reference S.scrofa.fa using HISAT2. In the results I have retrieved .sam and .bam files.

I tried Cufflinks and FeatureCounts (http://subread.sourceforge.net/).

I am not sure next, how I can proceed for identifying differential gene and expression analysis. I am aware about DESeq, edgeR, however I am not sure what input files are required to RUN these pipelines. I got count.txt from FeatureCounts (one control and one treatment).

Please advice how I should proceed for differential gene and expression analysis.

RNA-Seq next-gen sequencing • 1.6k views

ADD COMMENT • link 4.5 years ago by Kumar ▴ 170

score 2 · Answer 1 · 2020-05-18

2

Entering edit mode

4.5 years ago

JC 13k

What about reading the documentation?

https://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf https://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html

ADD COMMENT • link 4.5 years ago by JC 13k

0

Entering edit mode

Please let me know any assembled script or steps to RUN DESeq2 to analysis data. I am not sure, what type of file is required for DESeq2. I have output file (count.txt) from FeatureCounts.

Here is some lines of the counts.txt file (output of FeatureCounts). Is that seem correct as input of DESeq?

Geneid  Chr Start   End Strand  Length
ENSSSCG00000048769  1;1;1;1;1   1;792;2371;3118;3118    2465;961;2465;3780;3782 +;+;+;+;+   3130    0
ENSSSCG00000037372  1;1;1;1 9815;14011;18493;22046  10013;15217;18696;22152 -;-;-;- 1717    0

ADD REPLY • link 4.5 years ago by Kumar ▴ 170

1

Entering edit mode

I am more familiar with EdgeR, the input is a table with gene ids and counts, like:

Gene  Count1  Count2 Count3  Count4
A  20  23  45  65
B  19  34   23  34
C  0  0  12  14

Then in R:

> library(edgeR)
> x <- read.delim("TableOfCounts.txt",row.names="Gene")
> group <- factor(c(1,1,2,2))
> y <- DGEList(counts=x,group=group)
> keep <- filterByExpr(y)
> y <- y[keep,,keep.lib.sizes=FALSE]
> y <- calcNormFactors(y)
> design <- model.matrix(~group)
> y <- estimateDisp(y,design)
> fit <- glmFit(y,design)
> lrt <- glmLRT(fit,coef=2)
> topTags(lrt)

ADD REPLY • link 4.5 years ago by JC 13k

0

Entering edit mode

You can use featureCounts output for DESeq2; however, you need to produce a counts matrix from your featureCounts output files, and then use the DESeqDataSetFromMatrix() function from DESeq2 for importing. There is information in the vignette in the Count Matrix Input section.

ADD REPLY • link 4.5 years ago by Kevin Blighe 88k

0

Entering edit mode

Can you please let me know how to produce a counts matrix from the featureCounts output files? I have no much idea about R programming.

Thank you!

ADD REPLY • link 4.5 years ago by Kumar ▴ 170

0

Entering edit mode

To produce the counts matrix can be done outside R. Unfortunately, we cannot really take you through everything step by step. First, identify the counts column in each featureCounts file, and then explore shell ('BASH") commands that will permit you to merge these counts columns together into a single file. Take a look at the cut and paste commands, which will help you here.

If you are really a beginner in R, then you should look for tutorials about how to learn basic skills in R. If that is not enough, then I encourage you to seek help locally.

ADD REPLY • link 4.5 years ago by Kevin Blighe 88k

0

Entering edit mode

My implication is that how to prepare a matrix file from my count.txt file like you are showing below.

What I understood, do I need to add numbers of Chr column in my file of featureCounts output file and marge in a single file to prepare a input file for EdgeR.

Is that correct?

    For example:
    gene                 treatment  control
    ENSSSCG00000048769  5      5
    ENSSSCG00000048769  4      0

ADD REPLY • link 4.5 years ago by Kumar ▴ 170

0

Entering edit mode

no, chr column is Chromosome ID, the featurecount table has the counts at the last columns, in your example:

cut -f1,7 table > counts

you can generate the full matrix using __cut__ and __paste__, like:

cut -f1,7 table1 > table1_counts.tsv
cut -f7 table2 > table2_counts.tsv
paste table1_counts.tsv table2_counts.tsv > final_table.tsv

ADD REPLY • link 4.5 years ago by JC 13k

0

Entering edit mode

Thank you very much for your help. It works well.

ADD REPLY • link 4.5 years ago by Kumar ▴ 170

0

Entering edit mode

Thanks JC! / ¡Gracias!

ADD REPLY • link 4.5 years ago by Kevin Blighe 88k