Question

Get counts from Ballgown for differential expression by suing DESeq2


7.6 years ago

orhanbellur1117.ob • 0

Hello everyone, I am new to RNA-seq analysis. First, some information about my data. It consists of 24 SRR files related to sex-specific and lineage-specific alternative splicing in primates (humans and chimpanzees). RNA-seq was used to study transcript levels in humans and chimpanzees, using liver RNA samples from three males and three females of each species, with two replicates each. Briefly, for human I have 12 RNA-seq files: three males x 2 replicates (for example male1 rep1, male1 rep2, and so on) and three females x 2 replicates. The chimpanzee data are organised the same way. The reads are single-end, and I am following the Nature Protocols workflow.

1- As a first step I aligned the reads to the genome with this command: "hisat2 -p 2 --dta -x indexes/hg.v90 project_datasets/SRR032126.fastq -S SRR032126.sam". I am not sure whether it is correct. Also, since I have two replicates, should I combine them into one file? If so, how do I do that? If not, why do we use two or more replicates?

2- I used the HISAT2 > StringTie > ... Ballgown pipeline for this analysis, but I want to do the differential expression analysis with DESeq2. As you know, that requires read counts. How do I get read counts from Ballgown?

Thank you.



I do not think it is a good idea to combine replicates before DESeq2/edgeR analysis.
Use featureCounts (or HTSeq) to get counts from the HISAT2 output. Alternatively, there are scripts that extract edgeR/DESeq2 input from the HISAT2/StringTie workflow (https://ccb.jhu.edu/software/stringtie/dl/prepDE.py and https://raw.githubusercontent.com/griffithlab/rnaseq_tutorial/master/scripts/stringtie_expression_matrix.pl).
If you still want to merge replicates before DESeq2, the BAM files (from the HISAT2 output) can be merged with bamtools (example code below):

$ bamtools merge -in bam1 -in bam2 -in bam3 -out combined.bam &> job.log
You can also collapse/merge replicates within DESeq2 by using its collapseReplicates function.
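To make concrete what collapsing does: DESeq2's collapseReplicates sums the raw counts of the columns assigned to the same group, and it is meant for technical replicates. Below is a minimal Python sketch of that summing step only, not DESeq2 itself. The run IDs, sample names, and counts are made up for illustration.

```python
from collections import defaultdict


def collapse_technical_replicates(counts, run_to_sample):
    """Sum raw counts over runs that belong to the same sample.

    counts:        {run_id: {gene_id: raw_count}}
    run_to_sample: {run_id: sample_id}
    """
    collapsed = defaultdict(lambda: defaultdict(int))
    for run_id, gene_counts in counts.items():
        sample_id = run_to_sample[run_id]
        for gene_id, n in gene_counts.items():
            collapsed[sample_id][gene_id] += n
    # Convert nested defaultdicts to plain dicts for readability.
    return {sample: dict(genes) for sample, genes in collapsed.items()}


# Hypothetical runs: runA and runB are technical replicates of male1.
counts = {
    "runA": {"gene1": 10, "gene2": 0},
    "runB": {"gene1": 12, "gene2": 3},
    "runC": {"gene1": 7, "gene2": 5},
}
run_to_sample = {"runA": "male1", "runB": "male1", "runC": "male2"}

collapsed = collapse_technical_replicates(counts, run_to_sample)
print(collapsed)
```

Summing only makes sense on raw counts, so this would be applied before any normalisation.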

btw, I don't think you can sue DESeq2 :( (kidding)

Reply by cpad0112, 7.6 years ago

score 1 · Answer 1 · 2017-12-11

It seems you have many questions buried in an unformatted and confusing post, so I will chime in only about what I could understand.

"My data is consist of 24 SRR file related to Sex-specific and lineage-specific alternative splicing in primates."

Is it really your data, or is it data from SRA / ENA / DDBJ that you intend to analyse? Nothing is wrong with the second option, but then it is not your data, and you would know a lot more about it if it were. If it is published data, it is worth reading the paper carefully.

"Three males x2 replicates (For example: male1 rep1,male1 rep2 and so on.)"

Are rep1 and rep2 technical or biological replicates?

If these are technical replicates, I would quickly check whether they are nearly identical (they should be), and if they are, I would combine them. However, if the replicates carry some biological variation (repeated measures over time, two different biopsies from the same tissue, whatever), you should not combine them but incorporate this information into your model.
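One rough way to do that quick check is a Pearson correlation on log-transformed counts for the two runs. The sketch below assumes you already have per-gene raw counts for each replicate. The gene IDs and counts are invented, and how close to 1 counts as "nearly identical" is a judgment call, not something the code decides.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def replicate_correlation(rep1, rep2):
    """Correlate log2(count + 1) over the genes present in both replicates."""
    genes = sorted(set(rep1) & set(rep2))
    x = [math.log2(rep1[g] + 1) for g in genes]
    y = [math.log2(rep2[g] + 1) for g in genes]
    return pearson(x, y)


# Hypothetical raw counts for two runs of the same library.
rep1 = {"g1": 100, "g2": 10, "g3": 0, "g4": 55}
rep2 = {"g1": 110, "g2": 8, "g3": 1, "g4": 60}

r = replicate_correlation(rep1, rep2)
print(f"Pearson r on log2(count + 1): {r:.3f}")
```

A scatter plot of the same log counts is usually worth a look as well, since a single correlation value can hide problems among the low-count genes.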

score 0 · Answer 2 · 2017-12-11

No, you shouldn't combine replicates. They should each be aligned independently and then the correlation between your replicates checked in subsequent analysis. The data from each replicate should be present - and annotated as such - when input into DESeq2.
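Aligning each replicate independently just means one HISAT2 run per FASTQ file. The sketch below builds one command per file, reusing the index and options from the question. Note that HISAT2 takes single-end reads through its -U option, which the command in the question does not include. The second accession and the output naming are assumptions made for illustration.

```python
import shlex
from pathlib import Path


def hisat2_command(fastq, index="indexes/hg.v90", threads=2, out_dir="."):
    """Build a HISAT2 command line for one single-end FASTQ file."""
    sample = Path(fastq).stem  # "SRR032126.fastq" -> "SRR032126"
    sam = str(Path(out_dir) / f"{sample}.sam")
    return [
        "hisat2", "-p", str(threads), "--dta",
        "-x", index,
        "-U", str(fastq),  # -U marks unpaired (single-end) reads
        "-S", sam,
    ]


# The first file is from the question; the second name is hypothetical.
fastqs = [
    "project_datasets/SRR032126.fastq",
    "project_datasets/SRR032127.fastq",
]

for fq in fastqs:
    cmd = hisat2_command(fq)
    print(shlex.join(cmd))
    # To actually run the alignment instead of printing it:
    # import subprocess; subprocess.run(cmd, check=True)
```

Each run then yields its own SAM file, so every replicate stays a separate column in the count matrix.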

To get the counts for DESeq2, you need to run the prepDE.py script provided by the authors of StringTie. It's well documented on the page I've linked.
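As a rough illustration of the input prepDE.py works from: its -i option accepts a text file with one sample ID and the path to that sample's StringTie GTF per line. The helper below writes such a file. It assumes a layout like ballgown/<sample>/<sample>.gtf, and the function name and directory names are made up for this sketch. prepDE.py also expects the GTFs to come from StringTie runs that used -e.

```python
from pathlib import Path


def write_sample_list(stringtie_dir, out_file="sample_lst.txt"):
    """Write '<sample_id> <path/to/sample.gtf>' lines for prepDE.py -i.

    Assumes one sub-directory per sample, named after the sample and
    holding that sample's StringTie GTF.
    """
    lines = []
    for gtf in sorted(Path(stringtie_dir).glob("*/*.gtf")):
        sample_id = gtf.parent.name
        lines.append(f"{sample_id} {gtf}")
    Path(out_file).write_text("".join(f"{line}\n" for line in lines))
    return lines


if __name__ == "__main__":
    # "ballgown" is an assumed directory name; adjust to your own layout.
    for line in write_sample_list("ballgown"):
        print(line)
    # Then, outside Python:
    #   python prepDE.py -i sample_lst.txt
    # which by default writes gene_count_matrix.csv and
    # transcript_count_matrix.csv for import into DESeq2.
```

Keeping one line per run here is what preserves the replicates as separate columns in the resulting count matrices.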