Question

Tissue-specific DEG analysis with DESeq2

0

6 months ago

M. ▴ 40

Hi all,

I hope you're well. I want to run a tissue-specific expression analysis across multiple tissues. I have many raw FASTQ files that I retrieved from the ENCODE database. After completing the alignment and quantification steps, I need to come up with consensus data for each tissue. I don't know how to do this yet, but I want to ask about a later step in the analysis. When comparing groups in DESeq2, we normally use a control group as the reference. In this case, though, I don't have a baseline reference against which I can compare all tissues. Can I still use DESeq2 for this purpose? If it's possible, how exactly? If not, is there another method you can suggest? I'm new to the topic and a little confused about it. I would appreciate any help.

Thanks in advance!

DEseq2 RNA-seq DEG R • 983 views

ADD COMMENT • link updated 6 months ago by arctic ▴ 40 • written 6 months ago by M. ▴ 40

0

6 months ago

arctic ▴ 40

Not my area of expertise, but the GTEx portal seems to offer a visual comparison of expression across tissues. Maybe checking their pipeline could provide some general insights?

https://gtexportal.org/home/tissue/Brain_Cortex

https://gtexportal.org/home/methods

They also seem to share their pipeline on GitHub:

https://github.com/broadinstitute/gtex-pipeline/tree/master/rnaseq

However, based on the methods, they also seem to have used the same sequencing pipeline/platform across samples (consistent with ATpoint's comment).

ADD COMMENT • link 6 months ago by arctic ▴ 40

score 2 · Accepted Answer · 2024-05-06

2

6 months ago

ATpoint 85k

I assume that all data are from the same experiment, without batch effects, and can thus be compared quantitatively.

There are many ways to do this. Here are some ideas:

1) Test each tissue versus the average of all other tissues. This is comparatively simple to implement, and with reasonable cutoffs you might enrich for "tissue-specific" genes. See for example: https://support.bioconductor.org/p/91823/

2) Test all vs all, meaning all possible pairwise comparisons, and then aggregate the results. You could use the strategy implemented in https://rdrr.io/bioc/scran/man/combineMarkers.html for this, or use some sort of meta-analysis, for example robust rank aggregation (RobustRankAggreg, https://cran.r-project.org/web/packages/RobustRankAggreg/index.html), to find genes that are consistently overexpressed in your tissue versus all other tissues.

3) Use either of 1 or 2, collect DEGs and subject them to hclust+heatmap, and then select clusters that visually appear to separate the tissues.
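To make options 1 and 2 more concrete, here is a minimal, language-agnostic sketch in plain Python. It is not DESeq2 code: it only builds the contrast weights for "one tissue versus the average of the others" and enumerates the pairwise comparisons. The tissue names, the helper names and the toy log2 fold changes are all made up for illustration. With a design such as `~ 0 + tissue`, a weight vector like this could presumably be passed as a numeric `contrast` to DESeq2's `results()`, but check the coefficient order against `resultsNames(dds)` first. The "minimum fold change" aggregation is only a simple stand-in, not what combineMarkers or RobustRankAggreg actually compute.

```python
from itertools import combinations


def one_vs_rest_weights(tissues, target):
    """Weights for `target` versus the average of all other tissues.

    The target gets +1 and every other tissue gets -1/(k-1),
    so the weights sum to zero.
    """
    if target not in tissues:
        raise ValueError(f"unknown tissue: {target}")
    n_others = len(tissues) - 1
    if n_others < 1:
        raise ValueError("need at least two tissues")
    return [1.0 if t == target else -1.0 / n_others for t in tissues]


def all_pairs(tissues):
    """Every unordered pair of tissues (the 'all vs all' comparisons)."""
    return list(combinations(tissues, 2))


def min_lfc_against_others(lfc_vs_other):
    """Smallest log2 fold change of the target against any other tissue.

    A gene is only 'consistently up' if even this worst case is large.
    """
    return min(lfc_vs_other.values())


# Hypothetical tissue labels, in the same order as the model coefficients.
tissues = ["brain", "heart", "liver", "lung"]

weights = one_vs_rest_weights(tissues, "liver")
pairs = all_pairs(tissues)

# Invented log2 fold changes for one gene: liver versus each other tissue.
toy_lfc = {"brain": 3.0, "heart": 2.1, "lung": 0.4}
worst_case = min_lfc_against_others(toy_lfc)
```

In this toy case the gene looks liver-specific against brain and heart but barely differs from lung, so a cutoff on `worst_case` would reject it even though the one-vs-average contrast might still call it.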

ADD COMMENT • link 6 months ago by ATpoint 85k

0

Can I ask what exactly you mean by "same experiment"? If you are talking about the same donor and the same procedure, then unfortunately not all the data come from the same experiment. Two different labs produced these data. One produced only single-end reads, and the other only paired-end reads. The donors differ between the experiments in each lab, and the experiment IDs are different for most of the FASTQ files, so they are the products of different sequencing runs. What could be done in these circumstances? Can I treat them as replicates, or should I filter them?

ADD REPLY • link 6 months ago by M. ▴ 40

1

Sounds like full confounding. Be careful. Personally I would not do this comparison. There is no way to distinguish tissue effect from donor effect and batch effect. I know it is tempting, but one cannot randomly collect data and pretend it was from one experiment. RNA-seq is a relative measure, and baselines are just different.
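One quick way to see whether a sample sheet has this problem is to cross-tabulate tissue against lab before running any test. The sketch below is plain Python with a hypothetical sample sheet (the `tissue` and `lab` keys and all values are invented for illustration). It flags the case where there are several labs but every tissue was measured in only one of them, so a model with both lab and tissue terms would not be full rank and the two effects cannot be separated.

```python
from collections import defaultdict


def labs_per_tissue(samples):
    """Map each tissue to the set of labs that measured it."""
    seen = defaultdict(set)
    for sample in samples:
        seen[sample["tissue"]].add(sample["lab"])
    return dict(seen)


def lab_nested_in_tissue(samples):
    """True if there are several labs but each tissue comes from only one.

    In that case lab is fully determined by tissue, so a lab (batch)
    effect cannot be told apart from a tissue effect.
    """
    table = labs_per_tissue(samples)
    all_labs = set().union(*table.values()) if table else set()
    return len(all_labs) > 1 and all(len(labs) == 1 for labs in table.values())


# Hypothetical sample sheets, not real ENCODE metadata.
confounded_sheet = [
    {"tissue": "liver", "lab": "lab_A"},
    {"tissue": "liver", "lab": "lab_A"},
    {"tissue": "heart", "lab": "lab_B"},
]
crossed_sheet = [
    {"tissue": "liver", "lab": "lab_A"},
    {"tissue": "liver", "lab": "lab_B"},
    {"tissue": "heart", "lab": "lab_A"},
    {"tissue": "heart", "lab": "lab_B"},
]

confounded = lab_nested_in_tissue(confounded_sheet)
crossed = lab_nested_in_tissue(crossed_sheet)
```

The same tabulation can be repeated for donor, library layout (single-end or paired-end) or any other recorded variable. A design where each tissue appears in several labs is the only one in which a batch term can be estimated alongside tissue.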

ADD REPLY • link 6 months ago by ATpoint 85k

0

I wrote an R package (see rdrr.io and GitHub) some years ago to calculate tissue specificity from bulk RNA-seq based on this paper. Bear in mind this was before scRNA-seq became so popular. While I did not use DESeq2 as part of the pipeline, it may give you some ideas and code snippets for how to proceed, as I also included ENCODE data (23 tissues).

Unfortunately, I put the cart before the horse. The pipeline assumes all samples are comparable. I have confidence in the robustness of the pipeline, but I have yet to figure out how to ensure the count data is comparable across so many samples from different experiments. This is a massive batch effect problem.

The main problem is as ATpoint describes: you can be absolutely sure there will be confounding effects across the experiments. I'm still not sure how to tackle this. Maybe you can somehow find a dataset from one lab, with multiple tissues and metadata describing the data collection (i.e. when, who, how). The more of this kind of detail you can add to the metadata for every tissue, the better your chance of finding batch effects and creating something awesome.

ADD REPLY • link 6 months ago by BioinfGuru ★ 2.1k