Combining GTEx and TCGA data
2.9 years ago

Hey,

I want to analyze Differential Transcript Usage (DTU) in TCGA isoform/transcript expression data, using GTEx as the "normal" reference. In other words, TCGA "tumor" samples will be compared against GTEx "normal" samples.

In order to remove variation caused by batch effects, I have to perform some batch correction. Here is a subset of my design matrix:

X   sample   condition   batch
1   s1       tumor       TCGA
2   s2       tumor       TCGA
3   s3       tumor       TCGA
4   s4       normal      GTEx
5   s5       normal      GTEx
6   s6       normal      GTEx

I get the following error when I try to run DEXSeq: "The supplied design matrix will result in a model matrix that is not full rank"

I know the error is caused by redundancy in my design matrix, but I need help tackling it. Is there any way to modify my design matrix to avoid this error? Can I use TCGA and GTEx data without batch correction?

Your help will be much appreciated.

TCGA GTEx RNA-Seq • 2.1k views

Why not just use the normal TCGA sample from the same individual?


First, those samples come from "adjacent" tissue (of cancer patients), so they are not exactly "normal". Second, I want to see the differences across three levels: normal adjacent tissue (NAT), normal, and tumor.

2.9 years ago
ATpoint 85k

Batch is fully confounded with treatment: every tumor sample is TCGA and every normal sample is GTEx, so there is nothing you can do about it. This has been asked dozens of times before; please search Google for related threads.
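To make the confounding concrete, here is a minimal sketch in plain Python (not DEXSeq, and not DEXSeq's actual model matrix, which also carries sample and exon terms). It assumes a simplified additive design with an intercept, a tumor indicator, and a TCGA indicator, built from the six samples in the question. The helper names `matrix_rank` and `design` are made up for this illustration.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Look for a non-zero pivot at or below the current pivot row.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # this column is a linear combination of earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# The six samples from the question: (sample, condition, batch).
samples = [
    ("s1", "tumor", "TCGA"), ("s2", "tumor", "TCGA"), ("s3", "tumor", "TCGA"),
    ("s4", "normal", "GTEx"), ("s5", "normal", "GTEx"), ("s6", "normal", "GTEx"),
]

def design(samples, with_batch):
    """Columns: intercept, condition == tumor, and optionally batch == TCGA."""
    rows = []
    for _name, condition, batch in samples:
        row = [1, int(condition == "tumor")]
        if with_batch:
            row.append(int(batch == "TCGA"))
        rows.append(row)
    return rows

full = design(samples, with_batch=True)      # intercept + condition + batch
reduced = design(samples, with_batch=False)  # intercept + condition

print("with batch:   ", len(full[0]), "columns, rank", matrix_rank(full))
print("without batch:", len(reduced[0]), "columns, rank", matrix_rank(reduced))
```

With the batch term the matrix has three columns but rank two, because the tumor column and the TCGA column are identical. Dropping the batch term makes the matrix full rank, but the condition coefficient then absorbs any TCGA-versus-GTEx technical difference along with the biology.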


Thanks for your reply.

I know that batch effects can mask a biological effect or lead to its misinterpretation. But this is DTU we're talking about. Since DTU works on proportions (isoform fractions), in my opinion I can compare TCGA and GTEx data without correcting for batch effects.

An expert opinion would be appreciated. Thanks.


It is still counts that you're comparing. There is no guarantee that the proportions can be directly compared, and any difference you see could be purely technical. On the other hand, if you are absolutely forced to use these data, then go ahead, but interpret the results with care and validate important findings. In any case, you cannot correct for batch effects here; there is no magic that will do that on a nested design like this.
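A toy calculation shows why proportions are not automatically safe. All numbers below are invented for illustration: one gene with two isoforms, identical true usage in both cohorts, and a hypothetical per-isoform recovery rate standing in for any protocol or pipeline difference that affects one isoform more than the other. None of this is measured from TCGA or GTEx.

```python
def isoform_fractions(counts):
    """Convert per-isoform counts for one gene into usage proportions."""
    total = sum(counts.values())
    return {iso: c / total for iso, c in counts.items()}

# Hypothetical gene: true usage is 60% / 40% in BOTH cohorts.
true_abundance = {"isoform_A": 600, "isoform_B": 400}

# Made-up recovery rates: one pipeline recovers isoform_B half as well.
recovery = {
    "GTEx": {"isoform_A": 1.0, "isoform_B": 1.0},
    "TCGA": {"isoform_A": 1.0, "isoform_B": 0.5},
}

# Made-up sequencing depth factors: a uniform scaling of all isoforms.
depth = {"GTEx": 1.0, "TCGA": 3.0}

observed = {}
for batch in recovery:
    counts = {
        iso: true_abundance[iso] * recovery[batch][iso] * depth[batch]
        for iso in true_abundance
    }
    observed[batch] = isoform_fractions(counts)

for batch, fractions in observed.items():
    print(batch, {iso: round(f, 2) for iso, f in fractions.items()})
```

The uniform depth factor cancels when the fractions are taken, which is the part of the intuition that holds. The isoform-specific factor does not cancel: under these made-up numbers isoform_A goes from 0.60 in one cohort to 0.75 in the other, even though the true usage is the same, and with batch confounded with condition that shift cannot be separated from real DTU.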
