combine two RNA-seq files together
5.4 years ago
yueli7 ▴ 250

Hello,

I have two RNA-seq datasets. One is from cancer samples, downloaded from https://tcga.xenahubs.net/download/TCGA.OV.sampleMap/HiSeqV2_percentile.gz, and contains per-gene RSEM values ranked as percentiles between 0% and 100%.

The other is from normal tissue: GTEx_Analysis_2016-01-15_v7_RNASeQCv1.1.8_gene_tpm.gct.gz (gene TPMs).

My question is: how can I combine the two files and find the differentially expressed genes between cancer and normal samples?

Thanks in advance for any help!

Best,

Yue

RNA-Seq • 1.8k views

Hello, everyone,

I found the normal and cancer data in one dataset in GEO.

Thanks for any help!

Yue

5.4 years ago
ATpoint 85k

You cannot simply download two completely unrelated datasets and then perform differential analysis. There are almost certainly technical confounders (batch effects) that will dominate your results and create false findings. One can only reliably compare samples from the same lab, same protocol, and same study. Everything else will almost certainly contain a large number of false positives and false negatives. Please read about RNA-seq analysis first, e.g. https://peerj.com/preprints/27283/ and https://www.bioconductor.org/packages/devel/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html.
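To make the confounding problem concrete, here is a minimal Python sketch. The function name `fully_confounded` and the sample labels are hypothetical, chosen only for illustration. It checks whether every batch (dataset of origin) contains only a single condition; if so, any cancer-versus-normal difference cannot be separated from a TCGA-versus-GTEx difference.

```python
from collections import defaultdict


def fully_confounded(samples):
    """Return True if no batch contains more than one condition.

    `samples` is a list of (condition, batch) pairs. When every batch
    holds a single condition, condition and batch are indistinguishable.
    """
    conditions_per_batch = defaultdict(set)
    for condition, batch in samples:
        conditions_per_batch[batch].add(condition)
    return all(len(conds) == 1 for conds in conditions_per_batch.values())


# Hypothetical design from the question: all cancer from TCGA, all normal from GTEx.
tcga_vs_gtex = [("cancer", "TCGA")] * 3 + [("normal", "GTEx")] * 3

# Hypothetical single-study design: both conditions processed together.
single_study = [("cancer", "study_A")] * 3 + [("normal", "study_A")] * 3

print(fully_confounded(tcga_vs_gtex))  # True: condition is tied to dataset
print(fully_confounded(single_study))  # False: both conditions share a batch
```

This only flags the most extreme case. A design that passes this check can still carry batch effects that need to be modelled.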

You need raw (non-normalized) counts and biological/experimental replicates to perform meaningful statistical analysis. One cannot simply take any counts downloaded from a database. Read the linked articles carefully and then reconsider your strategy.
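As a rough illustration of the raw-count requirement, the sketch below applies a simple heuristic: raw counts are non-negative whole numbers, whereas percentile ranks and TPMs are generally fractional. The function name `looks_like_raw_counts` and the example numbers are made up for illustration and are not taken from either file.

```python
def looks_like_raw_counts(values, tolerance=1e-9):
    """Heuristic check: True if all values are non-negative whole numbers.

    This cannot prove that values are raw counts (normalized values that
    happen to be integers would pass), but it catches percentile ranks
    and TPMs, which are typically fractional.
    """
    values = list(values)
    if not values:
        return False
    return all(v >= 0 and abs(v - round(v)) <= tolerance for v in values)


# Made-up example values, one small vector per data type.
percentile_ranks = [12.5, 87.3, 50.0]   # like the TCGA percentile file
tpm_values = [0.0, 3.27, 151.9]         # like the GTEx TPM file
raw_counts = [0, 17, 2045]              # what count-based DE tools expect

print(looks_like_raw_counts(percentile_ranks))  # False
print(looks_like_raw_counts(tpm_values))        # False
print(looks_like_raw_counts(raw_counts))        # True
```

Note that some quantifiers report fractional estimated counts, so a failed check is a prompt to look at how the values were produced rather than a final verdict.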


Look (and I really mean no offence at all), but this post and the ones about ChIP-seq analysis that you posted recently imply that you are a beginner in the field. You need more background before you start analyzing data. Without proper background knowledge and some experience, your analysis will almost certainly be flawed and therefore meaningless. Bioinformatics is quite a difficult field because there are very few standards and a lot of pitfalls. If you can, please take a course with an experienced supervisor. In any case, read as much as you can in online tutorials and blogs. Try to understand how things work and, most importantly, use established tools and workflows. Don't create custom analysis strategies before you gain a very good understanding of what you are doing. Again, I mean absolutely no offence; I am just trying to save you from beginner's mistakes that might cost you a lot of time while producing basically no output.


Hello, ATpoint,

Thank you for your response!

It would be better to compare within one dataset.

But my boss wants me to compare the normal and cancer samples.

Actually, there are not many normal samples in TCGA.

Do I have to process the data from FASTQ files?

Thank you again!

Yue

