Question

Queries regarding RNASeq sequencing

5.5 years ago

glady ▴ 320

Hello all,

I have some very simple questions regarding Human transcriptome sequencing:

1) Which read layout would be better for sequencing: 1x150 (single-end) or 2x75 (paired-end)?

2) What can I consider as a good depth for performing the transcriptomic analysis: 20M, 30M or >30M?

3) Can we perform the transcriptomic study only on Tumor samples without considering the Normal samples?

The goal of the project is to use a multi-omic approach to analyze more than 50 tumor samples by transcriptome and exome sequencing. Since the dataset is going to be big, my question is whether we can do this without sequencing the normal samples for the transcriptome, and what would be a good library type (single-end/paired-end) and depth (20M or >30M) for this task.

Thank you in advance.

RNA-Seq • 1.7k views

ADD COMMENT • link updated 5.5 years ago by Kristoffer Vitting-Seerup ★ 4.1k • written 5.5 years ago by glady ▴ 320

The answer to all these questions depends on what genome you are using, whether you are interested in splicing variation, and what biological question is being addressed.

ADD REPLY • link 5.5 years ago by i.sudbery 21k

Very likely human since normal/tumor are being referenced.

ADD REPLY • link 5.5 years ago by GenoMax 150k

Yes, tell us your ultimate goal and the organism.

ADD REPLY • link 5.5 years ago by grant.hovhannisyan ★ 2.6k

score 0 · Answer 1 · 2019-10-29

5.5 years ago

JC 13k

1) If you have a well-defined transcriptome, it doesn't matter which one you use; if you want to find new isoforms, prefer paired-end reads.

2) How many genes/transcripts do you have? This again depends on how much information you have for your genome/transcriptome.

3) Maybe.

ADD COMMENT • link 5.5 years ago by JC 13k

Question 3 depends, more than the others, on the question being asked. TCGA has very few or no normal samples and is still very useful for asking some questions, just not for the question "which genes are upregulated in cancer?".

ADD REPLY • link 5.5 years ago by i.sudbery 21k

How can I correlate my expression dataset with TCGA to identify enriched pathways or genes? Are there any R packages that I can use to do so, or any tutorials?

ADD REPLY • link 5.5 years ago by glady ▴ 320

Unfortunately not. It's not usually possible to combine the results of different studies into a single analysis. I was just using TCGA as an example of where useful information can be provided by a data set that does not have normals.

Really, we need to know what biological question you are attempting to answer to know what the right way to design and analyse the experiment is.

ADD REPLY • link 5.5 years ago by i.sudbery 21k

I agree with this. 3) is perfectly possible without normals if you are interested in clustering or classifying samples based on the relative expression in each patient within a cohort, or if you are interested in finding co-expression networks.

ADD REPLY • link 5.5 years ago by ATpoint 87k
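
As a rough illustration of the co-expression idea above, here is a minimal sketch that needs tumor samples only. It computes pairwise Pearson correlations between genes across samples and keeps strongly correlated pairs as network edges. The gene names, the expression values and the 0.9 threshold are all made up for illustration; a real analysis would start from normalized counts and would more likely use a dedicated package.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)

# Hypothetical log-scale expression: gene -> values across 5 tumor samples.
expression = {
    "GENE_A": [2.0, 4.0, 6.0, 8.0, 10.0],
    "GENE_B": [1.0, 2.1, 2.9, 4.2, 5.0],
    "GENE_C": [9.0, 7.5, 6.0, 3.5, 2.0],
    "GENE_D": [5.0, 5.2, 4.9, 5.1, 5.0],
}

def coexpression_edges(expr, threshold=0.9):
    """Return (gene1, gene2, r) for gene pairs with |r| >= threshold (assumed cutoff)."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges

edges = coexpression_edges(expression)
for g1, g2, r in edges:
    print(g1, g2, r)
```

Negative correlations are kept as edges here because the cutoff is applied to the absolute value; whether that is appropriate depends on the question being asked.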

I'm sorry, but can you provide some more details on this point? How can I cluster the samples by their expression? Is there any literature where similar work has been performed?

ADD REPLY • link 5.5 years ago by glady ▴ 320

Hierarchical clustering based on Z-scored expression values for a selection of genes is an option. Check any NGS paper where people analyzed cohorts of samples / patients. A clustering step (or any kind of complexity reduction step) is typically among the first figures.

ADD REPLY • link 5.5 years ago by ATpoint 87k
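
To make the suggestion above concrete, here is a minimal sketch of hierarchical clustering on Z-scored expression values. It scales each gene across samples, then merges samples by average linkage on Euclidean distance until two clusters remain. The sample names, gene names, values and the choice of two clusters are all hypothetical; in practice people usually rely on established tools (for example `hclust` in R or `scipy.cluster.hierarchy` in Python) rather than a hand-rolled loop like this.

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical normalized log-expression: gene -> one value per sample.
samples = ["T1", "T2", "T3", "T4"]
expression = {
    "GENE_A": [10.0, 9.5, 2.0, 2.5],
    "GENE_B": [1.0, 1.5, 8.0, 7.5],
    "GENE_C": [5.0, 6.0, 5.5, 12.0],
}

def zscore_rows(expr):
    """Z-score each gene across samples (constant genes become all zeros)."""
    scaled = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        scaled[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return scaled

def euclidean(u, v):
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def average_linkage(names, vectors, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(names))]

    def dist(c1, c2):
        # Average pairwise distance between the members of two clusters.
        return mean(euclidean(vectors[i], vectors[j]) for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        pairs = [
            (dist(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        ]
        _, a, b = min(pairs)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(names[i] for i in c) for c in clusters]

z = zscore_rows(expression)
# One Z-score profile per sample, with genes in a fixed order.
profiles = [[z[g][i] for g in sorted(z)] for i in range(len(samples))]
groups = average_linkage(samples, profiles, n_clusters=2)
print(groups)
```

Z-scoring per gene keeps highly expressed genes from dominating the distance, which is why it is applied before clustering.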

Thank you for your useful suggestions. Could you share some links to such literature? It would be really helpful.

ADD REPLY • link 5.5 years ago by glady ▴ 320

score 0 · Answer 2 · 2019-10-30

With regard to question 1, I will have to disagree with @JC. Even if you have a well-defined transcriptome, paired-end is preferable since it gives more accurate results (even for gene-level analysis). You can read more about that and suggested tools here.

For question 2, it depends on what you want to do with the data. If you are only interested in gene-level analysis, you can get away with less depth than if you want to do transcript-level analysis. Just remember that a lot happens at the transcript level in cancer - see e.g. this and this paper. Such analysis can be done with IsoformSwitchAnalyzeR, and an example (from TCGA data) can be found here.

As mentioned in the comments, the answer to question 3 depends on what you want to do with the data and what the goals of the project are.