Question

Difficulty in understanding terms and graphs

0

8.9 years ago

DavidP ▴ 60

Hi everyone,

I just started my PhD in bioinformatics, working specifically on alternative splicing using RNA-seq; my background is computer science. I have two big issues at the moment:

1. Understanding specific graphs and results in papers
2. There are far too many gene, isoform, function, and cancer names ...

It's really hard to understand papers at the moment and I need to google many things ... I'd appreciate it if you could tell me how I can resolve those issues. I have some very basic questions like:

What is a novel isoform?

What's the difference between novel transcripts and reference-like transcripts?

cheers

DavidP

RNA-Seq • 3.9k views

ADD COMMENT • link updated 2.1 years ago by Ram 45k • written 8.9 years ago by DavidP ▴ 60

0

I don't think there is a 'one-stop' solution for this (it would be great if someone provided one). Google the terms you don't understand and don't hesitate to consult Wikipedia. If you don't understand something in one of the papers you are reading, post it here; people here are very helpful.

Some useful links

ADD REPLY • link 8.9 years ago by venu 7.1k

0

Do you have any biology background?

If not, you might need a crash course. Basic lectures about gene structure, textbooks etc.

ADD REPLY • link 8.9 years ago by jotan ★ 1.3k

1

8.9 years ago

DG 7.3k

Lots of people have given excellent answers and suggestions for online resources. I would also highly recommend taking some appropriate courses at your institution. Most PhD programs require at least a few courses for you to take, usually determined by your committee and department. If your background is in CS and you are now in a department or group that is primarily biology, you should take some appropriate courses. Or even just audit some courses. The biology of what is happening informs many of the design decisions of experiments, which carries on downstream into how those results are analysed. It's very common for people without biology backgrounds to analyse these sorts of datasets badly because they don't understand what is happening in those datasets.

ADD COMMENT • link 8.9 years ago by DG 7.3k

0

8.9 years ago

DavidP ▴ 60

I think I need to read more about probability and statistics: things like the negative binomial (NB) distribution, generalized linear models (GLMs), and normalizing for composition biases. I need a lot of groundwork there. Is there a good course or web resource?

ADD COMMENT • link 8.9 years ago by DavidP ▴ 60

0

There are a bunch of great courses on things like Coursera. I assume that nothing covers negative binomials, but anything going over GLMs (or even plain linear models) will suffice.

ADD REPLY • link 8.9 years ago by Devon Ryan 105k

0

Perhaps not what you had in mind, but you can find a lot of learning material/courses/introductions on youtube, e.g. https://www.youtube.com/results?search_query=negative+binomial+distribution

ADD REPLY • link 8.9 years ago by WouterDeCoster 48k

0

GLM GLM GLM: Look for a cheap copy of A.J. Dobson's 'An Introduction to Generalized Linear Models' and just work through the whole thing. You should probably also get a copy of Ewens and Grant (which is a brilliant book, but possibly a bit out of date now).

ADD REPLY • link 8.9 years ago by russhh 5.8k

0

8.8 years ago

DavidP ▴ 60

Hey guys, which Bioconductor package do you recommend for analyzing different types of alternative splicing?

ADD COMMENT • link 8.8 years ago by DavidP ▴ 60

0

You might be better off starting a new topic, or searching for related topics.

ADD REPLY • link 8.8 years ago by russhh 5.8k

score 2 · Accepted Answer · 2016-05-31

2

8.9 years ago

Devon Ryan 105k

"Googling stuff" is pretty much standard operating procedure for everyone starting something new :P

A "novel isoform" is a transcript of a gene that hasn't been previously reported. A "reference transcript" is a transcript that has enough evidence supporting its existence to end up in a reference database; such databases are typically maintained by large groups such as Ensembl or GENCODE.
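
For someone coming from CS, the distinction can be sketched as a lookup against an annotation. The following is only an illustration under simplifying assumptions: the transcript IDs, coordinates, and the `classify` helper are all hypothetical, and a transcript is reduced to its ordered list of exon (start, end) pairs. Real comparison tools are more nuanced (for example, they often match on splice junctions rather than exact transcript ends).

```python
# Hypothetical reference annotation: transcript ID -> exon chain, where each
# exon is a (start, end) coordinate pair on the same chromosome and strand.
reference = {
    "REF_T1": ((100, 200), (300, 400), (500, 600)),
    "REF_T2": ((100, 200), (500, 600)),
}


def classify(exon_chain, reference):
    """Return the ID of the matching reference transcript, or 'novel' if none matches."""
    for ref_id, ref_chain in reference.items():
        if tuple(exon_chain) == ref_chain:
            return ref_id
    return "novel"


# Two transcripts assembled from RNA-seq reads (made-up coordinates).
assembled_a = [(100, 200), (300, 400), (500, 600)]  # same exon chain as REF_T1
assembled_b = [(100, 200), (300, 450), (500, 600)]  # second exon ends at a new position

print(classify(assembled_a, reference))
print(classify(assembled_b, reference))
```

Here `assembled_a` matches a known transcript, while `assembled_b` uses an exon boundary that no reference transcript has, so it would be reported as a candidate novel isoform.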

ADD COMMENT • link 8.9 years ago by Devon Ryan 105k

1

Many thanks ... I do google things, but some of them are not well explained.

ADD REPLY • link 8.9 years ago by DavidP ▴ 60

score 2 · Accepted Answer · 2016-05-31

Introductory genomics courses on Coursera and edX would be a great help to you. Of course, googling always helps. But the idea is that once you know the basics of genomics and population genetics, much of the nomenclature will be clear to you, and you will then need to google only some of the terminology.

Some courses:
1. https://www.coursera.org/specializations/computational-biology
2. https://www.coursera.org/specializations/data-structures-algorithms

score 2 · Accepted Answer · 2016-05-31

As for question 2, check this Biostars post, which gives you an updated list of courses in the field. The Ensembl help & documentation page, including our glossary, could be a good place to start, especially if you start using the Ensembl Gene Set, which is pretty thorough in annotating alternatively spliced transcripts, whether coding or otherwise. Once you have those concepts clearer, it may be easier to understand the graphs and results in the papers you read. You can also contact the authors of papers you are not too sure about for further clarification. You can either email the corresponding author directly or, better still, tweet them. The latter will probably be more impactful as it's open and public to the community.

score 2 · Accepted Answer · 2016-06-01

2

8.9 years ago

DavidP ▴ 60

I was looking at a paper that compares some tools like MATS; it talks about measures like AUC and FDR. I know what they are, but I'm confused about how to relate them back to the tools!

ADD COMMENT • link 8.9 years ago by DavidP ▴ 60

1

Those are essentially QC metrics by which one can judge a tool's performance/reliability.
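
As a rough sketch of how such metrics get tied back to a tool: in a benchmark where the truly differential events are known (for example, simulated data), the tool's calls and scores are compared against that truth. Everything below is a hypothetical illustration, including the event names, the scores, the 0.5 calling threshold, and both helper functions; it is not how any particular paper computed its numbers.

```python
def false_discovery_rate(called, truth):
    """Fraction of the tool's called events that are not truly differential."""
    called = set(called)
    if not called:
        return 0.0
    false_positives = len(called - set(truth))
    return false_positives / len(called)


def auc(scores, truth):
    """Probability that a random true event outscores a random false one (ties count 0.5)."""
    pos = [s for event, s in scores.items() if event in truth]
    neg = [s for event, s in scores.items() if event not in truth]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up benchmark: ev1-ev3 are truly differentially spliced, ev4-ev5 are not.
truth = {"ev1", "ev2", "ev3"}
# Made-up confidence scores reported by a tool for each event.
scores = {"ev1": 0.9, "ev2": 0.8, "ev3": 0.3, "ev4": 0.6, "ev5": 0.1}
# Events the tool "calls" at an assumed score threshold of 0.5.
called = [event for event, s in scores.items() if s >= 0.5]

print(false_discovery_rate(called, truth))
print(auc(scores, truth))
```

A tool with a lower FDR at a given threshold makes fewer false calls, and a tool with a higher AUC ranks true events above false ones more consistently, which is how papers use these numbers to compare tools.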

ADD REPLY • link 8.9 years ago by Devon Ryan 105k

score 1 · Accepted Answer · 2016-06-01

1

8.9 years ago

DavidP ▴ 60

Thanks a lot, guys.

I have another question. What is replicate data? And what is paired vs. unpaired data?

ADD COMMENT • link 8.9 years ago by DavidP ▴ 60

2

It's easiest to explain "replicate" with an example. Suppose you were doing an experiment with "healthy" and "sick" patients. Each healthy patient would be a replicate in the "healthy" group and each sick patient a replicate in the "sick" group.

Regarding paired and unpaired (aka, single-end) data, this refers to how each molecule is sequenced. Since you have a CS background, perhaps it'd be convenient to explain it as follows: "Given a string, S, an 'unpaired' (aka, single-end) read refers to a length N substring originating from one end of S. Paired-end reads would then be two length N substrings, one originating from each end of S."

The most common types of sequencing have fixed-length reads, that is N is constant across all reads. For paired-end reads, they're generated in such a way that you know which reads came from the same original molecule (S).
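
The string analogy above can be written out as a minimal sketch. The fragment sequence and function names are made up for illustration. One detail goes beyond the simplified description: in common paired-end protocols the second read is sequenced from the opposite strand, so it appears as the reverse complement of the other end of S, and the sketch assumes that.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A<->T, C<->G)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def single_end_read(fragment, n):
    """One length-n read taken from one end of the fragment."""
    return fragment[:n]


def paired_end_reads(fragment, n):
    """Two length-n reads, one from each end; the mate comes from the opposite strand."""
    return fragment[:n], reverse_complement(fragment[-n:])


fragment = "ACGTTGCAAGGCTTAACCGG"  # hypothetical molecule S
n = 5                              # fixed read length N

print(single_end_read(fragment, n))
print(paired_end_reads(fragment, n))
```

Because both mates are known to come from the same molecule, an aligner can use the distance between them as extra evidence when placing the reads.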

ADD REPLY • link 8.9 years ago by Devon Ryan 105k

1

Many thanks Devon ... very very helpful

ADD REPLY • link 8.9 years ago by DavidP ▴ 60

score 1 · Accepted Answer · 2016-06-07

1

8.9 years ago

DavidP ▴ 60

Many thanks, everyone, for all the help so far.

I'm looking for some tools to first detect alternative splicing and then determine the event type(s), such as:

1. Skipped exon (SE)
2. Alternative 3’ splice site (A3SS)
3. Alternative 5’ splice site (A5SS)
4. Mutually exclusive exons (MXE)
5. Intron retention (IR)

ADD COMMENT • link 8.9 years ago by DavidP ▴ 60