News: Survey: RNA-Seq Analysis For Differential Gene/Transcript Expression [Updated With Results]
12.1 years ago
bodhisattvax ▴ 250

Hi all,

I am looking to build a 'standard' RNA-seq data analysis pipeline for analysing differential gene and possibly transcript expression.

I am aware that there are a variety of tools out there for the various steps (alignment, counting, differential expression), each with their respective pros and cons, cheerleaders and dissers.

So I have created a (short) survey which I think could be useful to all of us, to try and see if we are moving towards some consensus about the preferred methodology for each of the steps.

The survey is at http://www.surveymonkey.com/s/72953N9. I would be very grateful if you could fill it out: it should only take a few minutes of your time.

You may prefer to respond within this thread itself, but being an optimistic soul, I'm hoping to get so many responses that I will need to use the results analysis tools on SurveyMonkey!

Of course, I will make the results available either here or on request.

Thanks in advance

transcript RNA-seq gene • 8.1k views

I filled out the survey; please let us know the results.


Thanks very much for responding! I will make the results available soon - still waiting for a few more responses. Thanks again


Very cool idea. I'm excited to see the results.


Thanks DeeDee! See my answer below - I'll be making the results available soon

12.0 years ago
bodhisattvax ▴ 250

Hi all, I've finally put together the results of the survey! First of all, thanks to everyone who participated - the response has been great, with 93 people completing the survey as of today.

The respondents have been a varied bunch, including all levels of academia (pre-docs, grad students, post-docs and PIs), core bioinformaticians and bioinformatics managers, as well as many from industry. The majority of respondents appear to be based in the US and Europe, with others in China, Korea and Australia.

I provide below my own summary of the survey's findings, and I have a document which contains all the results, including all unedited comments. I'm not sure how I can upload this file on this site. If you would like it, please either check my post on seqanswers where I have been able to upload the file, or get in touch with me so I can email it to you. Biostars admins can you help here?

As with any survey, we should be aware of potential biases (e.g. skews caused by people who are really annoyed with a particular tool!). My inferences below are probably influenced by my own experiences, so feel free to rap my knuckles if you feel I am over-reaching or have misinterpreted the data, and to air your doubts about the accuracy of the results and conclusions. I'd also like to declare here that I have no vested interests, have nothing to gain by promoting one tool over another, and have only used a small number of the tools listed.

Now for the summary. Enjoy!

One of the take-home messages from the survey appears to be that the shadow of the Tuxedo Suite still looms large over the RNA-Seq analysis field. However there is a wide diversity of opinions and experiences, and many other tools appear to be in the ascendancy, especially when it comes to read-counting and differential expression analysis.

Q1. What do you prefer to align your reads to?

Most respondents align to the genome only (47.3%), closely followed by those who align to both genome and transcriptome (39.8%). Key to their choices has been the availability and reliability of data, as well as the question being asked in the experiment. Respondents who chose to align to the genome only appear to do so for various reasons, such as the ability to discover new transcripts and splice variants. However, many respondents commented that aligning to both the genome and transcriptome offers several advantages, such as increased speed and accuracy. Thus, for a species where both a reliable genome and transcriptome are available, this might be the optimal way forward.

Q2 and 3. What is your preferred aligner? And the reasons why.

TopHat rules the roost here, taking more than two-thirds of the vote (67.9%). Reasons for this include its ease of use, proven accuracy (which has improved over time), historical popularity, and the fact that the available alternatives have not yet warranted a change from TopHat. Another Tuxedo suite aligner, Bowtie, comes in at a distant second (17.3%). STAR (6.2%) has been noted for its speed.

Q4 and 5. What is your preferred read-counting methodology? And the reasons why.

Again, a Tuxedo suite tool, Cufflinks, took the majority of votes (57.1%). Reasons for this included its ease of use, but many respondents appear to use it because it is the logical follow-on from TopHat in the Tuxedo workflow. The second-placed HTSeq-count appears to be in the ascendancy - many respondents appear to have been dissatisfied with Cufflinks and switched to HTSeq-count, which looks to be a good candidate to topple Cufflinks from the top in the near future. Other notable tools include easyRNASeq and RSEM. Also, many respondents use bedtools, samtools, or in-house tools and custom scripts.

Q6 and 7. What is your preferred methodology to estimate differential expression? And the reasons why.

Finally, a non-Tuxedo suite tool wins the vote: DESeq/DEXSeq with 44.7%. Cuffdiff is not too far behind (35.5%) and edgeR (19.7%) brings up the rear. Going by the comments, we might expect usage of DESeq and edgeR to increase at the expense of Cuffdiff. Results from Cuffdiff have been variously described as weird, untrustworthy, having too many false positives and other problems.

Q8. Which annotation resource do you use?

Ensembl (46.6%) was the clear winner. Second and third places were closely contested between RefSeq (25.9%) and UCSC (22.4%).

Q9. What software do you use for downstream analyses?

GOseq (68.9%) is clearly very widely used. Many respondents also use the commercial options Ingenuity IPA and GeneGo MetaCore. DAVID also earned an honourable mention.

P.S. Please note: the percentages quoted relate to the number of people who answered that particular question. This varies widely across questions, from all 93 respondents for Q1 to 45 for Q9.
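
To make the note above concrete, here is a minimal sketch of how a quoted percentage maps back to an approximate head count. It only uses the two respondent totals stated in this post (93 for Q1, 45 for Q9); the helper name `approx_count` is just for illustration, and the totals for the other questions are not given here, so they are left out rather than guessed.

```python
def approx_count(percent: float, n_respondents: int) -> int:
    """Convert a reported percentage back to an approximate respondent count."""
    return round(percent / 100 * n_respondents)


# Only Q1 (n = 93) and Q9 (n = 45) have respondent totals stated in the post.
q1_genome_only = approx_count(47.3, 93)  # aligned to genome only
q1_both = approx_count(39.8, 93)         # aligned to genome and transcriptome
q9_goseq = approx_count(68.9, 45)        # GOseq for downstream analysis

print(q1_genome_only, q1_both, q9_goseq)
```

So a similar percentage can represent quite different numbers of people depending on the question, which is worth keeping in mind when comparing results across questions.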

12.0 years ago
bodhisattvax ▴ 250

A quick update: Again, thanks for all the responses so far.

I think I'm pretty satisfied with the number of responses and will start to collate the results and generate a report which I will share with everyone.

This is less trivial than I had initially thought, as there doesn't seem to be an easy way to get the responses off SurveyMonkey without paying them for it. But I hope to have all this done over the next 2-3 days.

Meanwhile, if anyone else would like to complete the survey please feel free to do so! Cheers

12.0 years ago
bodhisattvax ▴ 250

Also, here is the seqanswers post where I was able to attach the file containing all the results: http://seqanswers.com/forums/showthread.php?t=25296
