Question

Best Practices For Publication Of mRNA-Seq Data

12.9 years ago

Dave Bridges ★ 1.4k

Unless I am mistaken, there does not appear to be a stable repository for archiving RNA-seq data, akin to how GEO and ArrayExpress archive microarray data. We are preparing to publish some mRNA-seq results, and I would like some input on the best ways to both present and archive the raw and processed data. Our current plan (aside from the GSEA in the paper) is to include the Cufflinks-processed data as a supplementary table. My main questions are:

1. How should we make the raw short reads available?
2. Is it worthwhile to make the aligned (BAM) files available, and if so, how?
3. Other than software, software versions, non-default parameters and hardware information, what technical data should be provided in the manuscript?
4. If I am going to archive the short reads or alignment files, what is the best way to attach the relevant metadata about the samples?

publication rna-seq next-gen • 5.4k views

updated 7.8 years ago by h.mon 35k • written 12.9 years ago by Dave Bridges ★ 1.4k

Istvan Albert · Answer 1 · 2012-10-14

12.9 years ago

Mikael Huss 4.8k

I have submitted RNA-seq data to GEO (http://www.ncbi.nlm.nih.gov/geo/info/seq.html) and ArrayExpress (http://www.ebi.ac.uk/microarray/doc/help/UHTS_submissions.html). They have pretty detailed standards for how to submit, including how to set up the metadata, so I think you should start there.

Whether to provide BAM files or not is a matter of taste, I think. Obviously, whether people will use them depends on how much they trust the aligner you've used. I have personally used BAM files from GEO because I didn't think it was worth the effort to realign the reads myself.

I think the same goes for providing Cufflinks results. Of course, they are helpful in relation to your article, because people will need them if they want to reproduce the results in your paper. If they want to re-analyze your data, though, they will probably run their own tools on the reads.

GEO and ArrayExpress will, as far as I recall, allow you to upload BAM files and processed data tables (like Cufflinks output) together with the FASTQ files.
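Repositories generally ask for a checksum for every file you upload so the transfer can be verified; GEO's metadata spreadsheet, as far as I know, has MD5 checksum columns for raw and processed files (check the current template, since the exact fields may change). A minimal Python sketch for computing them with the standard `hashlib` module, reading in chunks so multi-gigabyte FASTQ or BAM files never have to fit in memory; the function names `md5sum` and `checksum_table` are just illustrative:

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks by default."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_table(paths):
    """Map each file's base name to its MD5 digest, in sorted path order."""
    return {Path(p).name: md5sum(p) for p in sorted(paths)}
```

For example, `checksum_table(glob.glob("*.fastq.gz"))` gives one name/digest pair per file to paste into the submission spreadsheet; on the command line, `md5sum *.fastq.gz` does the same job.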

So in short, to your questions:

1. Upload them to ArrayExpress or GEO.
2. It might be; upload those to the same place as well.
3. If you are doing differential expression analysis, explain in detail how that was done. I can't think of anything else right now.
4. See the instructions from ArrayExpress or GEO.
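For the metadata question, the repository's own format is what ultimately counts (GEO uses a metadata spreadsheet, ArrayExpress uses MAGE-TAB), but it helps to keep one sample sheet next to the files from the start that ties each sample to its condition, replicate and FASTQ files. A minimal Python sketch; the `Sample` fields and column names are hypothetical placeholders, not any repository's required fields:

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Sample:
    """One sequenced library and the characteristics needed to interpret it."""
    name: str
    condition: str
    replicate: int
    fastq_files: list[str] = field(default_factory=list)


# Hypothetical column names for illustration; map them onto the repository's
# own template (GEO spreadsheet or ArrayExpress MAGE-TAB) when submitting.
COLUMNS = ["sample_name", "condition", "replicate", "fastq_files"]


def sample_sheet_text(samples):
    """Return a tab-separated sample sheet, one row per sample.

    Multiple FASTQ files (e.g. paired-end mates) are joined with ';'.
    Duplicate sample names or samples without FASTQ files raise ValueError.
    """
    seen = set()
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for sample in samples:
        if sample.name in seen:
            raise ValueError(f"duplicate sample name: {sample.name}")
        if not sample.fastq_files:
            raise ValueError(f"sample {sample.name} has no FASTQ files")
        seen.add(sample.name)
        writer.writerow([sample.name, sample.condition, sample.replicate,
                         ";".join(sample.fastq_files)])
    return out.getvalue()
```

Checking for duplicate names and missing files up front catches the kind of bookkeeping errors that are tedious to untangle once the data are already in a public record.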

updated 12.9 years ago by Istvan Albert 103k • written 12.9 years ago by Mikael Huss 4.8k

All true, except that GEO says: "Alignment files (e.g. BAM, SAM) should not be supplied as processed data files." (http://www.ncbi.nlm.nih.gov/geo/info/seq.html)

I think they consider it too much data.

12.9 years ago by Ido Tamir 5.2k

Any sequence files submitted to GEO end up in SRA with a link.

12.1 years ago by Sean Davis 27k

Ah, I missed that, thanks. Still, I'm positive I have used alignment files from GEO before (they may have been Eland output rather than BAM/SAM), so maybe they used to allow it but now consider it too much (which is understandable).

12.9 years ago by Mikael Huss 4.8k

I would also add quantified data (isoform- and gene-level): it makes it a lot easier for the rest of us to quickly test hypotheses on your data.
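A minimal Python sketch of assembling that kind of table: merge per-sample quantification files into one features-by-samples matrix that can go out as a single supplementary file. The default column names `tracking_id` and `FPKM` assume Cufflinks' `genes.fpkm_tracking` layout; adjust them for other tools. The function names are illustrative. Features missing from a sample are written as `NA` rather than assumed to be zero, and a duplicate ID within one sample raises an error instead of being silently overwritten.

```python
import csv
import io


def read_quant(handle, id_column="tracking_id", value_column="FPKM"):
    """Read one sample's tab-separated quantification table into {feature_id: value}.

    The default column names assume Cufflinks' genes.fpkm_tracking layout;
    pass different names for other quantification tools.
    """
    values = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        feature = row[id_column]
        if feature in values:
            raise ValueError(f"duplicate feature ID: {feature}")
        values[feature] = float(row[value_column])
    return values


def merge_quant(per_sample):
    """Combine {sample: {feature: value}} into a features x samples TSV string.

    Features absent from a sample are written as NA rather than 0.
    """
    samples = list(per_sample)
    features = sorted(set().union(*per_sample.values()))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["feature_id", *samples])
    for feature in features:
        writer.writerow([feature, *(per_sample[s].get(feature, "NA") for s in samples)])
    return out.getvalue()
```

Typical use: open each sample's output file, build `{sample_name: read_quant(fh)}`, and write `merge_quant(...)` to one TSV.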

7.8 years ago by Kristoffer Vitting-Seerup ★ 4.2k