raw data information about genetic data?
8.6 years ago
a.hnekoo • 0

Hi,

I am going to work with breast cancer mRNA data, and I am new to bioinformatics. I do not have access to the data yet, but I need some information about this data type, such as:

  1. What is the estimated volume (storage space) of the mRNA data from one patient's tumor sample?
  2. What conditions should I consider to be able to estimate the volume?
  3. What is the format of this type of data? Is it .xls?

I searched extensively for answers to these questions but could not find a good paper explaining them.

Thanks.

8.6 years ago
DG 7.3k

Questions 1 and 2 are related, and the answer can be highly variable. The key factors are usually the type of sequencing being done and the number of samples multiplexed on a given instrument run or sequencing lane. For most RNA-Seq experiments of this type, you're looking at a few GB per sample for the raw data as compressed FASTQ or BAM files. You will of course generate more files later during downstream analysis.
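To make the dependence on read count and read length concrete, here is a rough back-of-the-envelope sketch in Python. The function name `estimate_fastq_bytes`, the 60-byte read header, the 30 million read pairs, the 100 bp read length, and the 0.25 gzip ratio are all assumptions for illustration, not measurements; real values depend on the instrument, the library, and the sequencing depth.

```python
def estimate_fastq_bytes(n_reads, read_length, header_bytes=60,
                         paired=True, compression_ratio=0.25):
    """Rough FASTQ size estimate; every default here is an assumed value."""
    # One FASTQ record is four lines, each ending in a newline:
    # header, sequence, a '+' separator, and one quality character per base.
    record_bytes = (header_bytes + 1) + (read_length + 1) + 2 + (read_length + 1)
    mates = 2 if paired else 1  # paired-end runs store two reads per fragment
    uncompressed = n_reads * mates * record_bytes
    compressed = uncompressed * compression_ratio  # assumed gzip ratio
    return uncompressed, compressed


# Hypothetical sample: 30 million read pairs of 100 bp each.
raw, gz = estimate_fastq_bytes(n_reads=30_000_000, read_length=100)
print(f"uncompressed: {raw / 1e9:.1f} GB, compressed: {gz / 1e9:.1f} GB")
```

With these assumed numbers the estimate is about 16 GB uncompressed and about 4 GB compressed for one sample, which is in line with the "few GB per sample" figure. Halving the depth or using single-end reads scales the result proportionally.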

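On question 3: the raw data is in specialized text formats such as FASTQ (BAM is a compressed binary form of the text-based SAM format), not spreadsheets. The only thing you might put in a spreadsheet is downstream analysed data.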
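As an illustration of what such a text format looks like, here is a minimal sketch that reads FASTQ, where each read is stored as four lines: a header starting with `@`, the sequence, a `+` separator line, and a quality string with one character per base. The helper name `read_fastq` and the two toy reads are made up for this example, and the sketch assumes well-formed, uncompressed, four-line records.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, qualities) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line) stops the loop
            break
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qualities = handle.readline().rstrip("\n")
        yield header[1:], sequence, qualities  # drop the leading '@'


# Two invented reads standing in for a real file.
toy_fastq = io.StringIO(
    "@read1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@read2\nTTGCAAGGCC\n+\nFFFFFFFFFF\n"
)
records = list(read_fastq(toy_fastq))
for read_id, sequence, qualities in records:
    print(read_id, sequence, len(qualities))
```

For real data you would normally open the file with `gzip.open(path, "rt")` and use an established parser rather than a hand-written one.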

8.6 years ago
jeremy.cox.2 ▴ 130

A lot of historical breast cancer data is not shotgun sequencing reads of the kind Dan Gaston explains well, but Affymetrix gene expression (microarray) data.

If you have Affymetrix data, you will have a CSV file with subjects or cancer cell lines as columns and genes as rows.

These files are hundreds of MB in size.
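A file with that layout can be read with Python's standard `csv` module. The sketch below is only an illustration: the function name `load_expression_matrix`, the column names, and the expression values are invented, and it assumes the first column holds the gene identifier and the first row holds the sample names.

```python
import csv
import io


def load_expression_matrix(handle):
    """Read a genes-by-samples CSV into (sample_names, {gene: [values]})."""
    reader = csv.reader(handle)
    header = next(reader)
    samples = header[1:]  # first column is assumed to be the gene identifier
    matrix = {}
    for row in reader:
        if not row:  # skip empty lines
            continue
        matrix[row[0]] = [float(value) for value in row[1:]]
    return samples, matrix


# Invented toy data: two genes measured in two samples.
toy_csv = io.StringIO(
    "gene,sample_A,sample_B\n"
    "BRCA1,7.2,8.1\n"
    "ESR1,10.5,4.3\n"
)
samples, matrix = load_expression_matrix(toy_csv)
print(samples)
print(matrix["ESR1"])
```

For a real file of several hundred MB, a dataframe library would be more convenient, but the structure of the data is the same.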

Good point, I should have asked for clarification on what sort of experimental data we were talking about. cDNA data is still pretty common and still being gathered. I tend to assume RNA-Seq these days, which is my own bias.
