Question

NovaSeq transcriptome data normalization

6.1 years ago

kanchaan36 • 0

I am new to NGS. I have transcriptome data (NovaSeq) for two conditions, i.e. control and treated plant material. Some details of my run are:

Lane  Sample  Barcode  PF_Clusters   %_of_lane  %_Perfect_barcode  Yield_(Mb)  %_PF_Clusters  %_>=_Q30_bases  Mean_Quality
2     T_1     ATGTCA   12,24,90,463  6.88       97.38              24,743      100            87.83           34.76
2     N_2     CCGTCC   3,66,61,309   2.06       97.55              7,406       100            90.09           35.24

The amount of data we got is not equal: T_1 (Treated) has about 24.7 Gb of data while N_2 (Normal) has about 7.4 Gb, which is roughly 3:1. I need to normalize this data.

Thank you

RNA-Seq Novaseq Normalization • 1.1k views

updated 6.1 years ago by WouterDeCoster 48k • written 6.1 years ago by kanchaan36 • 0

What is the aim of the study? Differential expression, detection of isoforms, mutations, transcriptome assembly?

6.1 years ago by ATpoint 88k

Differential gene expression analysis and detection of isoforms.

6.1 years ago by kanchaan36 • 0

Purely as a matter of academic interest: you may want to down-sample (not normalize) the data in this case so the datasets become equivalent. Only the larger dataset can be down-sampled.

You could do so using reformat.sh from BBMap suite (look at sampling options) or seqtk sample.
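To illustrate the down-sampling idea only (this is not how reformat.sh or seqtk are implemented), here is a minimal Python sketch. It takes the PF_Clusters values from the table in the question, works out what fraction of T_1 would have to be kept to match N_2, and subsamples FASTQ records with a fixed seed. The names `keep_fraction` and `subsample_fastq` and the toy reads are hypothetical.

```python
import random

def keep_fraction(larger_reads, smaller_reads):
    """Fraction of the larger library to keep so both end up at similar depth."""
    return smaller_reads / larger_reads

def subsample_fastq(lines, fraction, seed=100):
    """Keep each 4-line FASTQ record with probability `fraction`.

    Using the same seed on the R1 and R2 files keeps mates paired, because
    the random draws happen in the same record order for both files.
    """
    rng = random.Random(seed)
    kept = []
    usable = len(lines) - len(lines) % 4  # ignore a trailing partial record
    for i in range(0, usable, 4):
        if rng.random() < fraction:
            kept.extend(lines[i:i + 4])
    return kept

# PF_Clusters from the question's table (12,24,90,463 and 3,66,61,309).
t1_reads = 122_490_463
n2_reads = 36_661_309
fraction = keep_fraction(t1_reads, n2_reads)
print(f"keep about {fraction:.3f} of T_1 reads")

# Toy data: 1000 made-up reads, not real sequencing output.
toy = []
for n in range(1000):
    toy += [f"@read{n}", "ACGT", "+", "IIII"]
sub = subsample_fastq(toy, fraction)
print(len(sub) // 4, "of 1000 toy reads kept")
```

In practice the tools named above would be used on the real files; the sketch just shows that the sampling is per read (or read pair) and that a fixed seed makes it reproducible.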

6.1 years ago by GenoMax 151k

score 1 · Answer 1 · 2019-04-29

... you have only one sample of control and one of treatment? Or are there others you haven't shown?

Because if that's the case, there is no need to normalize because your experiment has already failed. A good comparison requires at least 3 biological replicates per group.

If you have more samples, then any proper differential expression tool (e.g. DESeq2, edgeR) takes care of this normalization and you don't have to do anything additional upfront. Nevertheless, it would be better to get similar amounts of sequencing data per sample in the future, which means more careful concentration determination and pooling of samples.
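To make the point about built-in normalization concrete, here is a simplified Python sketch of the median-of-ratios idea behind DESeq2's size factors. It is not the package's code, and edgeR uses a different method (TMM) by default. The function name `size_factors` and the toy counts are hypothetical; the "deep" sample is constructed to be exactly 3x the "shallow" one, mimicking the depth difference in the question.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    `counts` maps sample name -> list of raw counts, same gene order in each.
    Genes with a zero count in any sample are skipped, since their
    geometric mean across samples is zero.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    log_ratios = {s: [] for s in samples}
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if min(values) == 0:
            continue
        # Log of the per-gene geometric mean across samples.
        log_geo_mean = sum(math.log(v) for v in values) / len(values)
        for s, v in zip(samples, values):
            log_ratios[s].append(math.log(v) - log_geo_mean)
    # Each sample's factor is the median ratio to the geometric-mean reference.
    return {s: math.exp(median(log_ratios[s])) for s in samples}

# Made-up toy counts: "deep" is sequenced exactly 3x deeper than "shallow".
toy_counts = {
    "shallow": [10, 200, 35, 0, 80],
    "deep": [30, 600, 105, 3, 240],
}
factors = size_factors(toy_counts)
normalized = {s: [c / factors[s] for c in toy_counts[s]] for s in toy_counts}
print(factors)
print(normalized)
```

Dividing raw counts by these factors puts both samples on a common scale without discarding reads. It corrects for depth only and does not make up for missing replicates.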