normalization of NGS results
9.8 years ago
agata88 ▴ 870

Hi,

I am working with a simple NGS dataset of 19 samples, each with a different read count: the smallest library has 114,793 reads and the largest has 242,798 reads. Each library covers 53 amplicons (2 genes).

Do I have to normalize these reads? Can I use RPKM/FPKM, and if so, how?

The next step of my analysis will include calculating some stats, such as coverage per amplicon for every sample, and I think the results won't be correct unless the reads are normalized first. Am I right?

I would appreciate any help.

Best regards,

Agata

next-gen normalization dna-seq • 4.4k views
4

We need to know the biological question you're trying to answer to provide useful feedback.

0

What do you mean by "biological question"? Is my post unclear?

0

Yes, I am dealing with DNA not RNA.

I would like to do more complex stats like: list the amplicons covered at least 50X in all 19 samples.

For example, an amplicon in sample 1 has a mean coverage of 30X with a total read count of 30,000, while the same amplicon in sample 2 has 50X coverage with a total read count of 40,000. I think I cannot compare coverage between those two directly, right? But if I normalize so that the total read count is the same for all samples, the comparison should be OK.

But I don't have any idea how to do that.

Hope I made it clear.
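
The rescaling described in this comment can be sketched in a few lines of Python. This is only an illustration of the arithmetic, using the two example samples above; the function name `scale_coverage` and the choice of the smallest library as the common target are assumptions, not an established method for amplicon data.

```python
def scale_coverage(mean_coverage, total_reads, target_reads):
    """Rescale a mean coverage value as if the library had target_reads reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mean_coverage * target_reads / total_reads


# Numbers from the example above: (mean coverage, total read count).
samples = {
    "sample1": (30.0, 30_000),
    "sample2": (50.0, 40_000),
}

# Assumed choice: scale every library down to the smallest one.
target = min(total for _, total in samples.values())

scaled = {
    name: scale_coverage(cov, total, target)
    for name, (cov, total) in samples.items()
}

for name, value in scaled.items():
    print(f"{name}: {value:.1f}X at {target} reads")
```

With these numbers, sample 1 stays at 30X and sample 2 drops from 50X to 37.5X once both are expressed at 30,000 reads.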

3

Again, whether you need to normalize or not depends on the conclusions that you want to draw from these sorts of summary statistics. There's typically no need to normalize for total number of reads per sample when calculating coverage, unless you need to do some differential comparisons (and given that this is amplicon data, it's highly questionable whether any sort of differential coverage comparison would even be meaningful).

2

Exactly. Unless you are trying to, say, detect copy number variation or something similar, you won't need to normalize. And I would strongly encourage you NOT to attempt anything like that with amplicon data. Coverage stats and similar simple measures are essentially qualitative summaries of how well your experiment went, and they inform you of any potential gaps in your sequencing (for instance, whether you may have false negatives in a given region when looking for variants because of insufficient coverage depth).
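
As a rough sketch of that kind of QC check, the "at least 50X in all samples" list can be computed directly on raw (unnormalized) mean coverage. The coverage values and amplicon names below are made up for illustration; in practice they would come from whatever tool produced the per-amplicon depth.

```python
MIN_DEPTH = 50  # threshold from the question

# Hypothetical raw mean coverage per amplicon, one dict per sample.
coverage = {
    "sample1": {"amp01": 120.0, "amp02": 30.0, "amp03": 75.0},
    "sample2": {"amp01": 95.0, "amp02": 50.0, "amp03": 60.0},
}


def amplicons_covered_everywhere(coverage, min_depth):
    """Return amplicons whose coverage is >= min_depth in every sample."""
    if not coverage:
        return []
    per_sample = list(coverage.values())
    # Only consider amplicons reported for all samples.
    shared = set.intersection(*(set(c) for c in per_sample))
    return sorted(a for a in shared if all(c[a] >= min_depth for c in per_sample))


def coverage_gaps(coverage, min_depth):
    """Return, per sample, the amplicons that fall below min_depth."""
    return {
        sample: sorted(a for a, depth in c.items() if depth < min_depth)
        for sample, c in coverage.items()
    }


passing = amplicons_covered_everywhere(coverage, MIN_DEPTH)
gaps = coverage_gaps(coverage, MIN_DEPTH)
print("covered in all samples:", passing)
print("gaps per sample:", gaps)
```

The per-sample gap list is the part that matters for spotting regions where variants could be missed because of low depth.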

2
9.8 years ago
DG 7.3k

As Devon said in his comment, we really need to know more to be sure, particularly the biological context. That said, it sounds like you sequenced genomic DNA and not RNA, is that correct? If so, and you are looking at simple stats like coverage, depth per sample, etc., then no, you don't need to do any sort of normalization between the samples. If you intend to do any more complex analysis, or started with RNA, etc., then the answer might be different.
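
For reference, the RPKM asked about in the question is just a read count scaled by region length and library size. A minimal sketch of the standard formula follows; the amplicon read count and length in the usage lines are hypothetical, and the value has no clear interpretation for DNA amplicon data like this.

```python
def rpkm(region_reads, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    if region_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total read count must be positive")
    return region_reads * 1e9 / (region_length_bp * total_mapped_reads)


# Hypothetical amplicon: 2,000 reads over 250 bp, in the smallest
# library from the question (114,793 reads).
value = rpkm(2_000, 250, 114_793)
print(f"RPKM = {value:.1f}")
```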
