Hello, I have a question about a reliable way to calculate RPM (reads per million) in the context of metagenomic sequencing for virus discovery. After sequencing my samples using a metatranscriptomic protocol (RNA+DNA), I would like to normalize the number of reads assigned to each viral taxon as reads per million, as a measure of relative abundance. I have done this before using the expression 'mapped reads / total clean reads x 10^6'.
While doing this recently, I started to wonder whether it would be better to use 'mapped reads / total clean & deduplicated reads x 10^6'. (Deduplication removes ~50 to 60% of the initial clean read count.)
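To make the comparison concrete, here is a minimal Python sketch of the two expressions. All counts are made-up illustrative numbers, not from my data, and the function and variable names are hypothetical. The mapped read count is held fixed in both cases; whether it should also be counted after deduplication is part of what I am unsure about.

```python
def rpm(mapped_reads: int, total_reads: int) -> float:
    """Reads per million: mapped reads scaled to a library of one million reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads * 1_000_000 / total_reads


# Hypothetical counts, chosen only for illustration.
total_clean_reads = 10_000_000   # reads left after quality filtering
total_dedup_reads = 4_500_000    # assumes deduplication removed 55% of clean reads
mapped_reads = 500               # reads assigned to one viral taxon

# Option 1: normalize by the total clean read count.
rpm_clean = rpm(mapped_reads, total_clean_reads)

# Option 2: normalize by the clean and deduplicated read count.
rpm_dedup = rpm(mapped_reads, total_dedup_reads)

print(f"RPM (clean reads):        {rpm_clean:.2f}")
print(f"RPM (deduplicated reads): {rpm_dedup:.2f}")
```

With the same numerator, the smaller deduplicated denominator gives a higher RPM for every taxon, so the two options are not interchangeable across samples unless the same denominator is used consistently.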
Any thoughts on this would be appreciated.