Hi, I have run a Hiseq 2x150bp from metagenomics samples.
I have computed the gene coverage for a list of genes for a given sample. For each sample i take a subset of genes i´m interested in and mapped all reads to each of these genes generating a bam file that I use as input to bedtools as follows
bedtools genomecov -ibam subsetgenes_sample_A.bam
bedtools genomecov -ibam subsetgenes_sample_B.bam
bedtools genomecov -ibam subsetgenes_sample_C.bam
....
Here below the output for one sample. I have added column names for clarity.
V1 V2 V3 V4 V5
k141_25_1 1 64 165 0.387879
k141_25_1 2 100 165 0.606061
k141_25_1 3 1 165 0.00606061
k141_34_1 2 11 726 0.0151515
k141_34_1 3 55 726 0.0757576
k141_34_1 4 36 726 0.0495868
k141_34_1 5 101 726 0.139118
k141_34_1 6 123 726 0.169421
k141_34_1 7 195 726 0.268595
In order to compute gene coverage i run coverage = (V2*V3)/V4.
The problem i have is that i have several samples so ideally i should normalize each sample by the total number of reads and multiply by 1M.
Would it make sense to normalize as follows normalized_genes = coverage(from above formula)/totalReadspersample * 1000000
Thanks for comments.
Thanks for your comments Carlo, Can you generate coverage using FeatureCounts from bam files ??
Ok found it coverageCounts. Great I´ll normalize with Deseq2 , thanks
Hi, Couldn´t manage to get genes coverage using coverageCounts. It returns binary files. How would you get the gene coverage as numeric ?
You can get the raw counts for your set of specific genes when you have Bam or SAM files. Later you can load the raw counts inot R and normalize using DESeq2 or EdgeR