Hello All,
I need some statistics for my analysis, and before writing my own code I would like to know whether an existing tool can produce them. I have aligned several samples to the human genome and would like the following:
1) I would like to know what percentage of bases in each exon is covered. For the definition of coverage I would like to supply a user-defined number, say n: a base in an exon counts as covered if at least n reads support it. So I need a vector with one entry per exon giving the percent covered.
2) Next, I would like to know how many (or what percentage of) exons of each gene are 100% covered, so this needs to be a vector with one entry per gene.
3) For the remaining reads that do not align to exons, I would like to find continuous regions of coverage, e.g. a 200 bp region supported by at least n reads.
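To make statistics 1) and 2) concrete, here is a minimal Python sketch. It assumes per-base depths for each exon are already available as plain lists (for example, extracted beforehand from a per-base depth report), and that an exon-to-gene mapping exists. The names `exon_depths`, `exon_to_gene`, and both function names are hypothetical and chosen only for illustration; this is not the output format of any particular tool.

```python
from collections import defaultdict


def exon_percent_covered(depths, n):
    """Percent of bases in one exon whose depth is at least n."""
    if not depths:
        return 0.0
    covered = sum(1 for d in depths if d >= n)
    return 100.0 * covered / len(depths)


def gene_fully_covered_percent(exon_depths, exon_to_gene, n):
    """For each gene, percent of its exons in which every base has depth >= n."""
    total = defaultdict(int)
    full = defaultdict(int)
    for exon_id, depths in exon_depths.items():
        gene = exon_to_gene[exon_id]
        total[gene] += 1
        if depths and all(d >= n for d in depths):
            full[gene] += 1
    return {gene: 100.0 * full[gene] / total[gene] for gene in total}


# Toy data (made up): per-base depths for three exons in two genes.
exon_depths = {
    "exon1": [5, 6, 7, 8],
    "exon2": [0, 3, 9, 9],
    "exon3": [4, 4, 4, 4],
}
exon_to_gene = {"exon1": "geneA", "exon2": "geneA", "exon3": "geneB"}
n = 4

per_exon = {e: exon_percent_covered(d, n) for e, d in exon_depths.items()}
per_gene = gene_fully_covered_percent(exon_depths, exon_to_gene, n)
```

With this toy data, `per_exon` gives one value per exon (statistic 1) and `per_gene` gives one value per gene (statistic 2).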
I guess this is doable with some programming, but it would take some effort. I would like to know whether I can get these statistics using existing tools such as bedtools, GATK, Picard, etc. Thank you all for your suggestions.
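If statistic 3) does end up needing custom code, the core of it could be a single scan over per-base depths. The sketch below assumes one interpretation of the requirement, namely that every base in the region must have depth of at least n, and that depths for a non-exonic stretch are given as a list. The function name `covered_runs` and its parameters are hypothetical.

```python
def covered_runs(depths, n, min_len=200, offset=0):
    """Return half-open (start, end) intervals where depth >= n holds
    for at least min_len consecutive bases.

    offset is added to list indices so results can be reported in
    genomic coordinates (assumption: depths[0] sits at position offset).
    """
    runs = []
    run_start = None
    for i, d in enumerate(depths):
        if d >= n:
            if run_start is None:
                run_start = i          # a new covered run begins here
        elif run_start is not None:
            if i - run_start >= min_len:
                runs.append((offset + run_start, offset + i))
            run_start = None           # the run ended at a low-depth base
    # Close a run that extends to the end of the list.
    if run_start is not None and len(depths) - run_start >= min_len:
        runs.append((offset + run_start, offset + len(depths)))
    return runs


# Toy example with a short minimum length so the result is easy to check.
toy = [0, 5, 5, 5, 0, 5, 5]
runs_len3 = covered_runs(toy, n=5, min_len=3)
runs_len2 = covered_runs(toy, n=5, min_len=2)
```

For real data, `min_len` would stay at 200 and `offset` would be the genomic start of the stretch being scanned.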
1) For per-exon coverage, refer to: Calculate Per Exon/Per Gene Coverage
Thanks. The coverageBed function from bedtools more or less does what I need, but is there an option to set a minimum number of reads aligned to a base for that base to be counted in the percentage coverage?
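One possible workaround, rather than a built-in option, is to ask for per-base depth and apply the threshold afterwards. The Python sketch below assumes tab-delimited lines in which the feature columns come first, followed by the position within the feature and then the depth, which is my understanding of what `bedtools coverage -d` reports. Please verify the column layout against your bedtools version. The function name and the example rows are made up for illustration.

```python
from collections import defaultdict


def percent_covered_from_per_base(lines, n):
    """Map each exon to the percent of its bases with depth >= n.

    Assumed line layout: <feature columns...> <position> <depth>,
    tab-separated, one line per base.
    """
    total = defaultdict(int)
    covered = defaultdict(int)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        exon = tuple(fields[:-2])   # feature columns, e.g. chrom, start, end, name
        depth = int(fields[-1])     # last column assumed to be the per-base depth
        total[exon] += 1
        if depth >= n:
            covered[exon] += 1
    return {exon: 100.0 * covered[exon] / total[exon] for exon in total}


# Made-up rows in the assumed layout: chrom, start, end, name, position, depth.
example = [
    "chr1\t100\t104\texonA\t1\t0",
    "chr1\t100\t104\texonA\t2\t3",
    "chr1\t100\t104\texonA\t3\t5",
    "chr1\t100\t104\texonA\t4\t7",
    "chr1\t200\t202\texonB\t1\t2",
    "chr1\t200\t202\texonB\t2\t2",
]
result = percent_covered_from_per_base(example, n=3)
```

In practice `lines` could be an open file handle over the saved per-base report, so the whole report never has to be held in memory.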