I would like to:
1: Define all regions in a bam file with coverage above 4 (no gene annotation available and is exome capture experiment)
i.e.
chrom1 start stop 1500bp region1
chrom1 start stop 500bp region2
chrom1 start stop 1000bp region3
2: Count the number of snps in each of those regions separately so a snp count for each region of coverage
i.e.
chrom1 start stop 1500bp region1 10snps
chrom1 start stop 500bp region2 3snps
chrom1 start stop 1000bp region3 20snps
I'm trying bam2bed to generate a bed file but not sure if this will be based upon coverage?
Then thought I would try samtools mpileup and write something to traverse the mpileup using the regions defined in 1.
Any suggestions on how to do this better would be appreciated, seems like this would have been done before and I may be doing it a complicated way?
This is what I have so far, not sure if going down the right road
bedtools genomecov -ibam 1.bam -bg > 1.bam.bedgraph
awk '$4 > 3' 1.bam.bedgraph > my.covered.intevals.bedgraph
Output:
IWGSC_3BSEQ_3B_traes3bPseudomoleculeV1 5700 5769 4
IWGSC_3BSEQ_3B_traes3bPseudomoleculeV1 24988 24995 4
IWGSC_3BSEQ_3B_traes3bPseudomoleculeV1 25003 25005 4
IWGSC_3BSEQ_3B_traes3bPseudomoleculeV1 25005 25022 5
IWGSC_3BSEQ_3B_traes3bPseudomoleculeV1 25022 25034 6
IWGSC_3BSEQ_3B_traes3bPseudomoleculeV1 25034 25041 7
IWGSC_3BSEQ_3B_traes3bPseudomoleculeV1 25041 25047 8
IWGSC_3BSEQ_3B_traes3bPseudomoleculeV1 25047 25050 9
IWGSC_3BSEQ_3B_traes3bPseudomoleculeV1 25050 25054 10
IWGSC_3BSEQ_3B_traes3bPseudomoleculeV1 25054 25055 11
Looks like may have to write something to add the coverage as seems to split some regions?
Maybe use the per base coverage count and manually merge the count of a particular region of interest? You might try using something like https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_coverage_DepthOfCoverage.php