Hello All,
I would like to calculate the 'callable genome'. I have a reference genome in FASTA format and a BED file listing the genome coordinates that were excluded from variant calling. Could anybody advise on how to combine these two files to calculate how many sites in the reference genome were NOT excluded from the analysis? In other words, the number of bases in the FASTA file minus the number of bases in the regions listed in the BED file. I have found some solutions that involve converting to bigWig format, but I was wondering whether there is a simpler tool for this.
Thanks in advance for your help.
Example of bed file regions:
contig1 1588 1589
contig1 3424 3428
contig2 0 401
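To make the BED side concrete, here is a minimal sketch of how the excluded bases could be summed. The function name `excluded_bases` is made up for illustration. It assumes standard BED coordinates (0-based start, end not included, so each region spans end minus start bases) and that the regions do not overlap; if they might, they would need merging first (for example with `bedtools merge` on a sorted file), otherwise overlapping bases get counted twice.

```shell
# Hypothetical helper: sum the lengths of all regions in a BED file.
# Assumes 0-based, half-open intervals (length = end - start)
# and non-overlapping regions.
excluded_bases() {
    # Skip optional track/browser/comment lines and lines with fewer than 3 columns.
    awk '!/^(track|browser|#)/ && NF >= 3 { sum += $3 - $2 } END { print sum + 0 }' "$1"
}
```

For the three example regions above this would give 1 + 4 + 401 = 406 excluded bases.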
As an alternative to seqtk, could I also use:
grep -v ">" file.fasta | wc | awk '{print $3-$1}'
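As far as I can tell, the one-liner above works for files with Unix line endings: `wc` prints lines, words and characters, so `$3-$1` removes one newline per line. With Windows line endings it would count each carriage return as a base. Below is a rough sketch of a variant that strips both, plus the final subtraction. The function names are made up for illustration, and the excluded count is assumed to have been summed from the BED file separately. Note that any N bases in the reference are counted like any other base.

```shell
# Hypothetical helper: count sequence characters in a FASTA file.
# Drops header lines, removes \r and \n, then counts what is left.
total_bases() {
    grep -v '^>' "$1" | tr -d '\r\n' | wc -c | tr -d ' '
}

# Hypothetical helper: callable bases = total bases - excluded bases.
# $1 = FASTA path, $2 = number of excluded bases (summed from the BED file)
callable_bases() {
    echo $(( $(total_bases "$1") - $2 ))
}
```

Usage would be something like `callable_bases file.fasta 406`, where 406 stands in for the excluded total.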