3.4 years ago
ariel
I'm trying to create something like a GVCF file for the human genome, but without the variants. It would be like:
CHROM POS BASE
I figured one way to start would be to align hg19 to itself to get a SAM file, but I'm not sure where to go from there.
I'm guessing there might be a GVCF file out there somewhere from some project like COSMIC or 1000 Genomes. But since those are focused on variants, will they have ALL bases included?
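For what it's worth, a CHROM POS BASE listing can probably be read straight off the reference FASTA, with no self-alignment step. A minimal sketch, assuming a plain uncompressed FASTA with one `>` header per chromosome and 1-based positions as in VCF; the function name `iter_reference_bases` is made up for illustration:

```python
from typing import Iterable, Iterator, Tuple


def iter_reference_bases(fasta_lines: Iterable[str]) -> Iterator[Tuple[str, int, str]]:
    """Yield (chrom, pos, base) for every base in a FASTA, with 1-based pos."""
    chrom = None
    pos = 0
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the chromosome name is the first word of the header.
            chrom = line[1:].split()[0]
            pos = 0
            continue
        if chrom is None:
            raise ValueError("sequence data found before the first FASTA header")
        for base in line:
            pos += 1
            # Soft-masked (lowercase) bases are upper-cased here.
            yield chrom, pos, base.upper()


if __name__ == "__main__":
    demo = [">chr1 demo", "ACg", "T", ">chr2", "NN"]
    for chrom, pos, base in iter_reference_bases(demo):
        print(chrom, pos, base, sep="\t")
```

For the real genome you would pass an open file handle instead of the `demo` list. The full listing has roughly three billion rows, so restricting it to the regions of interest is likely more practical.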
This looks like an XY problem. May I ask why you're creating this file?
It is highly possible.
I have some VCFs for patients which were made from multiple panels (XGen, Illumina). These are VCF and not GVCF, so I only have the reference bases at coordinates where variants were found.
I calculated the GC ratio for the reference bases and alternate bases.
I'd also like to calculate the GC ratio for the entire regions covered by the panels (I have the bedfiles). Since the VCF files do not have the bases for all positions, I want to make a list using the reference genome.
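If the end goal is only the GC ratio over the panel regions, the per-base list may not be needed at all. A rough sketch of computing it directly from the reference FASTA and a BED file, assuming standard BED coordinates (0-based start, exclusive end), non-overlapping intervals, and a reference small enough to hold in memory; `read_fasta` and `gc_ratio_for_bed` are illustrative names, and N bases are left out of the denominator:

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Load a FASTA into {chrom: upper-cased sequence}."""
    seqs: Dict[str, str] = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line.upper())
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def gc_ratio_for_bed(seqs: Dict[str, str], bed_lines: Iterable[str]) -> float:
    """GC / (A+C+G+T) over all BED intervals; N bases are ignored."""
    gc = 0
    acgt = 0
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # BED is 0-based, half-open, which matches Python slicing.
        region = seqs[chrom][start:end]
        gc += region.count("G") + region.count("C")
        acgt += sum(region.count(b) for b in "ACGT")
    return gc / acgt if acgt else float("nan")


if __name__ == "__main__":
    seqs = read_fasta([">chr1", "ACGTNN", "GGCC"])
    print(gc_ratio_for_bed(seqs, ["chr1\t0\t4", "chr1\t4\t10"]))
```

Overlapping intervals would be counted twice here, so merge the BED first if the panels overlap. For a whole-genome reference, an indexed FASTA reader or `bedtools nuc` is probably a better fit than loading everything into a dict.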
So you'd like GC ratio for the reference genome, correct? That should already be available as a resource IMO. If not, google "calculate GC ratio hg38" or something like that.
Sounds like a job for GATK; check out Mutect2.
Please elaborate on your "answer" - for the moment, it is just a comment as it does not really answer the top level question.