I'm asking here before I go off half-cocked and try to build my own solution.
Simply put, I have been using the reference GRCh38, created BAM files via HiSAT2, and then BCF files using samtools mpileup | bcftools call | bcftools norm.
The bcftools output includes the REF/REF matches, and the BCF file is for a single diploid organism (one human sample), so it contains only 0/0, 0/1, and 1/1 genotypes (no X|Y, no 1|0, no 1/0, or anything like that).
Next, I want to create a new reference file (I assume .fastq is best) that incorporates the variant calls from the BCF file.
Is there a tool/process already in place for something like this?
Thanks!
fastq is not a reference-thing
see New Fasta Sequence From Reference Fasta And Variant Calls File? ; GATK: vcf to fasta ; How to get individual chromosome sequence in fasta format from vcf.gz and its vcf.gz.tbi file of 1000 genome project? ; ...
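For what it's worth, bcftools also ships a consensus command that applies variant calls to a reference FASTA, so check that before building your own. If you just want to see what such a tool does under the hood, here is a rough Python sketch. It is an illustration only, not how bcftools implements it: the function name apply_variants, the (pos, ref, alt, gt) tuple layout, and the include_het switch are all made up for this example, and it assumes one chromosome, unphased genotypes written as 0/0, 0/1, or 1/1, and no overlapping variants.

```python
def apply_variants(ref_seq, variants, include_het=False):
    """Return ref_seq with ALT alleles substituted in.

    variants: iterable of (pos, ref, alt, gt) tuples, with pos 1-based as in
    VCF/BCF. This tuple layout is a hypothetical stand-in for parsed records.
    """
    # Policy assumption: only homozygous-ALT calls are applied by default;
    # heterozygous calls are applied only when include_het is True.
    wanted = {"1/1", "0/1"} if include_het else {"1/1"}

    seq = ref_seq
    # Work from the rightmost variant to the leftmost, so that insertions and
    # deletions never shift the coordinates of variants still to be applied.
    for pos, ref, alt, gt in sorted(variants, key=lambda v: v[0], reverse=True):
        if gt not in wanted:
            continue  # skips 0/0 (REF/REF) records and, by default, 0/1
        start = pos - 1  # convert 1-based VCF position to 0-based index
        found = seq[start:start + len(ref)]
        if found.upper() != ref.upper():
            # Wrong reference build, or overlapping variants (not handled here).
            raise ValueError(f"REF mismatch at {pos}: expected {ref}, found {found}")
        seq = seq[:start] + alt + seq[start + len(ref):]
    return seq


# Toy data, invented for the example (not real GRCh38 sequence).
reference = "ACGTACGTAC"
calls = [
    (2, "C", "T", "1/1"),    # homozygous SNP
    (5, "A", "AGG", "1/1"),  # homozygous insertion
    (8, "TA", "T", "0/1"),   # heterozygous deletion
    (10, "C", ".", "0/0"),   # REF/REF record, never applied
]

print(apply_variants(reference, calls))
print(apply_variants(reference, calls, include_het=True))
```

The main design question is what to do with 0/1 sites: a single FASTA sequence can only hold one allele per position, so you have to pick a policy (keep REF, take ALT, or use an ambiguity code) before you generate the new reference.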
Thanks!
I'm not sure what you mean when you say "fastq is not a reference-thing". I realize fastq is not "specifically" just for reference genomes. But I assume that it's the best file choice for a reference genome...in the sense that the original reference I used was a fastq run through HiSAT2 to make a BAM (which, I guess, IS the reference :D)
Am I misunderstanding what you're saying here (or something else about references :D)?
Or maybe you mean I need a fastA, not a fastQ?
Yup, FASTA is used for reference genomes.
If I had the cred to promote this comment to an "answer", I would, and up-vote it.
Also see: