3.3 years ago
timothy.delory
I have FASTQ files, where each file contains the sequence of a distinct haploid individual. I need to run these through GATK as though they were diploids, in order to use a software package that only accepts a VCF with diploid samples as input. This post: How to merge two haploid samples (vcf, or g.vcf) into a pseudo-diploid? suggests merging BAM files plus some post-merge processing, but couldn't I just merge pairs of my FASTQ files to get two haplotypes into one file? E.g.:
> cat read1_indv1.fq.gz read1_indv2.fq.gz > read1_combined.fq.gz
> cat read2_indv1.fq.gz read2_indv2.fq.gz > read2_combined.fq.gz
Then run the usual SAM/BAM and GATK steps on the combined files afterwards?
Thanks
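As a quick sanity check on the concatenation idea, here is a minimal sketch using toy data. The file names mirror the ones in the question, but the read names and sequences are made up for illustration. It relies on the fact that several gzip streams concatenated with `cat` still decompress as one valid stream, and it checks that the combined R1 and R2 files have the same number of reads in the same order, which paired-end aligners generally expect.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy inputs standing in for the real per-individual files (hypothetical data).
tmp=$(mktemp -d)
printf '@indv1_r1\nACGT\n+\nIIII\n' | gzip > "$tmp/read1_indv1.fq.gz"
printf '@indv1_r1\nTTGA\n+\nIIII\n' | gzip > "$tmp/read2_indv1.fq.gz"
printf '@indv2_r1\nGGCA\n+\nIIII\n' | gzip > "$tmp/read1_indv2.fq.gz"
printf '@indv2_r1\nCATG\n+\nIIII\n' | gzip > "$tmp/read2_indv2.fq.gz"

# Concatenate in the SAME individual order for both mates so pairs stay in sync.
cat "$tmp/read1_indv1.fq.gz" "$tmp/read1_indv2.fq.gz" > "$tmp/read1_combined.fq.gz"
cat "$tmp/read2_indv1.fq.gz" "$tmp/read2_indv2.fq.gz" > "$tmp/read2_combined.fq.gz"

# Count reads: a FASTQ record is 4 lines.
count_reads() { gzip -dc "$1" | awk 'END { print NR / 4 }'; }

# Extract read names (every 4th line starting at line 1).
read_ids() { gzip -dc "$1" | awk 'NR % 4 == 1'; }

n1=$(count_reads "$tmp/read1_combined.fq.gz")
n2=$(count_reads "$tmp/read2_combined.fq.gz")
ids1=$(read_ids "$tmp/read1_combined.fq.gz")
ids2=$(read_ids "$tmp/read2_combined.fq.gz")

echo "R1 reads: $n1, R2 reads: $n2"
if [ "$ids1" = "$ids2" ]; then
    echo "Mate order matches"
else
    echo "Mate order differs" >&2
fi

rm -rf "$tmp"
```

On real data the same two checks (equal read counts and matching name order) can be run on the combined files before alignment. Downstream, the combined reads would presumably need to share a single sample name in their read group so that GATK treats them as one sample; that part is an assumption about the pipeline, not something shown here.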
Are you absolutely sure this is necessary? What happens if you just put one fastq through the software? Will the software really refuse to run if it doesn't detect any mixed loci?
Based on my conversation with the author, merging pairs into diploid individuals would be the easiest route.