Hello
I am trying to run GATK UnifiedGenotyper on a single chromosome from a 30+ chromosome genome.
I have tried feeding GATK the BAM file (containing reads for all chromosomes plus some scaffolds) and a FASTA file of only the one chromosome I want variants for. However, I receive the following error:
ERROR MESSAGE: Badly formed genome loc: Contig Scaffold143 given as location, but this contig isn't present in the Fasta sequence dictionary
Where Scaffold143 is one of the unassembled scaffolds.
I assume the error comes from the BAM file containing reads for that scaffold. How do I solve this?
I stumbled upon the -L argument for UnifiedGenotyper, but I could not find any documentation for it on the GATK website. Is this an argument that defines the boundaries of a region on which to call variants? If so, it could be a way to work around the problem caused by feeding GATK only the FASTA file of the chromosome I'm interested in.
If you want to call variants for only one chromosome, you can use the "-L" ("intervals of interest") parameter in GATK. You can give it an input file containing a line of the form
<chr>:<start>-<stop>
for your chromosome of interest. This way, GATK UnifiedGenotyper will call variants for only that chromosome.
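As a rough illustration of the interval file described above, here is a minimal Python sketch that writes one `<chr>:<start>-<stop>` line per region. The contig name `chr5`, its length, and the file name `chr5.intervals` are placeholders, not values from the question. The sketch assumes coordinates are 1-based and inclusive.

```python
from pathlib import Path


def format_interval(contig, start, stop):
    """Return an interval string of the form '<chr>:<start>-<stop>'.

    Coordinates are assumed to be 1-based and inclusive.
    """
    if start < 1 or stop < start:
        raise ValueError(f"invalid interval: {contig}:{start}-{stop}")
    return f"{contig}:{start}-{stop}"


def write_intervals(path, intervals):
    """Write one interval per line to `path` and return the lines written.

    `intervals` is an iterable of (contig, start, stop) tuples.
    """
    lines = [format_interval(contig, start, stop)
             for contig, start, stop in intervals]
    Path(path).write_text("\n".join(lines) + "\n")
    return lines


if __name__ == "__main__":
    # Hypothetical chromosome name and length; replace with your own values.
    write_intervals("chr5.intervals", [("chr5", 1, 25000000)])
```

The resulting file could then be passed with something like `-L chr5.intervals` alongside the usual `-T UnifiedGenotyper -R <reference> -I <bam>` arguments. It is probably safest to supply the full reference FASTA that the reads were aligned to rather than the single-chromosome one, since the error quoted above suggests GATK checks the BAM contigs against the reference dictionary.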