Using GATK UnifiedGenotyper on a single chromosome
10.5 years ago

Hello

I am trying to run GATK UnifiedGenotyper on a single chromosome from a 30+ chromosome genome.

I have tried giving GATK the BAM file (containing reads for all chromosomes plus some scaffolds) together with a FASTA file of only the one chromosome I want variants for. However, I get the following error:

ERROR MESSAGE: Badly formed genome loc: Contig Scaffold143 given as location, but this contig isn't present in the Fasta sequence dictionary

Here, Scaffold143 is one of the unassembled scaffolds.

I assume the error arises because the BAM file contains reads mapped to that scaffold. How do I solve this? I stumbled upon the -L argument for UnifiedGenotyper, but I could not find any documentation for it on the GATK website. Does this argument define the boundaries of a region in which to call variants? If so, it could be a way around the problem caused by giving GATK a FASTA file of only the chromosome I'm interested in.


If you want to call variants for only one chromosome, you can use GATK's "-L" ("intervals of interest") parameter. You can pass it an input file containing a line of the form <chr>:<start>-<stop> for your chromosome of interest. GATK UnifiedGenotyper will then call variants for that chromosome only.
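If you don't want to type the chromosome length by hand, you can read it from the reference's samtools `.fai` index and write the `<chr>:<start>-<stop>` line described above. This is a minimal sketch that assumes a standard `.fai` layout: tab-separated NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH. It also assumes GATK-style 1-based, inclusive intervals stored one per line in a file ending in `.intervals` or `.list`. The helper names `interval_for_contig` and `write_intervals_file` are hypothetical and not part of GATK.

```python
def interval_for_contig(fai_text: str, contig: str) -> str:
    """Return a '<chr>:<start>-<stop>' interval covering a whole contig.

    fai_text is the contents of a samtools faidx index (reference.fasta.fai),
    assumed to have tab-separated columns NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH.
    """
    for line in fai_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if fields[0] == contig:
            length = int(fields[1])
            # Intervals are 1-based and inclusive, so a whole contig is 1..length
            return f"{contig}:1-{length}"
    raise KeyError(f"contig {contig!r} not found in the .fai index")


def write_intervals_file(path: str, intervals: list[str]) -> None:
    """Write one interval per line (hypothetical helper for an -L input file)."""
    with open(path, "w") as handle:
        for interval in intervals:
            handle.write(interval + "\n")


if __name__ == "__main__":
    # Hypothetical index; a real one comes from `samtools faidx reference.fasta`
    example_fai = "chr1\t1000000\t6\t60\t61\nScaffold143\t15000\t1016686\t60\t61\n"
    interval = interval_for_contig(example_fai, "chr1")
    print(interval)  # chr1:1-1000000
    write_intervals_file("chr1.intervals", [interval])
```

The resulting `chr1.intervals` file would then be passed as `-L chr1.intervals`.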

10.5 years ago

You are right. If the BAM contains reads mapped to contigs that are not present in your (filtered) reference, GATK will complain. As you also noticed, you can use GATK's -L parameter instead. The documentation is here.
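To see which contigs trigger the complaint, you can compare the contigs listed in the BAM header with those in the reference. This is a rough sketch under stated assumptions:

- GATK actually checks against the reference's sequence dictionary (`.dict`). This sketch uses the samtools `.fai` index as a stand-in, assuming both were built from the same FASTA and so list the same contig names.
- The header text is assumed to come from `samtools view -H reads.bam`.
- All function names are hypothetical.

```python
def header_contigs(sam_header_text: str) -> list[str]:
    """Contig names from the @SQ lines of a SAM/BAM header."""
    names = []
    for line in sam_header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])  # strip the 'SN:' tag
    return names


def fai_contigs(fai_text: str) -> set[str]:
    """Contig names (first column) from a samtools .fai index of the reference."""
    return {line.split("\t")[0] for line in fai_text.splitlines() if line.strip()}


def missing_from_reference(sam_header_text: str, fai_text: str) -> list[str]:
    """BAM contigs absent from the reference; any entry here corresponds to the
    'isn't present in the Fasta sequence dictionary' situation."""
    reference = fai_contigs(fai_text)
    return [name for name in header_contigs(sam_header_text) if name not in reference]


if __name__ == "__main__":
    # Hypothetical header plus a chr1-only reference index, mirroring the question
    header = "@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:Scaffold143\tLN:500\n"
    chr1_only_fai = "chr1\t1000\t6\t60\t61\n"
    print(missing_from_reference(header, chr1_only_fai))  # ['Scaffold143']
```

An empty result suggests the reference covers every contig the BAM refers to.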

Remember to provide as the reference the same FASTA file that was used for mapping (with all chromosomes and scaffolds).
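Putting the two pieces of advice together, this sketch builds a GATK 3.x-style UnifiedGenotyper command:

- `-R` points at the full mapping reference.
- `-I` points at the unchanged BAM.
- `-L` restricts calling to one contig. It accepts either a contig name such as `chr1` or a path to an intervals file.
- `-o` names the output VCF.

The jar name, the file names and the helper `unified_genotyper_cmd` are hypothetical placeholders. The sketch only assembles the argument list and does not run GATK.

```python
import shlex


def unified_genotyper_cmd(gatk_jar: str, reference: str, bam: str,
                          intervals: str, out_vcf: str) -> list[str]:
    """Argument list for a GATK 3.x UnifiedGenotyper run restricted with -L."""
    return [
        "java", "-jar", gatk_jar,
        "-T", "UnifiedGenotyper",
        "-R", reference,   # full FASTA used for mapping (all chromosomes and scaffolds)
        "-I", bam,         # BAM may keep reads on every contig
        "-L", intervals,   # contig name or intervals file limiting where variants are called
        "-o", out_vcf,     # output VCF
    ]


if __name__ == "__main__":
    # Hypothetical file names; subprocess.run(cmd, check=True) would execute it
    cmd = unified_genotyper_cmd("GenomeAnalysisTK.jar", "full_reference.fasta",
                                "reads.bam", "chr1", "chr1.vcf")
    print(shlex.join(cmd))
    # java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R full_reference.fasta -I reads.bam -L chr1 -o chr1.vcf
```

Returning a list rather than a single string avoids shell-quoting problems when the paths contain spaces.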

