Hi,
I'm trying to call variants from human RNA-seq data with GATK.
I first run the SplitNCigarReads step using this command:
java -jar GenomeAnalysisTK.jar -T SplitNCigarReads -R GRCh37.fasta -I <bam_file> -o <splitN_bam_file> -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS
Then, I'm trying to realign around indels using this command:
java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R GRCh37.fasta -I <splitN_bam_file> -L 20 -o <realignment_targets_list_file>
which exits with this error:
##### ERROR MESSAGE: Badly formed genome loc: Contig '20' does not match any contig in the GATK sequence dictionary derived from the reference; are you sure you are using the correct reference fasta file?
In the SplitNCigarReads output, all chromosomes in the fasta file are reported, and a contig named '20' is clearly not there. The same is true for the dictionary created from the fasta file using Picard's CreateSequenceDictionary module, and for the .fai file created by samtools faidx, both of which are required for indexing the genome fasta file.
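For anyone who wants to repeat that check, here is a minimal sketch of how the contig names can be listed from the two index files. The helper names (list_fai_contigs, list_dict_contigs, has_contig) are made up for illustration and are not part of GATK, Picard or samtools. The sketch only assumes the standard layouts: the contig name is the first tab-separated column of the .fai file, and the SN: field of each @SQ line in the .dict file.

```shell
# List contig names from a samtools .fai index (first tab-separated column).
list_fai_contigs() {
    cut -f1 "$1"
}

# List contig names from a sequence dictionary (.dict):
# print the value of the SN: field on every @SQ header line.
list_dict_contigs() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)
    }' "$1"
}

# Succeed (exit status 0) only if contig name $2 appears as an exact,
# whole-line match among the names in the .fai index $1.
has_contig() {
    list_fai_contigs "$1" | grep -Fxq -- "$2"
}
```

With these defined, something like `has_contig GRCh37.fasta.fai 20 && echo found || echo missing` reports whether the name passed to -L exists exactly as written. The index file name here is assumed from the reference name in the commands above. The match is on the whole name, so '20' and 'chr20' count as different contigs.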
Any idea?