Thank you all for your helpful suggestions.
I revisited the thread referred to by Ashutosh and was able to assemble my version of the hg19 that GATK did not complain about!!!
Here is what I did.
1. I extracted the list of contigs in the same order from hg19.fai that I downloaded from GATK resource bundle (hg19 liftover).
2. I tweaked the list to move chrM after chrY. This matches the contig order of the reference genome that LifeScope used when aligning my BAM files.
3. Downloaded chromFa.tar.gz for hg19 from UCSC and uncompressed the individual contigs.
4. Ran the following shell script to read from my contig list and assemble the individual chromosome fasta files from step 3.
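#!/bin/bash
# Contig list: first argument, defaulting to gatk_ucsc_contig_order.txt
FILE=${1:-gatk_ucsc_contig_order.txt}
# Start from an empty hg19.fa so that a rerun does not append duplicates
: > hg19.fa
while IFS= read -r line
do
    echo "Appending $line.fa..."
    filename=$line.fa
    cat "$filename" >> hg19.fa
done < "$FILE"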
5. Indexed this new genome using bwa, then created the sequence dictionary (.dict) and the FASTA index (.fai).
6. GATK was happy!!!! :-)
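Before indexing, it can be worth confirming that the assembled FASTA really follows the contig list. The function below is only a minimal sketch: check_contig_order is a name I made up for illustration, and it assumes each header line is ">name", optionally followed by whitespace and a description.

```shell
# Sketch (hypothetical helper): succeed only if the FASTA headers appear
# in exactly the same order as the names in the contig list.
# $1 = FASTA file, $2 = contig list (one name per line)
check_contig_order() {
    # Take the header lines, strip the leading ">" and anything after the
    # first whitespace, then compare the result with the list line by line.
    grep '^>' "$1" |
        sed -e 's/^>//' -e 's/[[:space:]].*$//' |
        diff - "$2" > /dev/null
}
```

For example, `check_contig_order hg19.fa gatk_ucsc_contig_order.txt && echo "order matches"`. A trailing blank line in the list will count as a mismatch.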
The downside, though, is that I am unable to use the known hg19 indel/SNP VCFs from GATK. Working on that.
Cheers!
UPDATE (4/29/15): I was able to use VCFsorter to rearrange the contig ordering in the VCF files from the GATK resource bundle to match the hg19 reference (above), and then indexed the VCF files using the igvtools index command. This enabled me to use the VCFs as known sites for indel realignment and BQSR in GATK.
Having said all this, the best option, as biocyberman mentioned, is to use the b37 reference from the GATK resource bundle for the primary alignment. The known-sites files in the bundle already match that reference, so most downstream steps are taken care of.
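For anyone curious what the reordering amounts to, here is a rough sketch in plain shell. This is not VCFsorter and not how it is implemented; reorder_vcf is a hypothetical name. It assumes an uncompressed VCF and a contig list with one name per line. Header lines are passed through unchanged (so any ##contig lines keep their old order and may need adjusting separately), and records on contigs missing from the list are dropped.

```shell
# Sketch (hypothetical helper): print a VCF with its records reordered to
# follow a contig list, sorted by position within each contig.
# $1 = contig order list, $2 = uncompressed input VCF
reorder_vcf() {
    # Header lines go out first, untouched.
    grep '^#' "$2"
    # Prefix each record with the rank of its contig, sort by rank and
    # then by POS (third column once the rank is prepended), drop the rank.
    grep -v '^#' "$2" |
        awk -F'\t' -v OFS='\t' '
            NR == FNR { rank[$1] = FNR; next }   # first file: contig -> rank
            ($1 in rank) { print rank[$1], $0 }  # records on listed contigs only
        ' "$1" - |
        sort -t "$(printf '\t')" -k1,1n -k3,3n |
        cut -f2-
}
```

Usage would be something like `reorder_vcf gatk_ucsc_contig_order.txt input.vcf > reordered.vcf`, where the VCF file names are placeholders. The output still needs to be indexed afterwards.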
Thank you all once again for helping me out.
I had success using bfast to align SOLiD data. So that's what I would recommend for you :)
Hi Raony,
I do not have much experience with bfast, but will give it a try. The PDF man pages appear a bit dated (2011). If there is a more updated manual, do you mind sharing the link?
Another question: can bfast be used to take BAM files generated by LifeScope or other SOLiD aligners and realign them to a different human reference genome?
Thank you!
I thought you had figured out the header problem. If you are looking to realign the SOLiD reads, you had better give SHRiMP2 a try. Unfortunately, it is no longer being developed or supported. But I have used it extensively to align SOLiD reads and it has worked pretty well for me.
Yes the manual is a bit old, but I don't know about any new version.
Try with this branch: http://sourceforge.net/projects/bfast/files/bfast%2Bbwa/0.7.0/ or this
Once you have extracted your reads from BAM to FASTQ format, you can try different aligners and any reference genome you want. :)