How does GATK define its requirement of ‘karyotypic’ sorting? In the simple case of the UCSC human assembly, it means placing the chromosomes in numerical order 1-22, followed by X, Y and M. But where does one place the additional unlocalised contigs that are present in, for example, the Ensembl assembly? This was brought up as a comment in a related post.
Sorry, I missed that you were interested in the GRCh37/Ensembl naming. Broad also has a bundle for that; it's definitely worth digging around their FTP site before creating your own: ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/1.2/b37/human_g1k_v37.fasta.fai.gz
Thanks Brad for the quick response. I've done as you suggested, and mapped the ordering of the files from Broad to the Ensembl release. There is a greater number of contigs listed in the Ensembl file than the .fai file mentioned above, but it's a great place to start.
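For anyone wanting to do the same mapping, here is a minimal sketch of one way to go about it, not necessarily what GATK itself expects. It reads the contig names from a .fai index (the first tab-separated column) and reorders another list of contig names to match. The function names `read_fai_names` and `order_like_reference` are hypothetical, and appending the contigs that are missing from the .fai at the end, in their original order, is purely my assumption for illustration.

```python
import gzip


def read_fai_names(path):
    """Return contig names from a .fai index, in file order.

    The contig name is the first tab-separated column of each line.
    Gzipped files (ending in .gz) are opened transparently.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return [line.rstrip("\n").split("\t")[0] for line in handle if line.strip()]


def order_like_reference(contigs, reference_order):
    """Reorder contigs to follow reference_order.

    Contigs absent from reference_order are appended at the end in their
    original order (an assumption, not a documented GATK rule).
    """
    rank = {name: i for i, name in enumerate(reference_order)}
    known = sorted((c for c in contigs if c in rank), key=rank.__getitem__)
    extra = [c for c in contigs if c not in rank]
    return known + extra
```

Usage would be something like `order_like_reference(ensembl_names, read_fai_names("human_g1k_v37.fasta.fai"))`, where `ensembl_names` is the list of contig names taken from the Ensembl FASTA headers.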
Thanks Brad, I'd not realised that Broad supplied sets for the Ensembl release too. Thanks for your advice!