If you want exactly the MPI-EVA version (which includes a circularised phiX and an extended mtDNA reference), its construction is similar to that of the reference used by the 1000 Genomes Project (compare ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/),
with minor changes. It is made as follows:
1 Download the individual chromosomes from the Ensembl FTP site (just like 1000g)
ftp://ftp.ensembl.org/pub/current_fasta/homo_sapiens/dna/
2a Download the newer version of the mitochondrion (NC_012920, just like 1000g)
http://www.ncbi.nlm.nih.gov/nuccore/251831106
2b Copy the first 1000bp of the mitochondrion onto its end. The resulting sequence is named "MT".
3 Download the concatenated decoy sequences from 1000 Genomes:
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5cs.fa.gz
Also compare their READMEs:
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/README_human_reference_20110707
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.slides.pdf
4 Download the human herpesvirus 4 genome (NC_007605, aka EBV) from NCBI, just like 1000g. The sequence is then named "NC_007605".
http://www.ncbi.nlm.nih.gov/nuccore/NC_007605
5a Download phiX-174 reference (NC_001422).
http://www.ncbi.nlm.nih.gov/nuccore/NC_001422
5b Copy the first 1000bp of phiX onto its end, name the result "phiX".
6 Create a reference (whole_genome.fa) containing chromosomes 1-22, X and Y,
the extended NC_012920 (MT), the non-chromosomal supercontigs,
NC_007605 (EBV), the decoy sequences (hs37d5) and the extended phiX.
The order is chosen to match 1000 Genomes (plus phiX), see their fai file:
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz.fai
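Steps 2b, 5b and 6 above can be sketched in Python roughly as follows. This is only an illustration under my own assumptions, not the script MPI-EVA actually used: the function names are hypothetical, sequences are held in memory as plain strings (fine for a toy, not for a 3 Gb genome), and the 1000g order is taken from the first column of an hs37d5.fa.gz.fai-style index.

```python
from typing import Dict, Iterable, List, Tuple

EXTENSION_BP = 1000  # steps 2b and 5b: copy the first 1000 bp onto the end


def extend_circular(seq: str, n: int = EXTENSION_BP) -> str:
    """Append the first n bases of a circular sequence to its end."""
    return seq + seq[:n]


def fai_order(fai_lines: Iterable[str]) -> List[str]:
    """Sequence names in the order they appear in a .fai index (first column)."""
    return [line.split("\t", 1)[0] for line in fai_lines if line.strip()]


def ordered_records(records: Dict[str, str], order: List[str]) -> List[Tuple[str, str]]:
    """Records named in `order` first (1000g order), then any extras such as phiX."""
    out = [(name, records[name]) for name in order if name in records]
    out += [(name, seq) for name, seq in records.items() if name not in order]
    return out


def to_fasta(records: List[Tuple[str, str]], width: int = 60) -> str:
    """Render (name, sequence) pairs as FASTA with fixed-width sequence lines."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy sequences and a toy fai; real inputs come from the downloaded files.
    records = {
        "phiX": extend_circular("ACGTTGCA", n=3),
        "1": "NNNNACGTACGT",
        "MT": extend_circular("GATTACA", n=3),
    }
    order = fai_order(["1\t12\t3\t60\t61\n", "MT\t7\t18\t60\t61\n"])
    print(to_fasta(ordered_records(records, order)), end="")
```

With the real inputs, extending a 5386 bp phiX this way gives a 6386 bp contig, and the 16569 bp MT becomes 17569 bp.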
Note that two sequences (MT, phiX) are circular and have been extended to
facilitate alignment. The correct incantation to wrap these alignments back to their
true length is
bam-rewrap MT:16569 phiX:5386
or
bam-rmdup -z MT:16569 -z phiX:5386
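Conceptually, wrapping undoes the extension: a position on the extended contig at or beyond the original length L refers to the same base as that position minus L. The sketch below only illustrates this coordinate arithmetic (0-based positions, hypothetical function names). It is not how bam-rewrap or bam-rmdup are implemented; real tools also have to handle reads that straddle the origin, CIGAR strings and mate information.

```python
# Circular contig lengths, taken from the bam-rewrap / bam-rmdup arguments above.
CIRCULAR_LENGTHS = {"MT": 16569, "phiX": 5386}


def wrap_position(pos: int, length: int) -> int:
    """Map a 0-based position on an extended circular contig back onto [0, length)."""
    if pos < 0:
        raise ValueError("position must be non-negative")
    return pos % length


def crosses_origin(start: int, aligned_len: int, length: int) -> bool:
    """True if an alignment starting at `start` runs past the end of the
    original (unextended) sequence once its start has been wrapped."""
    return wrap_position(start, length) + aligned_len > length


if __name__ == "__main__":
    mt = CIRCULAR_LENGTHS["MT"]
    for start, span in [(100, 50), (16500, 100), (16600, 50)]:
        print(start, "->", wrap_position(start, mt), crosses_origin(start, span, mt))
```

For example, a read starting at position 16600 on the extended MT maps to position 31, while a 100 bp read starting at 16500 crosses the origin and would need special handling.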
Perhaps I am wrong, but there is no single "reference with phiX". When I was at the Max Planck Institute in Leipzig, we made our own by concatenating the human reference and some decoy sequences. I suggest you contact the center that produced the BAM file.
@ Gabriel R,
Yes! You are right. The BAM files are from the Max Planck Institute in Leipzig. Could you share your reference, please?
I am no longer there, but I will ask my former supervisor; let's see if they can put it somewhere. In the meantime, here are the accessions for the reference: http://cdna.eva.mpg.de/neandertal/Chagyrskaya/bam/README
@ Gabriel R, thank you for the favor. And yes, I had already gone through the URL you gave: http://cdna.eva.mpg.de/neandertal/Chagyrskaya/bam/README
All the contig names and lengths in the BAM file match the README, except the phiX length. In the README, the accession for phiX is given as NC_001422.1 with length 5386,
but the header section of the downloaded BAM files has:
@SQ SN:phiX LN:6386
which is a different length. That suggests a different phiX genome might have been used. Do you have any idea about that?
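Step 5b of the README quoted at the top copies the first 1000 bp of phiX onto its end, so a phiX contig built that way would be 5386 + 1000 = 6386 bp, the same as the LN value in the header above. The sketch below is one way to check this for every circular contig. Assumptions: the helper names are hypothetical, the expected lengths come from the bam-rewrap arguments and the 1000 bp extension in the README, and the header text is tab-separated SAM, e.g. the output of `samtools view -H`.

```python
from typing import Dict

CIRCULAR_LENGTHS = {"MT": 16569, "phiX": 5386}  # from the bam-rewrap arguments
EXTENSION_BP = 1000  # README steps 2b and 5b


def sq_lengths(header_text: str) -> Dict[str, int]:
    """Collect SN -> LN from the @SQ lines of a SAM header."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "SN" in fields and "LN" in fields:
            lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def explained_by_extension(header_text: str) -> Dict[str, bool]:
    """For each circular contig present, is LN == circular length + extension?"""
    lengths = sq_lengths(header_text)
    return {name: lengths[name] == circ + EXTENSION_BP
            for name, circ in CIRCULAR_LENGTHS.items() if name in lengths}


if __name__ == "__main__":
    # Toy header: the phiX line is from the post above, the MT line is made up.
    header = "@HD\tVN:1.0\n@SQ\tSN:MT\tLN:16569\n@SQ\tSN:phiX\tLN:6386\n"
    print(explained_by_extension(header))
```

A True for phiX only shows the length is consistent with the extension described in the README; confirming it still requires the people who built the reference.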
You can find the sequence of the phiX genome that Illumina uses at this link.
@genomax
Hi, the Illumina genome that you shared has the following info.
From the "genome.dict" file of the Illumina phiX genome:
This is different from what I mentioned earlier...
Edit: if we believe the figures are correct, that's a 1000 nt difference (6386 - 5386).
That is the official Illumina phiX sequence. NCBI's version has the same length (Illumina's has a few SNPs relative to the NCBI reference). Unless you get clarification from the source of your BAM files, it would be difficult to explain the difference you see.