Hi,
Some TAIR users have asked us the same question before. I'm sending you a copy of the reply from my colleague David. Hope this helps!
Best
Tanya Berardini
TAIR Curator
Dear Torsten,
You can find additional information about the centromeres in 'Analysis of the genome sequence of the flowering plant Arabidopsis thaliana', Nature 408, 796-815. You can access a PDF of this paper here:
http://www.nature.com/nature/journal/v408/n6814/pdf/408796a0.pdf
See Figure 6 for a depiction of the centromeric regions.
Where these regions begin and end depends somewhat on how they are defined. In addition, many of the regions are not fully sequenced, because they contain a lot of repetitive DNA that was difficult or impossible to sequence with the technology available at the time.
For the location of the unsequenced centromeres and NORs, you can check the Assembly_GFF directory:
ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR9_genome_release/TAIR9_gff3/Assembly_GFF
(The assembly has not changed in TAIR10, so the TAIR9 coordinates are still valid for the current annotation.)
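If you would rather pull the coordinates out of a downloaded GFF3 file than read it by eye, something like the following Python sketch may help. It relies only on the standard GFF3 layout (nine tab-separated columns, with `key=value` pairs in the last column). The function names are made up for illustration, and it is an assumption that the assembly unit name (e.g. CEN1) is stored in a `Name` attribute; please check the actual file and adjust the key if it differs.

```python
def parse_gff3_line(line):
    """Split one GFF3 feature line into a dict; return None for comments, blanks, or malformed lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        return None
    seqid, _source, ftype, start, end, _score, _strand, _phase, attrs = cols
    attributes = {}
    for pair in attrs.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key.strip()] = value.strip()
    return {
        "seqid": seqid,
        "type": ftype,
        "start": int(start),  # GFF3 coordinates are 1-based, inclusive
        "end": int(end),
        "attributes": attributes,
    }


def find_units(lines, prefixes=("CEN", "NOR"), name_key="Name"):
    """Collect (name, seqid, start, end) for features whose name starts with one of the prefixes.

    Assumption: the unit name lives under `name_key` in the attributes column.
    """
    hits = []
    for line in lines:
        record = parse_gff3_line(line)
        if record is None:
            continue
        name = record["attributes"].get(name_key, "")
        if name.startswith(prefixes):
            hits.append((name, record["seqid"], record["start"], record["end"]))
    return hits
```

An open file handle can be passed directly as `lines`, for example `find_units(open(path))`, where `path` points at a file downloaded from the directory above.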
You can also see these regions in our GBrowse viewer by searching for the assembly unit name, e.g. CEN1. In addition, checking the GAPs track option will highlight all regions where the sequence is unknown (marked with Ns). The centromere and NOR gaps have been arbitrarily set to 1000 nt (there are 1000 N's in the chromosome sequence). However, this is NOT the estimated size of the gaps.
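The same N-marked regions can also be located directly in a chromosome sequence. Here is a minimal Python sketch, assuming the sequence has already been read into a single string (FASTA header and line breaks removed); the function name `find_n_runs` is only illustrative.

```python
import re


def find_n_runs(sequence, min_length=1):
    """Return (start, end, length) for each run of N in the sequence.

    Coordinates are 1-based and inclusive, matching GFF3 conventions.
    """
    runs = []
    for match in re.finditer(r"[Nn]+", sequence):
        length = match.end() - match.start()
        if length >= min_length:
            runs.append((match.start() + 1, match.end(), length))
    return runs
```

Keep in mind that a run of exactly 1000 N's found this way is the arbitrary placeholder described above, not a measurement of the real gap.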
All the best,
David
Thank you very much, I will take a look at that.