If you can use pysam, you should be in business.
http://www.cgat.org/~andreas/documentation/pysam/api.html#pysam.Tabixfile
>>> import pysam
>>> tabixfile = pysam.Tabixfile( "/usr/local/share/gemini/data/hg19.CpG.bed.gz" )
>>> print(tabixfile.contigs)
['chr1', 'chr10', 'chr11', 'chr11_gl000202_random', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr17_ctg5_hap1', 'chr17_gl000204_random', 'chr17_gl000205_random', 'chr18', 'chr19', 'chr1_gl000191_random', 'chr1_gl000192_random', 'chr2', 'chr20', 'chr21', 'chr22', 'chr3', 'chr4', 'chr4_ctg9_hap1', 'chr4_gl000193_random', 'chr4_gl000194_random', 'chr5', 'chr6', 'chr6_apd_hap1', 'chr6_cox_hap2', 'chr6_dbb_hap3', 'chr6_mann_hap4', 'chr6_mcf_hap5', 'chr6_qbl_hap6', 'chr6_ssto_hap7', 'chr7', 'chr8', 'chr8_gl000197_random', 'chr9', 'chr9_gl000199_random', 'chr9_gl000200_random', 'chr9_gl000201_random', 'chrUn_gl000211', 'chrUn_gl000212', 'chrUn_gl000213', 'chrUn_gl000214', 'chrUn_gl000215', 'chrUn_gl000216', 'chrUn_gl000217', 'chrUn_gl000218', 'chrUn_gl000219', 'chrUn_gl000220', 'chrUn_gl000221', 'chrUn_gl000222', 'chrUn_gl000223', 'chrUn_gl000224', 'chrUn_gl000225', 'chrUn_gl000228', 'chrUn_gl000229', 'chrUn_gl000231', 'chrUn_gl000235', 'chrUn_gl000236', 'chrUn_gl000237', 'chrUn_gl000240', 'chrUn_gl000241', 'chrUn_gl000242', 'chrUn_gl000243', 'chrX', 'chrY']
Tabix keeps all the reference sequence names in the tabix index. In principle, you can get them by reading the index alone.
Can you read the index (.tbi) through the Python API? I know in Perl it is as simple as $tabix_obj->getnames(). We know you are not busy (jk), so maybe you could write that up real quick ;-)
Theoretically it is possible, but I do not know Python well enough to do that... Just use pysam.
Doesn't "for line" require that the whole file is read?
Yes, and I can see why you would rather read these values from a smaller index file. Just thought it might help!
It is helpful and it is one valid solution.
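For the index-only route discussed above, here is a minimal Python sketch that pulls the sequence names straight out of the .tbi file without touching the data file. It assumes the layout given in the tabix format specification (a BGZF-compressed file that starts with the magic bytes TBI\1, followed by eight little-endian 32-bit integers and then the NUL-terminated names), and it relies on Python's gzip module reading BGZF as ordinary multi-member gzip. The function name read_tbi_names is made up for illustration and is not part of pysam.

```python
import gzip
import struct


def read_tbi_names(tbi_path):
    """Return the sequence names stored in a tabix (.tbi) index.

    Assumes the header layout from the tabix specification:
    magic, n_ref, format, col_seq, col_beg, col_end, meta, skip, l_nm, names.
    """
    # BGZF is a series of concatenated gzip members, so gzip.open can read it.
    with gzip.open(tbi_path, "rb") as fh:
        header = fh.read(36)  # 4 magic bytes + 8 int32 fields
        if len(header) < 36 or header[:4] != b"TBI\x01":
            raise ValueError("not a tabix index: %s" % tbi_path)
        # All integers in the index are little-endian int32.
        (n_ref, fmt, col_seq, col_beg,
         col_end, meta, skip, l_nm) = struct.unpack("<8i", header[4:])
        raw_names = fh.read(l_nm)

    # Names are concatenated and each one is terminated by a NUL byte.
    names = [n.decode("ascii") for n in raw_names.split(b"\x00") if n]
    if len(names) != n_ref:
        raise ValueError("expected %d names, found %d" % (n_ref, len(names)))
    return names
```

Calling read_tbi_names on the index that sits next to the BED file (the same path with .tbi appended) should give the same names as tabixfile.contigs, while reading only the first few hundred bytes of the index.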