Does anyone know where I can get the chrom sizes info for the Ensembl reference genome? I'm trying to run bedToBigBed, but it needs a chrom sizes file and I can't seem to find one for the Ensembl version.
EDIT: Sorry, I should have been clearer. I used the iGenomes package for human (GRCh37) to run the Tuxedo pipeline. I got my GTF file of all the merged transcripts out and converted it to BED for use in a web-based genome browser I'm working on. The required binary format is bigBed, so I'm using the bedToBigBed utility from UCSC, which requires a chrom info file (the name and length of each chromosome, plus the patches, in the reference FASTA sequence) in tab-delimited text format.
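For anyone unsure what that file looks like: as far as I know it is just two tab-separated columns, sequence name and length. Below is a minimal sketch, assuming you have the reference FASTA on hand, that counts the bases per record and writes that layout. The function names are made up for illustration, and it only reports whatever sequences are actually in the FASTA you give it.

```python
import io


def fasta_chrom_sizes(handle):
    """Yield (name, length) for each record in a FASTA file-like object."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Use the first whitespace-delimited token of the header as the name.
            tokens = line[1:].split()
            name, length = (tokens[0] if tokens else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def write_chrom_sizes(sizes, out):
    """Write name<TAB>length lines, one per sequence."""
    for name, length in sizes:
        out.write(f"{name}\t{length}\n")


if __name__ == "__main__":
    # Tiny made-up FASTA, just to show the output layout.
    demo = io.StringIO(">1 dna:chromosome\nACGTACGT\nACG\n>MT\nACGT\n")
    out = io.StringIO()
    write_chrom_sizes(fasta_chrom_sizes(demo), out)
    print(out.getvalue(), end="")
```

For a real genome you would pass an open file handle for the FASTA instead of the `io.StringIO` demo and write to a file such as `chrom.sizes` (a hypothetical name).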
UPDATE: Here's how I solved my problem, in case anyone else comes across this. I basically got rid of all the patches in the GTF file. That can be done with awk (I'm sure there are more efficient methods using regex). The reference genome patches, as far as I can tell, are not used by any of the Tuxedo packages and just get in the way when doing file conversions and the like downstream. The solutions below, using samtools and looking in the SAM headers, give the chromosomal lengths but not the patch lengths. Thanks for all the solutions!
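A rough Python sketch of that kind of filtering follows. It is not the awk command used above (that isn't shown here), and it swaps in a slightly different idea: rather than pattern-matching patch names, it keeps only GTF records whose first column is a sequence name you already have a size for. The function name and the example sequence names are illustrative assumptions.

```python
def filter_gtf_to_known(lines, known_names):
    """Yield GTF lines whose seqname (column 1) is in known_names.

    Comment lines starting with '#' are passed through unchanged;
    anything else, including blank lines, is dropped unless its
    first tab-separated field is a known sequence name.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        seqname = line.split("\t", 1)[0]
        if seqname in known_names:
            yield line


if __name__ == "__main__":
    # Made-up records; "HG1287_PATCH" stands in for a patch-style name.
    gtf = [
        "#!genome-build GRCh37\n",
        '1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "g1";\n',
        'HG1287_PATCH\tCufflinks\texon\t5\t50\t.\t+\t.\tgene_id "g2";\n',
    ]
    for kept in filter_gtf_to_known(gtf, {"1", "MT"}):
        print(kept, end="")
```

The set of known names could be taken from the first column of whatever chrom sizes file you end up passing to bedToBigBed, so the BED and the sizes file stay consistent.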
What file format do you need? What species are you using?
Sorry, I should have been clearer. I used the iGenomes package https://support.illumina.com/sequencing/sequencing_software/igenome.ilmn for human (GRCh37) to run the Tuxedo pipeline. I got my GTF file of all the merged transcripts out and converted it to BED for use in a web-based genome browser I'm working on. The required binary format is bigBed, so I'm using the bedToBigBed utility from UCSC, which requires a chrom info file (the name and length of each chromosome, plus the patches, in the reference FASTA sequence) in tab-delimited text format.
If you aligned any reads using that, you can get the sizes from the SAM/BAM header.
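As a sketch of what that could look like: the header (which `samtools view -H` prints) has one `@SQ` line per reference sequence, with the name in the `SN` tag and the length in the `LN` tag. The Python below pulls those out of header text; the function name is made up for illustration, and the header lines in the demo are hand-written rather than taken from a real BAM.

```python
def sizes_from_sam_header(header_lines):
    """Yield (name, length) from the @SQ lines of a SAM header."""
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:value.
        fields = dict(
            field.split(":", 1)
            for field in line.rstrip("\n").split("\t")[1:]
            if ":" in field
        )
        if "SN" in fields and "LN" in fields:
            yield fields["SN"], int(fields["LN"])


if __name__ == "__main__":
    header = [
        "@HD\tVN:1.0\tSO:coordinate\n",
        "@SQ\tSN:1\tLN:249250621\n",
        "@SQ\tSN:MT\tLN:16569\n",
        "@PG\tID:tophat\n",
    ]
    for name, length in sizes_from_sam_header(header):
        print(f"{name}\t{length}")
```

This only lists the sequences the reads were aligned against, so anything absent from the alignment reference will not show up.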