Hi,
I've noticed that the UCSC downloads page has FASTA files split by chromosome for hg19 and mm9, but not for rn6. Is there somewhere I can download per-chromosome FASTA files for the rn6 genome assembly?
Apoorva
The UCSC utility twoBitToFa can get the sequence for an individual chromosome at a time via the -seq option, which you can run in a loop for each chromosome:
$ wget http://hgdownload.cse.ucsc.edu/goldenPath/rn6/bigZips/rn6.chrom.sizes
$ for chr in $(cut -f1 rn6.chrom.sizes); do twoBitToFa -seq="${chr}" http://hgdownload.soe.ucsc.edu/goldenPath/rn6/bigZips/rn6.2bit "${chr}.fa"; done
$ # verify with the faSize util:
$ head -1 rn6.chrom.sizes
chr1 282763074
$ faSize chr1.fa
282763074 bases (14711797 N's 268051277 real 153502700 upper 114548577 lower) in 1 sequences in 1 files
%40.51 masked total, %42.73 masked real
You can download twoBitToFa and faSize from the directory for your operating system here: http://hgdownload.soe.ucsc.edu/admin/exe/
If you have further questions about UCSC data or tools, feel free to send them to the UCSC Genome Browser mailing lists.
ChrisL from the UCSC Genome Browser
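If faSize isn't installed, a rough cross-check of each output file is to count its sequence characters and compare that with the length listed in rn6.chrom.sizes. This is a minimal sketch assuming plain FASTA with Unix line endings; the function name fasta_len and the demo file are hypothetical.

```shell
#!/bin/sh
# Count bases in a single-sequence FASTA file (header lines excluded).
# Every character on sequence lines is counted, so N's and soft-masked
# (lowercase) bases are included, like the total that faSize reports.
fasta_len() {
    awk '!/^>/ { n += length($0) } END { print n + 0 }' "$1"
}

# With real data, compare against column 2 of rn6.chrom.sizes, e.g.:
#   [ "$(fasta_len chr1.fa)" = "$(awk '$1 == "chr1" { print $2 }' rn6.chrom.sizes)" ]

# Tiny demo file (hypothetical sequence)
printf '>chrDemo\nACGTNN\nac\n' > demo_len.fa
fasta_len demo_len.fa
```

This only checks lengths, not content; faSize additionally reports N and masked-base counts.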
FYI, NCBI provides chromosome-wise sequences for the Rnor_6.0 (rn6) assembly at ftp://ftp.ncbi.nlm.nih.gov/genomes/R_norvegicus/
and ftp://ftp.ncbi.nlm.nih.gov/genomes/R_norvegicus/Assembled_chromosomes/seq/
Alternatively, download http://hgdownload.cse.ucsc.edu/goldenPath/rn6/bigZips/rn6.fa.gz and split it into one file per sequence; see e.g. the thread "How To Split One Big Sequence File Into Multiple Files With Less Than 1000 Sequences In A Single File".
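A minimal sketch of that split step with awk, assuming the decompressed file is plain FASTA whose header lines start with `>` followed by the sequence name (e.g. `>chr1`). A new output file, named after the sequence, is started at each header. The function name split_fasta and the demo input are hypothetical.

```shell
#!/bin/sh
# Split a multi-FASTA into one file per sequence, named <seqname>.fa.
# Assumption: the sequence name is the first word after '>' on each header line.
split_fasta() {
    awk '
        /^>/ {
            if (out) close(out)           # close the previous file to stay under the open-file limit
            out = substr($1, 2) ".fa"     # ">chr2 some text" -> "chr2.fa"
        }
        { print > out }
    ' "$1"
}

# Usage on the real file (large download, not run here):
#   wget http://hgdownload.cse.ucsc.edu/goldenPath/rn6/bigZips/rn6.fa.gz
#   gunzip -c rn6.fa.gz > rn6.fa && split_fasta rn6.fa

# Tiny demo input
printf '>chr1\nACGT\nNNAC\n>chr2 some description\nGGCC\n' > demo.fa
split_fasta demo.fa
```

Closing each file before opening the next matters here because the rn6 file contains many unplaced scaffolds in addition to the chromosomes.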
@ChrisL: Do you know why the *.chromFa.tar.gz files (the assembly sequence in one file per chromosome) are not made available for all genomes at UCSC? They are for the human genome.
We make these individual chromosome files only when the chromosome count is under 100. Most assemblies these days have over 100 chromosomes, so we don't build them anymore.
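Because rn6.chrom.sizes lists every sequence in the assembly, the twoBitToFa loop in the earlier answer also writes a file for each unplaced or unlocalized scaffold. If only the assembled chromosomes are wanted, one option is to filter the names first. This sketch assumes UCSC naming, where those extra scaffolds contain an underscore (e.g. chrUn_... or chr1_..._random); the function name main_chroms and the demo values are hypothetical.

```shell
#!/bin/sh
# List the assembled chromosomes in a UCSC chrom.sizes file.
# Assumption: unplaced/unlocalized scaffolds contain '_' and chromosomes do not.
main_chroms() {
    cut -f1 "$1" | grep -v '_'
}

# Demo input in chrom.sizes format (name<TAB>length); lengths are made up.
printf 'chr1\t100\nchr2\t90\nchrUn_demo\t10\nchr1_demo_random\t5\nchrM\t16\n' > demo.chrom.sizes

echo "total sequences: $(wc -l < demo.chrom.sizes)"
echo "assembled chromosomes: $(main_chroms demo.chrom.sizes | wc -l)"

# With the real rn6.chrom.sizes and twoBitToFa installed, the loop becomes:
#   for chr in $(main_chroms rn6.chrom.sizes); do
#       twoBitToFa -seq="$chr" http://hgdownload.soe.ucsc.edu/goldenPath/rn6/bigZips/rn6.2bit "$chr.fa"
#   done
```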