Question

Are the Pre-Built Reference Sequences from CBCB Known Transcripts, or the Entire Genome?

0

13.1 years ago

Dave Bridges ★ 1.4k

I am mapping reads to a pre-built ebwt index from the Bowtie site (specifically ftp://ftp.cbcb.umd.edu/pub/data/bowtie_indexes/hg19.ebwt.zip). Does this reference cover the entire genome, or just known coding transcripts? Whichever one it is, what is the best way to obtain the other?

next-gen sequencing reference bowtie read short aligner • 2.3k views

updated 13.1 years ago by Istvan Albert 102k • written 13.1 years ago by Dave Bridges ★ 1.4k

score 1 · Answer 1 · 2011-11-13

1

13.1 years ago

Istvan Albert 102k

You can check what a bowtie index contains with the bowtie-inspect utility that comes with bowtie. The -n option prints the names of the reference sequences in the index:

bowtie-inspect -n hg19

I haven't checked your example since I build my own indices, but I am fairly sure that the file you link to contains the index of the whole genome. A set of coding transcripts is less well defined, so an index built from one would have been labeled in more detail than just a build ID.

The answer to your second question is to build your own index.
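As a rough sketch of what building your own could look like: the function below wraps bowtie-build, which takes a FASTA file and an index basename. The function name build_index, the DRY_RUN switch, and the file names in the usage line are placeholders made up for illustration, and the sketch assumes bowtie-build is on your PATH and that the FASTA file is already uncompressed.

```shell
# Sketch only: build a bowtie index from an uncompressed FASTA file.
# build_index and DRY_RUN are hypothetical names, not part of bowtie.
build_index() {
  fasta="$1"    # input FASTA file (placeholder, e.g. transcripts.fa)
  prefix="$2"   # basename for the index files bowtie-build writes

  # Refuse to run on a missing or empty input file.
  if [ ! -s "$fasta" ]; then
    echo "missing or empty FASTA: $fasta" >&2
    return 1
  fi

  # With DRY_RUN=1, print the command instead of running it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "bowtie-build $fasta $prefix"
  else
    bowtie-build "$fasta" "$prefix"
  fi
}

# Usage (placeholder names):
#   build_index transcripts.fa my_transcripts
```

Running it once with DRY_RUN=1 is a cheap way to check the paths before committing to a build, which can take a while on a large reference.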

13.1 years ago by Istvan Albert 102k

0

So to build my own, I could download the FASTA-formatted RefSeq human RNA collection (ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/mRNA_Prot/human.rna.fna.gz) and run bowtie-build on that?

13.1 years ago by Dave Bridges ★ 1.4k

0

For what it's worth, the output of the bowtie-inspect -n hg19 command for that index is a list of chr1-22/X/Y/M, so it is the whole genome.

13.1 years ago by Dave Bridges ★ 1.4k
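The name check used in this thread can be automated with a small heuristic. The sketch below reads sequence names on stdin and guesses whether they look like chromosomes or RefSeq-style transcript accessions. The function name classify_names is hypothetical, and the patterns are assumptions: chromosome names starting with chr, and transcript names containing NM_, NR_, XM_, or XR_ accessions. Indexes that use other naming schemes will come back as "unknown".

```shell
# Sketch only: guess the contents of a bowtie index from its sequence names.
# classify_names is a hypothetical helper; it reads one name per line on stdin.
classify_names() {
  awk '
    # Chromosome-style names such as chr1, chrX, chrM (assumed convention).
    /^chr([0-9]+|X|Y|M)/ { chrom++ }
    # RefSeq-style transcript accessions anywhere in the name (assumed).
    /(NM|NR|XM|XR)_[0-9]+/ { tx++ }
    END {
      if (chrom > 0 && tx == 0) print "genome"
      else if (tx > 0 && chrom == 0) print "transcripts"
      else print "unknown"
    }'
}

# Usage, assuming bowtie-inspect is on PATH and hg19 is the index basename:
#   bowtie-inspect -n hg19 | classify_names
```

This is only a naming heuristic; when it reports "unknown", reading the names directly is the safer check.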