How and where can I get the reference genome to run bowtie?
10.0 years ago
catherine ▴ 250

I want to analyze a ChIP-seq dataset from GEO (GSE11431), and I have two questions:

Each GSM sample has an SRA entry with multiple files ("runs"). What does that mean? Are they replicates? Should I download all of them, run bowtie on each one separately against the same reference genome, and then combine the results afterward?

My other question: where can I download the mouse reference genome (mm9) for bowtie?

Thank you very much for any idea and help!!

ChIP-Seq • 3.2k views
10.0 years ago

Each time the sequencer is operated to sequence a set of samples is called a "run". Some samples are sequenced in multiple runs, which can be treated as technical replicates. Data from a sample that was sequenced in multiple runs is merged at some point in the analysis pipeline.

Prebuilt indexes for bowtie and bowtie2 are available on the bowtie website: http://bowtie-bio.sourceforge.net/index.shtml

If you need annotations as well, the TopHat website provides a set of genomes with annotations, but the files are large (14-20 GB).
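
To make the "align each run against the same index" step concrete, here is a minimal Python sketch that only builds the bowtie (version 1) command line for each run. It assumes the prebuilt mm9 index has been unpacked in the working directory with the basename `mm9`, and the run file names are hypothetical placeholders, not the actual runs of GSE11431. `-S` (SAM output) and `-p` (threads) are standard bowtie options; any further alignment settings are left for you to choose.

```python
import subprocess
from pathlib import Path


def bowtie_command(index_basename, fastq_path, threads=4):
    """Build the bowtie (version 1) command line for one run's FASTQ file."""
    fastq = Path(fastq_path)
    sam_path = fastq.with_suffix(".sam")  # e.g. run1.fastq -> run1.sam
    return [
        "bowtie",
        "-S",                # write alignments in SAM format
        "-p", str(threads),  # number of alignment threads
        index_basename,      # index basename (assumed: "mm9")
        str(fastq),          # single-end reads for this run
        str(sam_path),       # output file
    ]


def align_runs(index_basename, fastq_paths, threads=4, dry_run=True):
    """Return one command per run; execute them only if dry_run is False."""
    commands = [bowtie_command(index_basename, p, threads) for p in fastq_paths]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # requires bowtie on PATH
    return commands


if __name__ == "__main__":
    # Hypothetical run files; replace with the FASTQ files you extracted.
    for cmd in align_runs("mm9", ["run1.fastq", "run2.fastq"]):
        print(" ".join(cmd))
```

With `dry_run=True` (the default) nothing is executed, so you can inspect the commands before running them.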


What do you mean by "multiple "runs" will be merged at some point in the analysis pipeline"? How can I merge them?


You can either merge the raw FASTQ files before any analysis or merge the BAM files after alignment; which is better depends on the type of analysis you are doing. Merging after alignment has the advantage that you can first compare the runs against each other and then merge them.
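
Both options can be sketched in a few lines of Python. This is an illustration rather than a fixed recipe: the function names and file names are hypothetical. The first function merges before alignment by concatenating uncompressed FASTQ files, assuming each file ends with a newline. The second only builds a `samtools merge` command for merging after alignment; samtools expects the input BAM files to be sorted the same way (coordinate-sorted by default).

```python
import shutil
from pathlib import Path


def concat_fastq(run_paths, merged_path):
    """Merge before alignment: concatenate per-run FASTQ files into one file."""
    with open(merged_path, "wb") as out:
        for path in run_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)  # stream, no full read into memory
    return Path(merged_path)


def samtools_merge_command(merged_bam, run_bams):
    """Merge after alignment: build `samtools merge <out.bam> <in1.bam> ...`."""
    return ["samtools", "merge", str(merged_bam)] + [str(b) for b in run_bams]
```

The returned list from `samtools_merge_command` can be passed to `subprocess.run(cmd, check=True)` once samtools is installed and the per-run BAM files are sorted.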
