Question

[Resolved] Local alternative to galaxy "Extract Genomic DNA using coordinates" tool

9.0 years ago

giroudpaul ▴ 70

Hello,

For a simple script I am writing, I need to extract genomic sequence using coordinates, and I need to do it locally on my computer.

Is the Galaxy tool downloadable? Is there an alternative? It seems that bedtools can do something like this, but then I need the FASTA for mm9. Where can I get it?

Thanks

galaxy • 2.7k views

Yes, the getfasta command of BEDTools can do it. The mm9 FASTA sequence can be downloaded from UCSC.
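As a rough sketch of what that could look like, assuming bedtools is installed and the genome is already saved as a single FASTA file. The helper name and the file names mm9.fa, foo.bed and foo.fa are placeholders, not anything specified in this thread:

```shell
# Hypothetical helper: extract the sequence of every BED interval from a genome FASTA.
#   $1 = genome FASTA (e.g. mm9.fa), $2 = BED file, $3 = output FASTA
# bedtools builds a .fai index next to the FASTA on first use if none exists.
extract_with_bedtools() {
    bedtools getfasta -fi "$1" -bed "$2" -fo "$3"
}

# Example call (placeholder file names):
#   extract_with_bedtools mm9.fa foo.bed foo.fa
```

Adding -s to the getfasta call makes bedtools reverse-complement intervals on the minus strand, which matters if the BED file is stranded.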

updated 5.0 years ago by Ram 44k • written 9.0 years ago by Tej Sowpati ▴ 250

Is it in the mm9.2bit file? How do I extract it? The page says to use their twoBitToFa tool, but I don't understand how to install it.

written 9.0 years ago by giroudpaul ▴ 70

It is simpler to use the chromFa.tar.gz file, which when uncompressed gives you one FASTA file per chromosome. You can then create a master FASTA file by concatenating all of the files into one with the 'cat' command.
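A minimal sketch of that concatenation step, assuming the archive has been downloaded as chromFa.tar.gz and unpacks to chr*.fa files directly in the target directory. The helper name, the directory mm9_chroms and the output name mm9.fa are placeholders:

```shell
# Hypothetical helper: concatenate per-chromosome FASTA files into one file.
#   $1 = directory holding the unpacked chr*.fa files, $2 = output FASTA
# Pick an output name that does not match chr*.fa, so it is never read as input.
concat_chrom_fasta() {
    cat "$1"/chr*.fa > "$2"
}

# Example (placeholder directory and output names):
#   mkdir mm9_chroms && tar -xzf chromFa.tar.gz -C mm9_chroms
#   concat_chrom_fasta mm9_chroms mm9.fa
```

The glob also picks up any chr*_random.fa files in the directory; narrow the pattern if only the assembled chromosomes are wanted.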

written 9.0 years ago by Tej Sowpati ▴ 250

Ram · Answer 1 · 2015-11-24

To get mm9 FASTA files from the command line (the twoBitToFa binary below is the OS X build):

$ wget http://hgdownload.cse.ucsc.edu/goldenPath/mm9/bigZips/mm9.2bit
$ wget http://hgdownload.cse.ucsc.edu/admin/exe/macOSX.x86_64/twoBitToFa
$ chmod +x ./twoBitToFa
$ for i in `seq 1 19` X Y M; do echo "converting chr$i"; ./twoBitToFa -seq=chr$i mm9.2bit chr$i.fa; done

If you are using Linux, download the Linux build of the twoBitToFa Kent tool from the following URL instead:

$ wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/twoBitToFa

Install samtools. On OS X, if you have Homebrew installed, you could use brew install samtools. On Ubuntu, you might run sudo apt-get install samtools. Or on a RedHat-like Linux, you might run sudo yum install samtools.

Index the FASTA files with samtools faidx:

$ for i in `seq 1 19` X Y M; do echo "indexing chr$i"; samtools faidx chr$i.fa; done

Then query coordinates with samtools faidx. I wrote a convenience Perl script, bed2faidxsta.pl, that wraps samtools: it reads stranded or unstranded BED from standard input and writes FASTA to standard output.

Example usage:

$ ./bed2faidxsta.pl < foo.bed > foo.fa
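For readers without that script, here is a minimal shell sketch of the same idea. This is not the author's Perl script: the function names are made up for illustration, it ignores the strand column, and it assumes a single indexed FASTA that contains every chromosome named in the BED input.

```shell
# Convert BED intervals (0-based start, end-exclusive) read from stdin into
# samtools region strings (1-based, end-inclusive): chrom:start+1-end.
# Header lines (#, track, browser) and lines with fewer than 3 columns are skipped.
bed_to_regions() {
    awk '!/^(#|track|browser)/ && NF >= 3 { printf "%s:%d-%d\n", $1, $2 + 1, $3 }'
}

# Hypothetical wrapper: $1 = indexed FASTA; BED on stdin, FASTA on stdout.
# xargs appends the region strings to the samtools faidx command line.
bed_to_fasta() {
    bed_to_regions | xargs samtools faidx "$1"
}

# Example call (placeholder file names):
#   bed_to_fasta mm9.fa < foo.bed > foo.fa
```

The +1 on the start column is the important part: BED starts are 0-based, while samtools faidx regions are 1-based and inclusive. With one FASTA file per chromosome, as produced above, the BED input would need to be split by chromosome first, or the files concatenated.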

Ram · Answer 2 · 2015-11-24

9.0 years ago

Ian 6.1k

The following post should also be helpful: "Perl To Retrieve Sequences From UCSC Genome Browser"

Ram · Answer 3 · 2015-11-24

9.0 years ago

Matt Shirley 10k

pyfaidx has a script for this that is easy to install and works well: https://github.com/mdshw5/pyfaidx#cli-script-faidx

score 0 · Answer 4 · 2015-11-24

9.0 years ago

swbarnes2 14k

samtools faidx can do it too.
