Download Genomic Ranges From A List Of Coordinates Via Web
4
2
12.9 years ago
Anima Mundi ★ 2.9k

Hello,

Given a set of genomic positions (see the example below) in a file text.txt, how can I download the FASTA sequence around each of them (i.e. 1 kb upstream and 1 kb downstream of the given position) via the web?

Example:

#Name    Chromosome    Position    Strand
Name_1    chr7    103482772    +
Name_2    chr7    103488456    +
fasta coordinates • 3.7k views
ADD COMMENT
1

To reiterate my point, context is still important. The solutions provided thus far are useful, but only for the UCSC genome browser. If you wanted data from, for instance, Carica papaya, then none of the useful answers thus far would address your question, since the UCSC browser does not host data for that organism. So defining the scope of your question would be immensely helpful to people that find this question in the future: do you mean the mouse genome, or the human genome, or common mammal model species, or all species found in the UCSC genome browser?

ADD REPLY
0

The process will be very different depending on the organism the data describe. I assume this is for the human genome. If you're looking for an answer that is specific to the human genome, making that explicit would be helpful.

ADD REPLY
0

This is for the mouse genome, but since I also work with other organisms I would prefer a more "general" solution. Thanks.

ADD REPLY
0

For instance, you have already gotten answers specific to human and mouse. While you may know what "hg19" means and how to change that to get the data you want, someone else looking for answers to this same question in the future may not.

ADD REPLY
0

Just to be clear, I don't think this is a bad question. In fact, I think it could be useful to many people in the future. My comments are intended to make this question a more useful resource to them when they are brought here by a Google search.

ADD REPLY
0

I understand your point, and I agree I should have been more specific. Although my immediate problem concerned the mouse, I wanted a "general" solution, and I agree that the notion of "general" is fuzzy in this case. So the question could be read as asking for solutions that are "as broad as possible". Thanks to all of you, your help was invaluable.

ADD REPLY
5
12.9 years ago
  1. Convert this to a BED file.
  2. Upload the resulting bed file to the UCSC browser as a custom track.
  3. Use the UCSC Table Browser to get the DNA sequence for the custom track.

An alternative, after step 1, is to use Galaxy to do steps 2 and 3.
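
Step 1 can be done with a few lines of awk. The following is only a sketch, assuming the input is whitespace-separated with the columns shown in the question (name, chromosome, 1-based position, strand). The function name `to_bed` and the 6-column BED layout with a score of 0 are choices made here for illustration, not something from the original answer. Note that BED starts are 0-based and ends are exclusive, so a window from position-1000 to position+1000 becomes start = position-1001, end = position+1000.

```shell
# to_bed [FLANK] < table > bed
# Hypothetical helper: reads "name chrom pos strand" rows (1-based positions)
# and writes 6-column BED (0-based start, exclusive end) padded by FLANK bases.
to_bed() {
    awk -v flank="${1:-1000}" '
        /^#/ || NF < 4 { next }          # skip the header and incomplete rows
        {
            start = $3 - 1 - flank       # 1-based position -> 0-based start
            if (start < 0) start = 0     # do not run past the chromosome start
            end = $3 + flank             # BED end is exclusive
            printf "%s\t%d\t%d\t%s\t0\t%s\n", $2, start, end, $1, $4
        }'
}

# Example with the two rows from the question
printf '%s\n' \
    '#Name Chromosome Position Strand' \
    'Name_1 chr7 103482772 +' \
    'Name_2 chr7 103488456 +' | to_bed 1000
```

Because the window is symmetric, the strand does not change the coordinates. It only matters if the tool that fetches the sequence is asked to reverse-complement minus-strand features.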

ADD COMMENT
0

Unfortunately, I had a problem while trying to upload the BED file to UCSC. Galaxy worked instead, thanks.

ADD REPLY
5
12.9 years ago

This is not via a web service, but the R code is portable enough that it shouldn't really matter:

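library(BSgenome.Mmusculus.UCSC.mm9)  # mouse genome, assembly mm9
library(ShortRead)                    # provides writeFasta()

# One range per position of interest (here Name_1 from the question)
gr <- GRanges("chr7", IRanges(103482772, 103482772), strand = "+")
# 2,000-bp window straddling the position (1,000 bp on either side)
grwider <- flank(gr, 1000, both = TRUE)
seqs <- getSeq(Mmusculus, grwider, as.character = FALSE)
names(seqs) <- "Name_1"
writeFasta(seqs, file = "myseqs.fa")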
ADD COMMENT
4
12.9 years ago
Ian 6.1k

If you prefer a web-based solution, try GALAXY. You can upload your coordinates as:

chr start end

and specify the correct genome version, e.g. mm9 (mouse).

You can then retrieve the sequence. It is also possible to add or subtract 1000 bp from a set of coordinates within GALAXY.

GALAXY is a great tool for "doing things" with genome coordinates.

ADD COMMENT
0

This solution is exactly what I was searching for (even though I also appreciated the non-web solutions proposed). I am selecting Sean's equivalent solution as the accepted answer since it came first. Thanks anyway.

ADD REPLY
3
12.9 years ago

using bash and the UCSC DAS server:

create the following XSLT stylesheet and save it as stylesheet.xsl:


<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    version='1.0'
    >

<xsl:output method="text" encoding="UTF-8"/>

<xsl:template match="/">
  <xsl:value-of select="DASDNA/SEQUENCE/DNA"/>
</xsl:template>

</xsl:stylesheet>

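and run the following shell script (saved as download.sh):

# Assumes whitespace-separated columns: name, chromosome, position, strand.
# Change "hg19" in the URL to the assembly you need (e.g. mm9 for mouse).
grep -v '^#' input.txt | while read -r NAME CHROM POS STRAND
do
    # FASTA header, e.g. >Name_1_chr7_103482772_+
    echo ">${NAME}_${CHROM}_${POS}_${STRAND}"
    # Fetch the window as DAS XML, keep only the DNA text, wrap at 50 columns
    curl -s "http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=${CHROM}:$((POS-1000)),$((POS+1000))" |\
    xsltproc --novalid stylesheet.xsl - | tr -d " \n" | fold -w 50
    echo
done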
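
Since the assembly name is fixed inside the URL, it is easy to overlook when switching organisms. As a sketch only, the URL construction can be pulled into a small function that takes the assembly as an argument. The function name `das_url` and its argument order are made up here for illustration. The URL pattern is the one used in the script above, and whether a given assembly is served over DAS is something to check against UCSC.

```shell
# das_url ASSEMBLY CHROM POS [FLANK]
# Hypothetical helper: prints the DAS request URL for the window
# POS-FLANK .. POS+FLANK on the given assembly (FLANK defaults to 1000).
das_url() {
    assembly=$1
    chrom=$2
    pos=$3
    flank=${4:-1000}
    start=$((pos - flank))
    # Keep the start at 1 or above for positions near the chromosome start
    if [ "$start" -lt 1 ]; then
        start=1
    fi
    end=$((pos + flank))
    printf 'http://genome.ucsc.edu/cgi-bin/das/%s/dna?segment=%s:%s,%s\n' \
        "$assembly" "$chrom" "$start" "$end"
}

# Same window as in the script, but requested from the mouse mm9 assembly
das_url mm9 chr7 103482772
# A position close to the chromosome start is clamped rather than going negative
das_url hg19 chr7 500
```

Inside the loop, the curl call would then read `curl -s "$(das_url mm9 "$CHROM" "$POS")"`.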

result:

$ sh download.sh

>Name_1_chr7_103482772_+
atggatgttttaaaatcattgtcatacaaaccttgggtatctagacaaca
aatatcaaatatttgtcttctttgtcaaagttgtggttgaaggataggaa
tcaatagcagagtttcctttatccacattatacttcagcaagatttgact
cacaatgtctttaattagtttaaagttgcccccacactctcttttaagac
agtgacatacatttctgttccttccaggatgtcattttccttgacaagtc
ctcagttattttaagtttgtgactaaaccttgtgtgaaccccgttttccc
ccaggaatacttttctgtgcttttaatgtgcactctttgagtcttcaaaa
cggtaactagaagttctatgatcccccatctctacaagaaaatgtacata
tgttcataaaaatgtaggctactcgcttccaagaaacacaatgaatattt
tattaccaaaaataacccacctattgataacattacacattcatgttggg
tcaattctatatttcatagatgaaagatgtggattactcaaatctcttta
gtttataatttgccatggttagtgttaaagtgggttcaacaagccttggt
ctatttttcatgggtttcaagaactaagacatctgtggatagggtattac
caaacaaagggcaagtacattaaaattatgatttttttatgtgaaaaata
taaccccatatataaaaatgatacaattgtaaaagaaatattttattatt
tcaaacactttcacaaagcttggtacgatattttttcaggaagtttagca
aagttatcaccttatgtctacataagaagtgtaagctaagaatggcacaa
atatctaagatgatttccatttcctttccttttcatcatttgctctttct
ttaaaagggatatctaaaggcttccatcagttaataaaaaaaaaaaacac
agactgttctgaaaatgtagtttggaaagttagtgttatattgtaaatga
aaaagaaaaaatgaattatagaattcctttttatccctctttaatctgtt
aattcaaatatgaataactgtctacttacagaatttggcttgtctattat
tttctttctttcctaggtcaatgtactcagacatttcaacaaagccaaac
atgatttcatatgctgaaaagtaatcatagaatttcctaaaaacaccctt
attgcagcttatctgtgaagtaccagcctgatgaaaataggtatgaaaac
aatagctcttaagtagagtaatgctacaagatattgaatagtacgtgcac
acacactagcacatatacactgtgtatatttacttttcaaagcactgatt
tgattatttggtttcagatttaagtttaagaagccaaaaagcactaaaac
cttttaaaagtcattctggaattgtgtatctatggacttaagttagaaat
ggaagagaaactacctatttccacacctctagttagttctataaatagag
ccaattctaagtcaacttgattctttccttactcagtgcacttaaaagat
gagatgtcttgatgctgcctccccattcctctcccagaactaccatttac
tgaatgcctctctgtgccacgtttagagaggcaggagaggggaaaagttg
acagcatagaaaccctgtctgcctatgtttagaaccttgctcactgccaa
ggagttgtggaatcttgggcaggttactctatcattctattcctcagttt
cctttccaggaaaatgaggatgataataatagggtagctgtgaagagtaa
gtgagtgtacggcacacagtgttgtacatgttggctattattatcattcc
cattttaaagataaaggaaccaagactcaggaaattttttttttttagga
gacagggtcttgctctgtcacctaggctcaggtgcactggtatgatcaca
gctcactgcagcctcaactttccaggctcaagcaatcctcccacctcagc
c
>Name_2_chr7_103488456_+
ggcgactcctcaaggatctagaaacagaaataccatttgactcagcaatc
ccatcactgggtatatacccaaagggttataaatcattccactataaaga
cacatgcgcatgtatgtttattgcggcactgttcacaatagcaaagcctc
ggaaccaacccaaatgctcatcaatgatagactggataaagaaaatgtgg
cacatatacacctggaatactatgcagccataaaaggatgagttcatgtc
ctttgcagggacatggatgaagctggaaaccatcattctcagcaaactaa
cacaagaacaggaaaccaagcaccgcatgttctcactcataagtgggagt
tgaacaatgagaacacacggacacagggagggaaacatcacacaccaggg
cctgccagagggtggggggctagaggagggatagcactaggagaaatacc
taatgtagatgatgggttgatgggtgcagcaaaccaccaatgcatgtgta
cacctatgtaacaaacctgcatgtcctgcacatgtaccccataacttaaa
gtataataacaaaaaatgatataatatcatgtaactgacagataggtcaa
acttggcatttctttggcagagagaagagaggagaagagacaggcttgag
aatattggaggtagcttttaggacctggtgatagtttattttggttttgt
tttaaagaaagccttgttgaggctctttttactctcctacatggcttgta
tttatatctctaaagcctcctcttctttgagtttctgtggcctgcatatg
tgtaaaatctgactcctaaactggaagttgggttgtgaattaaacacaca
agagagcaggtcacaggagcaggaaccaggtgttaaggcagaatatagtt
ctggacaggtagccagcatgttgccgttagtttcatttagcaaagaaaag
aaaagacaaagaataaaatatttgataaaacatttctacctggagctgtt
tgtatatggcagcagtcacaggttatttagatataacattgctagggata
attcttttaggtcagccaatctggaggtacagttaaaataatcagatcct
gtttctacatacagaggctttctggagttagacaagggttaccacgcctc
ttgttcatcacttctactagggtttcatgctcagcgtgggtgatctccag
atcatttacctgtttaaatggaaattttgtttgagagcaggagaaacaca
gcactgagatatattctttaattctgcaaataaaatttcaacatttaatg
aattgaagccctgggtacagctattgacattttcagttggaaagcacaga
atataacctaattgaggggattttaagacatatagcttttctgggaggcc
tggagaagtaaacttgggtgttgcccagcaaagagtcatgtccaccccac
catgggacaggtccaacataaaaacaacagctacttccccccgaatcaac
agaactttcaccgcagttattcctccaaaataagaatcttcacttgatag
aatactgatctcgccatactgggtccccaggtcacattgcactcttcaat
atccagtctcatgaggctgagtctgataggtgtaaactaggtgacacgcc
ttcaccccagaaaggaaggaactatgggaattgtattttgggaagactat
actcacaatgtgggaaactatataaaaatgttgggcaatcatggatggta
catgcccattacagagggcaagttcaaaatctacaaaattttggacttct
tgatccaaaagtagctaaaataactgatagtttttaaaaattatggcctt
taggaccatttccaagaacctacactttgtgtacctcaccccttaccttg
aatgagcattctgcaatgggaagatattgtttatcacagtcaatctactt
gatgaacagcagcaaaagttccccataccctacctgggacagcctaacag
a
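
A quick sanity check on output like this: a window of 1000 bases on each side of the position, plus the position itself, should contain 2001 bases, which is what both records above have (40 lines of 50 plus one base). A minimal sketch for checking this automatically, where the function name `fasta_lengths` is made up for illustration:

```shell
# fasta_lengths < FASTA
# Hypothetical helper: prints "record_name<TAB>sequence_length" for each record.
fasta_lengths() {
    awk '
        /^>/ {                                   # a header starts a new record
            if (name != "") print name "\t" len  # report the previous record
            name = substr($0, 2)
            len = 0
            next
        }
        { len += length($0) }                    # blank lines add nothing
        END { if (name != "") print name "\t" len }'
}

# Small self-contained demonstration
printf '>demo\nacgt\nac\n' | fasta_lengths
```

With the script above this would be run as `sh download.sh | fasta_lengths`. A much shorter record would suggest that the window ran past the start or end of the chromosome.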