Dear all, I want to retrieve a DNA sequence from the UCSC website given its chromosome position. When I click the 'get DNA' button, a new HTML page is produced that contains the target sequence. For a single query I can simply copy the sequence from that page, but I want to automate the procedure in Perl for several sequences (no more than 20). I first tried to scrape the HTML with Perl's HTML::Parser and LWP::UserAgent, but I am not good at this, so I have to ask for your help. Could you please show me how to do that? Thanks very much!
I finally found a solution in Perl. The following code uses the UCSC DAS server to fetch a DNA sequence by its chromosome position:
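    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::Simple;

    # Use the UCSC DAS server to fetch a sequence by its chromosome position
    my $URL_gene = "http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=chr2:100000,200000";
    my $genefile = get($URL_gene);
    die "Could not fetch $URL_gene\n" unless defined $genefile;

    # Keep only the lines made up entirely of A/C/G/T (empty lines also match but print nothing)
    my @DNA = grep { /^[acgt]*$/i } split /\n/, $genefile;
    print @DNA, "\n";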
Tip: indent your code with 4 spaces to format it properly.
You should never parse an XML document like this.
Exactly. You got the query right; now you have to apply the same care to handling the response. By the way, test your code with a short sequence of length 10, or with a sequence that starts with whitespace or contains N: it breaks. Instead of relying on incidental whitespace, use a Perl XML parser module, e.g. a subclass of XML::Parser.
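A minimal sketch of that suggestion, assuming XML::Parser (and the expat library it wraps) is installed from CPAN, and assuming the DAS `dna` response wraps the bases in a `<DNA>` element inside `<SEQUENCE>`. Rather than filtering lines, it collects all character data inside `<DNA>` and then strips whitespace, so line wrapping, leading spaces, N bases and short sequences are all handled the same way. It also swaps LWP::Simple for the core module HTTP::Tiny (LWP::Simple's `get` would work just as well) and takes regions such as `chr2:100000,200000` from the command line; the script name below is only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;
use XML::Parser;    # assumption: installed from CPAN (needs expat)

# Return the bases inside every <DNA> element of a DAS "dna" response,
# with all whitespace removed.
# Assumes the layout <DASDNA><SEQUENCE ...><DNA ...>bases</DNA></SEQUENCE></DASDNA>.
sub extract_dna {
    my ($xml) = @_;
    my $in_dna = 0;
    my $seq    = '';
    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub { my (undef, $elem) = @_; $in_dna = 1 if $elem eq 'DNA'; },
            End   => sub { my (undef, $elem) = @_; $in_dna = 0 if $elem eq 'DNA'; },
            # Char may fire several times per element, so append each chunk
            Char  => sub { my (undef, $text) = @_; $seq .= $text if $in_dna; },
        },
    );
    $parser->parse($xml);    # dies if the response is not well-formed XML
    $seq =~ s/\s+//g;        # drop the line breaks the server wraps the bases with
    return $seq;
}

# Fetch each region given on the command line and print it as a FASTA record
my $http = HTTP::Tiny->new;
foreach my $region (@ARGV) {
    my $url = "http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=$region";
    my $res = $http->get($url);
    die "Could not fetch $url: $res->{status} $res->{reason}\n" unless $res->{success};
    print ">$region\n", extract_dna($res->{content}), "\n";
}
```

For example, `perl das_dna.pl chr2:100000,200000 chr2:300000,300009` would make one request per region and print one FASTA-style record for each, which fits the "no more than 20 sequences" use case without batching.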
Yes, you are right. I will revise it according to all of your suggestions. Thanks very much, everyone!
Can BioMart do this job? I tried, but found that BioMart only provides human gene sets. If BioMart had the whole human genome, this could be done with the biomaRt package in R.