I have a spreadsheet with 500 different positions on different chromosomes, and I'd like to pull out the DNA sequences between those positions. The spreadsheet is already set up in a way that could easily be related to the UCSC Genome Browser database if only I had a way to either upload my spreadsheet to the database or download the necessary tables. It seems like there must be a table that relates the position on the chromosome to a specific nucleotide, so I feel like if I found that table I could do this. So my question is, does anyone know of a way to do this? Is there an easier way to do this?
I tried connecting remotely to UCSC's MySQL server so that I could access the tables through MS Access, but I couldn't connect to it. I'm also somewhat familiar with Biopython if there's an easier way to do this using another database like NCBI's Nucleotide database.
Thanks
If the sequences are all from the same genome, I would recommend downloading the 2bit file for the genome and using a command-line tool like twoBitToFa. For hg19, download this file (778 MB) and access it with this Linux software.
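As a rough sketch of how the spreadsheet could feed into twoBitToFa: export it to CSV, convert the rows to BED, and pass that file with the -bed option. The column names (chrom, start, end), the file names, and the assumption that the spreadsheet uses 1-based inclusive coordinates are all mine, so adjust them to match your data.

```python
import csv
import io


def rows_to_bed(csv_text):
    """Convert CSV rows with chrom,start,end columns into BED text.

    Assumption: the spreadsheet uses 1-based, inclusive coordinates
    (as shown in the UCSC browser position box). BED is 0-based and
    half-open, so only the start is shifted down by one.
    """
    bed_lines = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for i, row in enumerate(reader, start=1):
        chrom = row["chrom"].strip()
        start = int(row["start"])
        end = int(row["end"])
        # The fourth BED column is a name; "regionN" is a made-up label.
        bed_lines.append(f"{chrom}\t{start - 1}\t{end}\tregion{i}")
    return "\n".join(bed_lines) + "\n"


def twobit_command(two_bit, bed_path, fasta_path):
    """Build the twoBitToFa argument list (paths here are placeholders)."""
    return ["twoBitToFa", f"-bed={bed_path}", two_bit, fasta_path]


# Two hypothetical rows standing in for the 500-row spreadsheet export.
example = "chrom,start,end\nchr1,100,200\nchrX,4900,5000\n"
bed_text = rows_to_bed(example)
print(bed_text, end="")
print(" ".join(twobit_command("hg19.2bit", "regions.bed", "regions.fa")))
```

Write bed_text to regions.bed, then run the printed command (or hand the list to subprocess.run) in the directory that holds the 2bit file.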
If you'd prefer to do it in R, check out the BSgenome and Biostrings packages from Bioconductor.
-Micah