Hi! I am trying to extract start and stop positions for a list of genes. Can someone suggest any Homo sapiens files that I can download and parse with Perl to get this information? I know UCSC has files like knownGene and refGene, but I am not sure about the format of those txt files. Any suggestions?
You might also try Ensembl BioMart. Since you already have a gene list, you could use Ensembl's web front-end to extract the information you need, and you likely wouldn't have to use Perl to parse the results. You can select the fields you would like returned in a text file (e.g. start, stop).
Maybe it would be easier for you to use the UCSC "Table Browser" and download the annotations as, e.g., a BED file. You can find an explanation of this format here:
http://genome.ucsc.edu/FAQ/FAQformat.html#format1
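If you do end up parsing the BED file yourself, here is a rough sketch of how that could look. It is written in Python rather than Perl, but the logic should port directly. The function name gene_positions and the example records are made up for illustration. It assumes the first four BED columns are chrom, start, end and name, as described at the link above. Depending on which table you export, the name column may hold a transcript accession rather than a gene symbol, so check that it matches the IDs in your list.

```python
def gene_positions(bed_lines, wanted):
    """Collect (chrom, start, end) tuples for BED records whose name is in `wanted`.

    `bed_lines` is any iterable of text lines (an open file works).
    `wanted` is a set of names to look up (hypothetical IDs in the example below).
    """
    positions = {}
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and the optional header lines a BED file may carry.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # no name column, so nothing to match against
        chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
        if name in wanted:
            # A name can occur on several lines (e.g. multiple transcripts),
            # so keep every interval instead of overwriting.
            positions.setdefault(name, []).append((chrom, start, end))
    return positions


# Made-up records, only to show the expected input shape.
example = [
    'track name="example"',
    "chr1\t100\t200\tgeneA\t0\t+",
    "chr2\t500\t900\tgeneB\t0\t-",
    "chr1\t150\t250\tgeneA\t0\t+",
]
hits = gene_positions(example, {"geneA"})
for name, intervals in hits.items():
    for chrom, start, end in intervals:
        print(name, chrom, start, end, sep="\t")
```

Keep in mind that BED start coordinates are 0-based and end coordinates are exclusive, so add 1 to the start if you need 1-based positions. For a real file you would pass the open file handle in place of the example list.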