Question

How do I find promoter region sequences and coordinates for a long list of lncRNAs, given their hgnc_id, symbol, and gene ID?

0


9.7 years ago

2015rpro • 0

My PI gave me a CSV file (its columns are listed below) and asked me to help him find the promoter regions of the genes that are transcribed into the 2000 HGNC-approved lncRNAs he is interested in. He specified that he wants the raw DNA sequences, not cDNA. My plan is to collect the sequences and their coordinates as GRanges objects, but I have no idea where to start and am still figuring out how to use Bioconductor. Can anyone give me some ideas or a workflow? I have intermediate knowledge of R, and I assume this kind of job should be done programmatically because the number of targets is so large.

[1] "hgnc_id"                  "symbol"                   "name"                     "locus_group"
[5] "locus_type"               "status"                   "location"                 "location_sortable"
[9] "alias_symbol"             "alias_name"               "prev_symbol"              "prev_name"
[13] "gene_family"              "gene_family_id"           "date_approved_reserved"   "date_symbol_changed"
[17] "date_name_changed"        "date_modified"            "entrez_id"                "ensembl_gene_id"
[21] "vega_id"                  "ucsc_id"                  "ena"                      "refseq_accession"
[25] "ccds_id"                  "uniprot_ids"              "pubmed_id"                "mgd_id"
[29] "rgd_id"                   "lsdb"                     "cosmic"                   "omim_id"
[33] "mirbase"                  "homeodb"                  "snornabase"               "bioparadigms_slc"
[37] "orphanet"                 "pseudogene.org"           "horde_id"                 "merops"
[41] "imgt"                     "iuphar"                   "kznf_gene_catalog"        "mamit.trnadb"
[45] "cd"                       "lncrnadb"                 "enzyme_id"                "intermediate_filament_db"

hgnc ucsc LncRNA ensembl promoter-regions • 3.4k views

updated 2.5 years ago by Ram 45k • written 9.7 years ago by 2015rpro • 0

Ram · Answer 1 · 2015-07-29

2


9.7 years ago

Maximilian Haeussler ★ 1.7k

RSAT (Regulatory Sequence Analysis Tools) has an online tool, "retrieve sequence", for exactly this. Upload the list of genes, define the length of the sequence you want, and it sends you back a FASTA file with the putative promoters.

http://rsat.sb-roscoff.fr/
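
If you would rather script it, the underlying logic is small. Below is a minimal Python sketch, not a tested pipeline. It assumes you have already looked up each gene's chromosome, start, end, and strand from an annotation (for example via the ensembl_gene_id column; as far as I can tell, the "location" column in the HGNC file is only a cytoband). The names Gene, promoter_window, and extract_sequence are made up for illustration, and the 2000 bp upstream / 200 bp downstream window is an assumption that mirrors the defaults of promoters() in Bioconductor's GenomicRanges, so adjust it to whatever your PI considers a promoter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical record for one gene; coordinates are 1-based, inclusive."""
    symbol: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def promoter_window(gene, upstream=2000, downstream=200):
    """Return (chrom, start, end, strand) of a window around the TSS.

    The window sizes are assumptions, not a biological definition.
    The TSS is the gene start on "+" and the gene end on "-".
    """
    if gene.strand == "+":
        tss = gene.start
        start = tss - upstream
        end = tss + downstream - 1
    elif gene.strand == "-":
        tss = gene.end
        start = tss - downstream + 1
        end = tss + upstream
    else:
        raise ValueError(f"unknown strand {gene.strand!r} for {gene.symbol}")
    # Clamp at the chromosome start; the chromosome end is not checked here.
    return gene.chrom, max(1, start), end, gene.strand


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract_sequence(chrom_seq, start, end, strand):
    """Slice a 1-based inclusive interval out of a chromosome string.

    Minus-strand intervals are reverse-complemented so the promoter reads
    5' to 3' relative to the gene.
    """
    seq = chrom_seq[start - 1:end]
    if strand == "-":
        seq = seq.translate(_COMPLEMENT)[::-1]
    return seq


if __name__ == "__main__":
    # Toy coordinates, not real annotation.
    toy = Gene("TOY1", "chr1", 5000, 6000, "-")
    print(promoter_window(toy))
```

The tuples this returns map directly onto the chromosome/start/end/strand fields of a GRanges object, so you can still end up with GRanges on the R side if that is what you need downstream.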

updated 2.5 years ago by Ram 45k • written 9.7 years ago by Maximilian Haeussler ★ 1.7k