My PI gave me a CSV file (containing the columns listed below) and asked me to help him find the promoter regions of the genes that transcribe the 2000 HGNC-approved lncRNAs he is interested in. He specified that he wants the raw DNA sequences, not cDNA. My idea is to collect the sequences and their coordinates as GRanges objects, but I have no clue how to start and am still figuring out how to use Bioconductor. Can anyone give me some ideas or suggest a workflow? I have intermediate knowledge of R, and I assume this kind of job should be done programmatically because the number of targets is so large.
[1] "hgnc_id" "symbol" "name" "locus_group"
[5] "locus_type" "status" "location" "location_sortable"
[9] "alias_symbol" "alias_name" "prev_symbol" "prev_name"
[13] "gene_family" "gene_family_id" "date_approved_reserved" "date_symbol_changed"
[17] "date_name_changed" "date_modified" "entrez_id" "ensembl_gene_id"
[21] "vega_id" "ucsc_id" "ena" "refseq_accession"
[25] "ccds_id" "uniprot_ids" "pubmed_id" "mgd_id"
[29] "rgd_id" "lsdb" "cosmic" "omim_id"
[33] "mirbase" "homeodb" "snornabase" "bioparadigms_slc"
[37] "orphanet" "pseudogene.org" "horde_id" "merops"
[41] "imgt" "iuphar" "kznf_gene_catalog" "mamit.trnadb"
[45] "cd" "lncrnadb" "enzyme_id" "intermediate_filament_db"