I am trying to extract the DNA sequence 1000 bp around the Transcription start site (TSS) of a gene, and I get a code as follows:
# Load the biomaRt package
library(biomaRt)
# Connect to the Ensembl database through the biomaRt interface
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# Enter the gene symbol of interest
gene_symbol <- "ITGAE"
# Get the Ensembl gene ID for the gene symbol of interest
gene_id <- getBM(filters = "external_gene_name",
values = gene_symbol,
attributes = "ensembl_gene_id",
mart = ensembl)
# Get the TSS position for the gene of interest
tss <- getBM(filters = "ensembl_gene_id",
values = gene_id,
attributes = c("chromosome_name", "start_position", "strand"),
mart = ensembl)
# Extract the DNA sequence 1000 bp around the TSS
seq <- getSequence(id = tss$chromosome_name,
start = tss$start_position - 1000,
end = tss$start_position + 1000,
type = 'ensembl_transcript_id',
seqType = 'transcript_flank',
mart = ensembl)
# Print the DNA sequence
cat(seq[[1]])
This can get the correct gene id, but cannot get the correct sequence. Do you have any idea how to correct this code? Or is there a better way to achieve the goal?
Any help will be deeply appreciated.
It looks like the strand is not getting specified when running
getSequence
(and that there isn't an option for specifying strand) so are you getting the complimentary sequence?Additional info from
getSequence