Hi,
I need to obtain the DNA sequences for a set of exon coordinates that I obtained from my DEXSeq analysis, so that I can perform a motif analysis using e.g. (X)STREME from the MEME Suite.
I know that biomaRt gives us the option to retrieve cDNA from Ensembl using chromosomal coordinates combined with getSequence().
Unfortunately I have not been able to do this. Whenever I run getSequence(), I receive the following timeout error (on port 443):
Error in curl::curl_fetch_memory(url, handle = handle) :
Timeout was reached: [dec2021.archive.ensembl.org:443] Operation timed out after 300000 milliseconds with 0 bytes received
I use the following code for this:
human.mart <- useMart(biomart = "ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl", host = "https://dec2021.archive.ensembl.org")
Example coordinates:
getSequence(chromosome = 12, start = 54369133, end = 54391298, mart = human.mart, seqType = "cdna", type = "ensembl_gene_id")
head(human_df,2)
    groupID_strip            groupID featureID genomicData.seqnames genomicData.start genomicData.end genomicData.width genomicData.strand HGNC.symbol   Gene.stable.ID.1
1 ENSG00000001497 ENSG00000001497.16      E002                 chrX          65512583        65512901               319                  -       LAS1L ENSMUSG00000057421
2 ENSG00000004975 ENSG00000004975.11      E001                chr17           7225341         7225454               114                  -        DVL2 ENSMUSG00000020888
getSequence(chromosome = human_df$genomicData.seqnames[1],
            start = human_df$genomicData.start[1],
            end = human_df$genomicData.end[1],
            type = "ensembl_gene_id",
            seqType = "cdna",
            upstream = 20,
            mart = human.mart)
For some reason I cannot retrieve the cDNA using the code above. However, if I use the Ensembl ID, I can obtain the cDNA, but that's not what I want:
#Works
getSequence(id = human_df$groupID_strip[1],
            type = "ensembl_gene_id",
            seqType = "cdna",
            upstream = 20,
            mart = human.mart)
Does anybody know how to solve this issue, or are there other ways to retrieve the cDNA using exon coordinates obtained from DEXSeq?
Thanks in advance!
Unfortunately, the other Ensembl servers don't work either.
This is confusing. Is your post about the curl timeout error, or is it about getting the sequences that you want? If your #Works code block works, in that it retrieves sequences, then the real issue isn't the timeout error. If so, what exactly is the issue with what is returned from your working getSequence example when using the Ensembl ID?
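Two things may also be worth checking; treat the following as a sketch under assumptions rather than a tested answer. First, Ensembl names chromosomes without the "chr" prefix ("X", not "chrX"), so the DEXSeq seqnames probably need converting before they are passed to getSequence(). Second, as far as I understand, getSequence() called with coordinates returns the requested seqType for annotated features in that region, not the sequence of the interval itself, so for exact exon sequences a local genome package may be simpler and avoids the server altogether. The helper to_ensembl_chrom() and the output file name are made up for illustration, and the second half assumes the coordinates are on GRCh38/hg38 with BSgenome.Hsapiens.UCSC.hg38 installed.

```r
# Exon coordinates copied from the head(human_df, 2) output in the question.
exons <- data.frame(
  seqnames = c("chrX", "chr17"),
  start    = c(65512583L, 7225341L),
  end      = c(65512901L, 7225454L),
  strand   = c("-", "-"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of biomaRt): convert UCSC-style names
# ("chrX") to the Ensembl style ("X"). Note that chrM would additionally
# need mapping to "MT", which this simple version does not handle.
to_ensembl_chrom <- function(seqnames) {
  sub("^chr", "", as.character(seqnames))
}

exons$ensembl_chrom <- to_ensembl_chrom(exons$seqnames)
print(exons)

# Local alternative that does not contact the Ensembl server.
# Assumption: coordinates are on GRCh38/hg38 and the package
# BSgenome.Hsapiens.UCSC.hg38 is installed; skipped otherwise.
if (requireNamespace("BSgenome.Hsapiens.UCSC.hg38", quietly = TRUE)) {
  # Attaching the genome package also attaches BSgenome, GenomicRanges,
  # IRanges and Biostrings, which provide the functions used below.
  suppressPackageStartupMessages(library(BSgenome.Hsapiens.UCSC.hg38))

  gr <- GRanges(
    seqnames = exons$seqnames,  # UCSC-style genome keeps the "chr" prefix
    ranges   = IRanges(start = exons$start, end = exons$end),
    strand   = exons$strand
  )

  # For ranges on the "-" strand, getSeq() returns the reverse complement.
  exon_seqs <- getSeq(BSgenome.Hsapiens.UCSC.hg38, gr)
  names(exon_seqs) <- paste0(exons$seqnames, ":", exons$start, "-", exons$end)

  # Write a FASTA file for a motif tool; the file name is arbitrary.
  writeXStringSet(exon_seqs, "dexseq_exons.fa")
}
```

If the exons were annotated against a different assembly, the genome package would need to be swapped for the matching one, otherwise the extracted sequences will not correspond to the exons.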