You can obtain those from Ensembl, either via BioMart in the browser or through its R package, biomaRt:
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("biomaRt")
library("biomaRt")
# connect to the Ensembl genes mart for duck (Anas platyrhynchos)
ensembl <- useEnsembl(biomart = "genes", dataset = "aplatyrhynchos_gene_ensembl")
# use listAttributes(ensembl) to see all available annotation you could download
# gene/transcript annotation plus the genomic coordinates of the 3' UTRs
cdsAnnot <- getBM(attributes = c("ensembl_transcript_id", "ensembl_gene_id", "gene_biotype",
                                 "chromosome_name", "3_utr_start", "3_utr_end", "strand"),
                  mart = ensembl)
# the 3' UTR sequence of each transcript (each ID only needs to be queried once)
utrSeqs <- getSequence(id = unique(cdsAnnot[, "ensembl_transcript_id"]),
                       type = "ensembl_transcript_id",
                       seqType = "3utr",
                       mart = ensembl)
# combine annotation and sequences into one data.frame
downloadedUTRs <- merge(cdsAnnot, utrSeqs, by = "ensembl_transcript_id")
Thank you very much for sharing the code, sir.
I have used the given code, but the output is totally messy, sir.
Is there any way to make it proper?
Thanks
KAMAL
In which regard is the output messy, and what kind of error messages do you get? You should be able to paste this right into the R console and get the results you want.
I used this command to save the file:
write.fasta(sequences = downloadedUTRs, names = names(downloadedUTRs), file.out = "downloadedUTRs.fasta")
Sir, I would like to get the 3' UTR sequences in FASTA format.
But what I got is
Well, I presumed some familiarity with R, since you specifically asked for a solution in R. What my script gives you is a data.frame with some basic annotation, such as the transcript ID, the gene ID, the biotype, and the chromosomal location:
  ensembl_transcript_id    ensembl_gene_id   gene_biotype chromosome_name 3_utr_start 3_utr_end strand
1    ENSAPLT00020000002 ENSAPLG00020000002         lncRNA               1          NA        NA     -1
2    ENSAPLT00020000003 ENSAPLG00020000003 protein_coding               1          NA        NA     -1
3    ENSAPLT00020000003 ENSAPLG00020000003 protein_coding               1    61111516  61111915     -1
4    ENSAPLT00020000004 ENSAPLG00020000004 protein_coding               1          NA        NA      1
5    ENSAPLT00020000004 ENSAPLG00020000004 protein_coding               1    61115679  61115914      1
6    ENSAPLT00020000005 ENSAPLG00020000005 protein_coding               1          NA        NA     -1
How and whether you use this information is up to you. If you want to export it as a .fasta file, you can of course do that too.
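A minimal sketch of that export, written in base R so it needs no extra package. It assumes the sequence column returned by getSequence() is named "3utr" and that transcripts without a 3' UTR carry the placeholder "Sequence unavailable"; both are assumptions worth checking with head(downloadedUTRs). The helper name write_utr_fasta is hypothetical. Passing the whole data.frame to seqinr's write.fasta() most likely explains the messy file: a data.frame is a list of columns, so each column gets written as its own "sequence" under its column name.

```r
# Hypothetical helper: write one FASTA record per transcript from the merged table.
# Assumes an ID column and a sequence column named "3utr" (check with head()).
write_utr_fasta <- function(df, path,
                            id_col = "ensembl_transcript_id",
                            seq_col = "3utr",
                            width = 60) {
  # merge() repeats the sequence once per annotation row, so keep one row per ID/sequence
  utrs <- unique(df[, c(id_col, seq_col)])
  seqs <- as.character(utrs[[seq_col]])
  # drop missing/empty sequences and the assumed "Sequence unavailable" placeholder
  keep <- !is.na(seqs) & seqs != "Sequence unavailable" & nchar(seqs) > 0
  utrs <- utrs[keep, , drop = FALSE]
  seqs <- seqs[keep]
  con <- file(path, open = "w")
  on.exit(close(con))
  for (i in seq_len(nrow(utrs))) {
    s <- seqs[i]
    # split the sequence into lines of at most `width` characters
    starts <- seq(1, nchar(s), by = width)
    ends <- pmin(starts + width - 1, nchar(s))
    writeLines(c(paste0(">", utrs[[id_col]][i]), substring(s, starts, ends)), con)
  }
  # return the number of records written (invisibly)
  invisible(nrow(utrs))
}
```

With the objects from the script above, write_utr_fasta(downloadedUTRs, "downloadedUTRs.fasta") should give one record per transcript that has a 3' UTR. If you prefer to stay with seqinr, write.fasta() expects a list of sequences and a matching vector of names rather than the data.frame itself, e.g. write.fasta(sequences = as.list(utrs$"3utr"), names = utrs$ensembl_transcript_id, file.out = "downloadedUTRs.fasta", as.string = TRUE), where utrs is a deduplicated, filtered copy of downloadedUTRs.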
Sir,
Thank you so much for your help!
I was able to obtain exactly the output I was looking for. Thank you again for your assistance, sir.