How to download 3' UTR sequences of Anas platyrhynchos?

I would like to download the 3' UTR sequences of Anas platyrhynchos for target prediction.

Please help me by sharing terminal code, R scripts, or browser-based download links.

Thanks
KAMAL

miRNA gene target-prediction • 1.3k views

You can obtain those from Ensembl, either via BioMart in the browser or with the biomaRt R package:

# Install BiocManager only if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

BiocManager::install("biomaRt")


library("biomaRt")

ensembl <- useEnsembl(biomart = "genes", dataset = "aplatyrhynchos_gene_ensembl")

# Use listAttributes(ensembl) to see all attributes available for download.

# Transcript-level annotation, including the 3' UTR coordinates
cdsAnnot <- getBM(
  attributes = c("ensembl_transcript_id", "ensembl_gene_id", "gene_biotype",
                 "chromosome_name", "3_utr_start", "3_utr_end", "strand"),
  mart = ensembl
)

# 3' UTR sequences; transcripts without one come back as "Sequence unavailable"
utrSeqs <- getSequence(
  id = unique(cdsAnnot[, "ensembl_transcript_id"]),
  type = "ensembl_transcript_id",
  seqType = "3utr",
  mart = ensembl
)

# Attach the sequences to the annotation table
downloadedUTRs <- merge(cdsAnnot, utrSeqs, by = "ensembl_transcript_id")
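
To get a feel for the shape of the merged table without querying Ensembl, here is a small offline sketch with made-up rows. The IDs, coordinates and sequences are invented for illustration and are not real Ensembl records. It assumes, as the output further down the thread suggests, that the annotation table can hold several rows per transcript while the sequence table holds one, so merge() repeats a transcript's sequence on each of its rows.

```r
# Toy stand-ins for the two biomaRt results; all values are invented.
cdsAnnot <- data.frame(
  ensembl_transcript_id = c("T1", "T1", "T2"),
  ensembl_gene_id       = c("G1", "G1", "G2"),
  `3_utr_start`         = c(100, 300, NA),
  `3_utr_end`           = c(200, 400, NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
utrSeqs <- data.frame(
  `3utr`                = c("ACGTACGT", "Sequence unavailable"),
  ensembl_transcript_id = c("T1", "T2"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Same merge as above: one output row per annotation row,
# with the transcript's sequence copied onto each of them.
downloadedUTRs <- merge(cdsAnnot, utrSeqs, by = "ensembl_transcript_id")
print(downloadedUTRs)
```

Column names that start with a digit, such as 3utr, have to be accessed with downloadedUTRs[["3utr"]] or backticks rather than a plain $3utr.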

Thank you very much for sharing the code, sir.

I ran the code as given, but the output is very messy.

Is there a way to get it properly formatted?

Thanks

KAMAL


In what way is the output messy, and what error messages, if any, do you get? You should be able to paste this straight into the R console and get the results you want.


I used this command to save the file:

write.fasta(sequences = downloadedUTRs, names = names(downloadedUTRs), file.out = "downloadedUTRs.fasta")

Sir, I would like to get the 3' UTR sequences in FASTA format, but what I got is:

>ensembl_gene_id
ENSAPLG00000000001ENSAPLG00000000005ENSAPLG00000000006ENSAPLG00000000007ENSAPLG00000000010ENSAPLG00000000011ENSAPLG00000000015ENSAPLG00000000016ENSAPLG00000000023ENSAPLG00000000026ENSAPLG00000000029.......................etc

>gene_biotype
miRNAsnoRNAsnoRNAsnoRNAmiRNAscaRNAsnoRNAmiRNAmiRNAmiRNAsnRNAsnoRNAsnRNAscaRNAmisc_RNArRNAmiRNAsnoRNAsnoRNAmiRNAsnRNAsnRNAsnoRNAsnoRNAmiRNAsnRNAsnoRNAsnoRNAsnoRNAmiRNAsnoRNAmiRNAsnoRNAsnoRNArRNAmiRNAsnoRNAsnRNAmisc_RNAsnRNAsnoRNAmiRNAsnoRNAsnoRNAmiRNAsnoRNAsnRNAsnRNAmiRNAmiRNAsnoRNAsnoRNAmiRNAsnoRNAsnoRNAmiRNAmiRNAsnoRNAsnoRNAmiRNA

>chromosome_name
81910249311381224412210324Z11182132821891899126242214Z221161833182PEDO01014788.1PEDO01016475.1123245PEDO01017866.11118
7182275221241482117PEDO01009845.15182721178251317204820185PEDO01009845.122088524PEDO01014788.121259PEDO01017029.1PEDO01018351.124189112ZZZ3281PEDO01004542.1124

>strand
-1111111-111-11-1-1-1-111-11-111-11-11-11-11-111-1-1-1-1-1-1-1-11-11-1111111-1-1-1-1111-1

>3utr
Sequence unavailableSequence unavailableSequence unavailableSequence unavailableSequence unavailable sequence unavailableSequence unavailableSequence unavailableSequence unavailableSequence unavailableSequence 
AAGGGGCAGGGGAAGCCAAGGGAAGTGGCAGGGACTGAGATCTCCCCTTTCTAACCAGCAGCAGCTTCAGTGAAAAAGACTTGGTCTGGTCCTTAGCTGTTCATATAGCTCCCGGATATTTCGGCTTTCAGTAGTATCTGCTCAGAGCTCGGGCTCTGCTGCTTTACAGCGATCAGACGTGCGAGAAGCTCCTGCTCTTTCTTGGAGACCAGAACTTTTCAGACTAGCAAACAGCCTG


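Well, I presumed some familiarity with R since you specifically asked for a solution in R. Because you passed the whole data.frame to write.fasta, each column was treated as one sequence and written under its column name, which is the output you see. What my script gives you is a data.frame with some basic annotation such as the transcript ID, the gene ID, the biotype and the chromosomal location: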

  ensembl_transcript_id    ensembl_gene_id   gene_biotype chromosome_name 3_utr_start 3_utr_end strand
1    ENSAPLT00020000002 ENSAPLG00020000002         lncRNA               1          NA        NA     -1
2    ENSAPLT00020000003 ENSAPLG00020000003 protein_coding               1          NA        NA     -1
3    ENSAPLT00020000003 ENSAPLG00020000003 protein_coding               1    61111516  61111915     -1
4    ENSAPLT00020000004 ENSAPLG00020000004 protein_coding               1          NA        NA      1
5    ENSAPLT00020000004 ENSAPLG00020000004 protein_coding               1    61115679  61115914      1
6    ENSAPLT00020000005 ENSAPLG00020000005 protein_coding               1          NA        NA     -1

Whether and how you use this information I left up to you. If you want to export it as a .fasta file, you can of course do so:

downloadedUTRs <- na.omit(downloadedUTRs)

seqinr::write.fasta(
  as.list(downloadedUTRs[, "3utr"]),
  names = with(
    downloadedUTRs,
    paste(ensembl_gene_id, ensembl_transcript_id, gene_biotype, sep = "|")
  ),
  file.out = "./Duck3UTRs.fasta",
  open = "w",
  nbchar = 80
)
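
One possible caveat with this export: if a transcript's 3' UTR is listed on several rows of the table, its sequence would be written more than once, and any "Sequence unavailable" placeholder that survives na.omit() would end up in the file. Below is a minimal sketch of a clean-up step in base R, run on invented toy rows rather than real Ensembl data. The helper name write_utr_fasta is hypothetical, and the 80-character wrapping simply mirrors nbchar = 80 above.

```r
# Toy rows standing in for the merged table; all values are invented.
downloadedUTRs <- data.frame(
  ensembl_transcript_id = c("T1", "T1", "T2"),
  ensembl_gene_id       = c("G1", "G1", "G2"),
  gene_biotype          = c("protein_coding", "protein_coding", "lncRNA"),
  `3utr`                = c("ACGTACGT", "ACGTACGT", "Sequence unavailable"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep one real sequence per transcript and write FASTA.
write_utr_fasta <- function(df, path, width = 80) {
  seqs <- df[["3utr"]]
  # Drop missing, empty and placeholder sequences
  keep <- !is.na(seqs) & nzchar(seqs) & seqs != "Sequence unavailable"
  df <- df[keep, , drop = FALSE]
  # Keep only the first row of each transcript
  df <- df[!duplicated(df$ensembl_transcript_id), , drop = FALSE]

  con <- file(path, open = "w")
  on.exit(close(con))
  for (i in seq_len(nrow(df))) {
    header <- paste(df$ensembl_gene_id[i], df$ensembl_transcript_id[i],
                    df$gene_biotype[i], sep = "|")
    s <- df[["3utr"]][i]
    # Split the sequence into chunks of at most `width` characters
    starts <- seq(1, nchar(s), by = width)
    chunks <- substring(s, starts, pmin(starts + width - 1, nchar(s)))
    writeLines(c(paste0(">", header), chunks), con)
  }
  invisible(nrow(df))
}

fasta_path <- tempfile(fileext = ".fasta")
n_written <- write_utr_fasta(downloadedUTRs, fasta_path)
```

With real data you would pass the merged table from the biomaRt query and a file name of your choice instead of the temporary file.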

Sir,

Thank you so much for your help!

I was able to obtain exactly the output I was looking for. Thank you again for your assistance.
