Question

How to download 3' UTR sequences of Anas platyrhynchos?

0

21 months ago

kuttibiotech2009 ▴ 30

I would like to download 3' UTR sequences of Anas platyrhynchos for miRNA target prediction.

Please help me by sharing terminal code, R scripts, or browser-based download links.

Thanks
KAMAL

miRNA gene target-prediction • 1.3k views

ADD COMMENT • link updated 21 months ago by GenoMax 147k • written 21 months ago by kuttibiotech2009 ▴ 30

GenoMax · Accepted Answer · 2023-03-01

4

21 months ago

Matthias Zepper 5.0k

You can obtain those from Ensembl, either via BioMart in the browser or via its R package, biomaRt:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("biomaRt")


library("biomaRt")

ensembl <- useEnsembl(biomart = "genes", dataset = "aplatyrhynchos_gene_ensembl")

# use listAttributes(ensembl) to see all available annotation you could download

cdsAnnot <- getBM(attributes = c("ensembl_transcript_id","ensembl_gene_id","gene_biotype","chromosome_name","3_utr_start","3_utr_end","strand"),
                  mart = ensembl)

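# Fetch the 3' UTR sequence of each transcript (query every ID only once)
utrSeqs <- getSequence(id = unique(cdsAnnot[, "ensembl_transcript_id"]),
                       type = "ensembl_transcript_id",
                       seqType = "3utr",
                       mart = ensembl)

# Combine the annotation and the sequences into one data.frame
downloadedUTRs <- merge(cdsAnnot, utrSeqs, by = "ensembl_transcript_id")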

ADD COMMENT • link 21 months ago by Matthias Zepper 5.0k

0

Thank you very much for sharing the code, sir.

I have used the given code, but the output is totally messy, sir.

Is there any way to tidy it up?

Thanks

KAMAL

ADD REPLY • link 21 months ago by kuttibiotech2009 ▴ 30

2

In what way is the output messy, and what kind of error messages do you get? You should be able to paste this right into the R console and get the results you want.

ADD REPLY • link 21 months ago by Matthias Zepper 5.0k

0

I have used this command to save the file:

write.fasta(sequences = downloadedUTRs, names = names(downloadedUTRs), file.out = "downloadedUTRs.fasta")

Sir, I would like to get the 3' UTR sequences in FASTA format.

But what I got is:

>ensembl_gene_id
ENSAPLG00000000001ENSAPLG00000000005ENSAPLG00000000006ENSAPLG00000000007ENSAPLG00000000010ENSAPLG00000000011ENSAPLG00000000015ENSAPLG00000000016ENSAPLG00000000023ENSAPLG00000000026ENSAPLG00000000029.......................etc

>gene_biotype
miRNAsnoRNAsnoRNAsnoRNAmiRNAscaRNAsnoRNAmiRNAmiRNAmiRNAsnRNAsnoRNAsnRNAscaRNAmisc_RNArRNAmiRNAsnoRNAsnoRNAmiRNAsnRNAsnRNAsnoRNAsnoRNAmiRNAsnRNAsnoRNAsnoRNAsnoRNAmiRNAsnoRNAmiRNAsnoRNAsnoRNArRNAmiRNAsnoRNAsnRNAmisc_RNAsnRNAsnoRNAmiRNAsnoRNAsnoRNAmiRNAsnoRNAsnRNAsnRNAmiRNAmiRNAsnoRNAsnoRNAmiRNAsnoRNAsnoRNAmiRNAmiRNAsnoRNAsnoRNAmiRNA

>chromosome_name
81910249311381224412210324Z11182132821891899126242214Z221161833182PEDO01014788.1PEDO01016475.1123245PEDO01017866.11118
7182275221241482117PEDO01009845.15182721178251317204820185PEDO01009845.122088524PEDO01014788.121259PEDO01017029.1PEDO01018351.124189112ZZZ3281PEDO01004542.1124

>strand
-1111111-111-11-1-1-1-111-11-111-11-11-11-11-111-1-1-1-1-1-1-1-11-11-1111111-1-1-1-1111-1

>3utr
Sequence unavailableSequence unavailableSequence unavailableSequence unavailableSequence unavailable sequence unavailableSequence unavailableSequence unavailableSequence unavailableSequence unavailableSequence 
AAGGGGCAGGGGAAGCCAAGGGAAGTGGCAGGGACTGAGATCTCCCCTTTCTAACCAGCAGCAGCTTCAGTGAAAAAGACTTGGTCTGGTCCTTAGCTGTTCATATAGCTCCCGGATATTTCGGCTTTCAGTAGTATCTGCTCAGAGCTCGGGCTCTGCTGCTTTACAGCGATCAGACGTGCGAGAAGCTCCTGCTCTTTCTTGGAGACCAGAACTTTTCAGACTAGCAAACAGCCTG

ADD REPLY • link updated 21 months ago by GenoMax 147k • written 21 months ago by kuttibiotech2009 ▴ 30

2

Well, I presumed some familiarity with R since you specifically asked for a solution in R. What my script gives you is a data.frame with some basic annotation like the Transcript ID, the Gene ID, the biotype and the chromosomal location:

  ensembl_transcript_id    ensembl_gene_id   gene_biotype chromosome_name 3_utr_start 3_utr_end strand
1    ENSAPLT00020000002 ENSAPLG00020000002         lncRNA               1          NA        NA     -1
2    ENSAPLT00020000003 ENSAPLG00020000003 protein_coding               1          NA        NA     -1
3    ENSAPLT00020000003 ENSAPLG00020000003 protein_coding               1    61111516  61111915     -1
4    ENSAPLT00020000004 ENSAPLG00020000004 protein_coding               1          NA        NA      1
5    ENSAPLT00020000004 ENSAPLG00020000004 protein_coding               1    61115679  61115914      1
6    ENSAPLT00020000005 ENSAPLG00020000005 protein_coding               1          NA        NA     -1
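As for why the earlier export looked messy: a data.frame is a list of columns, so handing the whole table to write.fasta yields one record per column rather than one per transcript. A minimal base-R sketch of that difference follows; the toy table, its IDs and its sequences are made up for illustration and are not real Ensembl data.

```r
# Toy stand-in for the merged table (hypothetical IDs and sequences).
toy <- data.frame(
  ensembl_transcript_id = c("T1", "T2", "T3"),
  `3utr`                = c("AAGG", "TTCA", "GGCT"),
  check.names      = FALSE,   # keep the column name "3utr" as it is
  stringsAsFactors = FALSE
)

# A data.frame is a list whose elements are its columns ...
isList    <- is.list(toy)
perColumn <- length(toy)              # number of columns

# ... whereas a single column turned into a list has one element per row,
# which is the shape needed for one FASTA record per transcript.
perRow <- as.list(toy[, "3utr"])
```

Here `perColumn` counts the two columns, while `perRow` holds the three sequences separately.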

Whether and how you use this information, I left up to you. If you want to export it as a .fasta file, you of course can:

downloadedUTRs <- na.omit(downloadedUTRs)

seqinr::write.fasta(
  as.list(downloadedUTRs[, "3utr"]),
  names = with(
    downloadedUTRs,
    paste(ensembl_gene_id, ensembl_transcript_id, gene_biotype, sep = "|")
  ),
  file.out = "./Duck3UTRs.fasta",
  open = "w",
  nbchar = 80
)
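Depending on what the target prediction tool expects, it may also be worth dropping entries whose sequence is the placeholder "Sequence unavailable" and keeping only one row per transcript, since the merged table can list a transcript more than once. The following is only a sketch of that idea in base R, run on a made-up toy table (the IDs, sequences and the output file name are assumptions for illustration), and it writes the FASTA with writeLines instead of seqinr:

```r
# Toy stand-in for the merged table (hypothetical IDs and sequences).
toyUTRs <- data.frame(
  ensembl_transcript_id = c("T1", "T1", "T2", "T3"),
  ensembl_gene_id       = c("G1", "G1", "G2", "G3"),
  gene_biotype          = c("protein_coding", "protein_coding",
                            "miRNA", "protein_coding"),
  `3utr`                = c("AAGGGGCAGG", "AAGGGGCAGG",
                            "Sequence unavailable", "TTTCAGACTA"),
  check.names      = FALSE,
  stringsAsFactors = FALSE
)

# Drop rows for which no 3' UTR sequence was returned.
hasSeq <- toyUTRs[toyUTRs[["3utr"]] != "Sequence unavailable", ]

# Keep only the first row of every transcript.
uniqueUTRs <- hasSeq[!duplicated(hasSeq$ensembl_transcript_id), ]

# Build FASTA records: a ">" header line followed by the sequence line.
headers <- with(
  uniqueUTRs,
  paste0(">", ensembl_gene_id, "|", ensembl_transcript_id, "|", gene_biotype)
)
fastaLines <- as.vector(rbind(headers, uniqueUTRs[["3utr"]]))

# Write to a temporary file so the sketch does not touch the working directory.
outFile <- file.path(tempdir(), "toy3UTRs.fasta")
writeLines(fastaLines, outFile)
```

The same two filtering steps could be applied to downloadedUTRs before the seqinr::write.fasta call above, if that behaviour is what you want.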

ADD REPLY • link 21 months ago by Matthias Zepper 5.0k

1

Sir,

Thank you so much for your help!

I was able to obtain exactly the output I was looking for. Thank you very much for your assistance, sir.

ADD REPLY • link 21 months ago by kuttibiotech2009 ▴ 30