Some questions about 3'UTR regions from rat 6.0 fasta and gtf files
2.6 years ago

I have a problem with miRNA-seq analysis and the prediction of miRNA target genes.

I know the basic workflow, and this is the first method I tried:

I downloaded the 3'UTR FASTA files (version: rat 6.0) from Ensembl BioMart and from UCSC.

I ran miRanda on Linux against each of these files, and each one gave a different number of predicted target genes.

However, I was not satisfied with the number of target genes in any of the results. The command and parameters were:

miranda rno_DEGs.fasta "GCF_000001895.5_Rnor_6.0_3'UTR.fa" -sc 150 -en -30 -strict | grep ">>" > rno_VS_NCBI_END.txt

Can I relax the two thresholds, -sc 150 and -en -30? The defaults are -sc 140 and -en 1.
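One way to compare thresholds without re-running miRanda each time is to run it once with loose settings and filter the ">>" summary lines afterwards. The sketch below assumes those lines are tab-separated in the order query, reference, total score, total energy, max score, max energy, followed by further columns; that layout is an assumption and should be checked against your own output file. The function names are hypothetical.

```python
def parse_summary_line(line):
    """Parse one '>>' summary line (assumed tab-separated column order)."""
    fields = line.rstrip("\n").lstrip(">").split("\t")
    return {
        "mirna": fields[0],
        "target": fields[1],
        "max_score": float(fields[4]),   # assumed column: best single-site score
        "max_energy": float(fields[5]),  # assumed column: best (lowest) energy
    }


def filter_hits(lines, min_score=150.0, max_energy=-30.0):
    """Keep summary lines whose score is high enough and energy low enough."""
    hits = []
    for line in lines:
        if not line.startswith(">>"):
            continue  # skip per-site lines and any other output
        hit = parse_summary_line(line)
        if hit["max_score"] >= min_score and hit["max_energy"] <= max_energy:
            hits.append(hit)
    return hits
```

Passing an open file handle for rno_VS_NCBI_END.txt as `lines`, and calling `filter_hits` with several threshold pairs, shows how quickly the target count changes as the cutoffs are relaxed.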

Here are the links to the genome FASTA and annotation files.

Genome FASTA:

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/895/GCF_000001895.5_Rnor_6.0/GCF_000001895.5_Rnor_6.0_genomic.fna.gz

Annotation file:

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/895/GCF_000001895.5_Rnor_6.0/GCF_000001895.5_Rnor_6.0_genomic.gff.gz

So I tried the second method:

I used the rat 7.0 3'UTR FASTA file from Ensembl BioMart and ran the same workflow. To my surprise, it gave the largest number of target genes, which I thought was a good result.

My question: my mRNA counts were generated against the rat 6.0 FASTA described above, so is it acceptable to use a different assembly version for the target gene analysis?

If not, I have another idea:

I think I could extract the 3'UTR sequences from the rat 6.0 FASTA and GTF files. But I could not find 3'UTR coordinates, or any other 3'UTR information, in the 6.0 GTF file, and I don't know why. Because of this, I have no idea how to extract the 3'UTRs from the FASTA and GTF files.
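Some annotation files list only exon and CDS rows and leave UTRs implicit; that may be the case for the file linked above, though this is an assumption worth checking. In that situation the 3'UTR of a coding transcript can be derived as the exon sequence downstream of the last coding base. A minimal sketch of that interval arithmetic for one transcript follows; the function name and the tuple-based input are hypothetical, and grouping exon/CDS rows by transcript is left out.

```python
def three_prime_utr(exons, cds, strand):
    """Return 3'UTR intervals for one transcript.

    exons, cds: lists of (start, end) tuples, 1-based inclusive as in GFF/GTF.
    strand: "+" or "-".
    """
    if not cds:
        return []  # non-coding transcript: no UTR can be derived
    cds_start = min(start for start, _ in cds)
    cds_end = max(end for _, end in cds)
    utr = []
    for start, end in sorted(exons):
        if strand == "+":
            # keep the exon part to the right of the last coding base
            if end > cds_end:
                utr.append((max(start, cds_end + 1), end))
        else:
            # minus strand: the 3' end lies at the lower coordinates
            if start < cds_start:
                utr.append((start, min(end, cds_start - 1)))
    return utr
```

Whether the stop codon is counted as part of the CDS rows differs between file formats, so the first three bases of the derived UTR may need trimming.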

And the final method:

I contacted the sequencing company. They told me they use the whole genome FASTA file in place of the 3'UTR sequences for target gene prediction. I still don't understand why they do it this way. Can I do the same?

I looked into many approaches, including biomaRt in R, but most of them did not work for me.

So I really hope somebody can give me some advice or suggest a method. Thank you very much.


Can anyone help me?

2.6 years ago
Shred ★ 1.6k

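A quick script to extract just the 3' UTRs in BED format:

import sys

# Usage: python3 script.py your_annotation.gtf > three_prime_utr.BED
with open(sys.argv[1], 'r') as gtf_file:
    for line in gtf_file:
        if line.startswith('#'):
            continue  # skip header/comment lines
        fields = line.rstrip('\n').split('\t')
        # keep only well-formed rows whose feature type is a 3' UTR
        if len(fields) < 9 or fields[2] != "three_prime_utr":
            continue
        # BED starts are 0-based, GTF starts are 1-based
        start = int(fields[3]) - 1
        # BED6 layout: chrom, start, end, name, score, strand
        print(f"{fields[0]}\t{start}\t{fields[4]}\t.\t0\t{fields[6]}")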

Launch this with

python3 script.py your_annotation.gtf > three_prime_utr.BED

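Then use BEDtools getfasta to extract the sequences of those regions (pass -s among the options to reverse-complement minus-strand regions, which requires the strand in column 6 of the BED file):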

bedtools getfasta [OPTIONS] -fi <input FASTA> -bed three_prime_utr.BED

Thanks. I am not familiar with Python, but I saw the string "three_prime_utr" in the script. Does that mean I should find "three_prime_utr" in the GTF file?


That is how 3'UTR regions are labelled in the feature column (column 3) of an Ensembl GTF file: https://www.ensembl.org/info/website/upload/gff.html
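To see which labels a particular annotation file actually uses, it can help to tally the values in column 3 before filtering on any one of them. A small sketch, with a hypothetical function name, that works on any iterable of GTF/GFF lines:

```python
from collections import Counter


def count_feature_types(lines):
    """Tally the feature type (column 3) of each annotation row."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts
```

For a gzipped file, opening it with `gzip.open(path, "rt")` and passing the handle to `count_feature_types` should work. If no UTR-like label shows up in the tally, the file probably encodes UTRs only implicitly through its exon and CDS rows.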
