How can I extract 3'UTR of bacteria (Pseudomonas aeruginosa) without using R?
6 weeks ago
Anshika • 0

How can I extract 3'UTR of bacteria (Pseudomonas aeruginosa) without using R? I have Gene_Ids extracted from NCBI.

prokaryotes UTR • 722 views
6 weeks ago
Michael 55k

Your best bet is to use experimental data. This cannot be done from the automatic genome annotations in databases, mostly for historical reasons and because the bacterial transcription machinery works differently from the eukaryotic one. Unlike in eukaryotes, bacterial 3'-UTRs have only recently come into focus as regulatory entities, and little is known about their regulatory role (reviewed by Menendez-Gil & Toledo-Arana).

Prokaryotic genome annotation tools mostly feature a simplistic gene model of "1 gene == 1 CDS," often without annotating transcripts and other features. Therefore, extracting UTR candidates from public annotation will not work out of the box. The review paper lists some methods and studies concerned with detecting bacterial UTRs.

If you only have automatic annotation data, your best bet may be to detect rho-dependent termination signals and investigate the region between the CDS end and the signal. The result may be noisy.
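To make the last suggestion concrete, here is a minimal Python sketch. It assumes you already have CDS coordinates (for example from the GFF) and a list of predicted terminator positions from a separate prediction step that is not shown here. The function names, the tuple layouts, and the 300 nt distance cutoff are hypothetical choices for illustration, not an established method.

```python
# Sketch only: coordinates are 1-based and inclusive, as in GFF.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def candidate_3prime_regions(cds_list, terminators, max_len=300):
    """Return (cds_id, start, end, strand) for the region between each CDS end
    and the nearest downstream terminator on the same strand.

    cds_list:    iterable of (cds_id, start, end, strand)
    terminators: iterable of (position, strand)
    max_len:     assumed cutoff; terminators further away are ignored
    """
    regions = []
    for cds_id, start, end, strand in cds_list:
        if strand == "+":
            # Downstream of a plus-strand CDS means larger coordinates.
            hits = [p for p, s in terminators
                    if s == "+" and end < p <= end + max_len]
            if hits:
                regions.append((cds_id, end + 1, min(hits), strand))
        else:
            # Downstream of a minus-strand CDS means smaller coordinates.
            hits = [p for p, s in terminators
                    if s == "-" and start - max_len <= p < start]
            if hits:
                regions.append((cds_id, max(hits), start - 1, strand))
    return regions


def extract(genome, start, end, strand):
    """Cut a region out of the genome string, reverse-complementing on '-'."""
    sub = genome[start - 1:end]
    return sub.translate(_COMPLEMENT)[::-1] if strand == "-" else sub
```

Every region returned this way is only a candidate: a CDS with no terminator inside the cutoff is skipped, and a predicted terminator is not proof of a real transcript end.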

6 weeks ago
Juke34 8.9k

With AGAT:
agat_sp_extract_sequences.pl --gff infile.gff --fasta infile.fasta -t "3'-utr"

or -t three_prime_utr, depending on how the feature type is named in your file.

To restrict the extraction to your chosen gene IDs, first filter infile.gff with agat_sp_filter_feature_from_keep_list.pl.
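For readers who want to see what the filtering step amounts to, here is a plain-Python sketch of the same idea. It is not AGAT's implementation. It assumes a GFF3 file in which your gene IDs appear verbatim in the ID or Parent attribute of column 9, and it follows only one level of Parent. The function names are hypothetical.

```python
def parse_attributes(column9):
    """Turn 'ID=x;Parent=y' into {'ID': 'x', 'Parent': 'y'}."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def filter_gff_lines(lines, keep_ids):
    """Yield GFF3 feature lines whose ID or Parent is in keep_ids.

    Comment lines, blank lines, and lines without 9 columns are dropped.
    """
    keep = set(keep_ids)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        parents = attrs.get("Parent", "").split(",")
        if attrs.get("ID") in keep or any(p in keep for p in parents):
            yield line.rstrip("\n")
```

With a real file you would pass an open file handle as `lines` and the IDs read from your list as `keep_ids`. If your identifiers are stored in another attribute, the lookup would need to be adapted.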


This won't work. UTRs aren't a concept in bacterial genome annotation. That doesn't mean they don't exist; annotation pipelines just don't annotate them routinely:

grep -ve "^#" ncbi_dataset/data/GCF_000006765.1/genomic.gff | cut -f3 | sort -u
CDS
exon
gene
ncRNA
protein_binding_site
pseudogene
region
RNase_P_RNA
rRNA
tmRNA
tRNA

Also, there are no "implicit UTRs", i.e. cases where the CDS start/end deviates from the gene start/end:

From the AGAT output:

----------------------------- Check10: check utrs ------------------------------
No UTRs created
No UTRs locations modified
No supernumerary UTRs removed
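The same "implicit UTR" check can be approximated without AGAT. The sketch below is a simplified stand-in, not AGAT's algorithm. It assumes GFF3 input in which each CDS names its gene directly in the Parent attribute, and the function name is hypothetical. It reports genes whose CDS span is shorter than the gene span.

```python
def implicit_utr_gaps(gff_lines):
    """Return {gene_id: (five_prime_gap, three_prime_gap)} in nucleotides
    for genes whose CDS span does not cover the whole gene span."""
    genes = {}     # gene ID -> (start, end, strand)
    cds_span = {}  # parent gene ID -> [min CDS start, max CDS end]
    for line in gff_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        start, end = int(cols[3]), int(cols[4])
        if cols[2] == "gene":
            genes[attrs.get("ID")] = (start, end, cols[6])
        elif cols[2] == "CDS":
            for parent in attrs.get("Parent", "").split(","):
                span = cds_span.setdefault(parent, [start, end])
                span[0] = min(span[0], start)
                span[1] = max(span[1], end)

    gaps = {}
    for gene_id, (g_start, g_end, strand) in genes.items():
        if gene_id not in cds_span:
            continue  # e.g. RNA genes or pseudogenes without a CDS
        left = cds_span[gene_id][0] - g_start
        right = g_end - cds_span[gene_id][1]
        # On the minus strand the 3' side is the left (lower) coordinate.
        five, three = (left, right) if strand == "+" else (right, left)
        if five or three:
            gaps[gene_id] = (five, three)
    return gaps
```

An empty result means that gene and CDS coordinates coincide everywhere, so there is nothing to derive UTRs from.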

Michael, thank you so much for taking the time to generate the AGAT output, but "UTR isn't a concept in bacterial genome annotation" is new to me. Could you point me to some links or papers that support it?


See the review (link) in my answer. It contains more references for further reading.


Juke34 Thank you so much for your response. But as I'm a student looking for free software, could you suggest another alternative to extract the sequences?


AGAT is free and open source. However, there likely isn't any software that does what you want out of the box.
