How can I extract 3'UTR of bacteria (Pseudomonas aeruginosa) without using R? I have Gene_Ids extracted from NCBI.
Your best bet is to use experimental data. Automatic genome annotations in public databases cannot provide this, partly for historical reasons and partly because the bacterial transcription machinery works differently from the eukaryotic one. Unlike in eukaryotes, bacterial 3'-UTRs have only recently come into focus as regulatory entities, and little is known about their regulatory role (reviewed by Menendez-Gil & Toledo-Arana).
Prokaryotic genome annotation tools mostly feature a simplistic gene model of "1 gene == 1 CDS," often without annotating transcripts and other features. Therefore, extracting UTR candidates from public annotation will not work out of the box. The review paper lists some methods and studies concerned with detecting bacterial UTRs.
If you only have automatic annotation data, your best bet may be to detect rho-dependent termination signals and investigate the region between the CDS end and the signal. The result may be noisy.
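As a rough illustration of that idea, the Python sketch below pairs each CDS with the nearest same-strand terminator downstream of its end and reports the interval in between as a candidate 3' region. The function name `candidate_3p_regions`, the tuple layouts, and the `max_len` cutoff of 300 nt are assumptions made for this example. The terminator positions would have to come from whatever prediction tool you use, and the cutoff is arbitrary.

```python
from bisect import bisect_left, bisect_right


def candidate_3p_regions(cds_records, terminators, max_len=300):
    """Return {gene_id: (start, end, strand)} in 1-based inclusive coordinates.

    cds_records: iterable of (gene_id, start, end, strand), GFF-style coordinates.
    terminators: iterable of (position, strand) from a terminator predictor.
    max_len:     hypothetical cutoff; terminators further away are ignored.
    """
    # Sort terminator positions separately for each strand
    by_strand = {"+": [], "-": []}
    for pos, strand in terminators:
        by_strand[strand].append(pos)
    for positions in by_strand.values():
        positions.sort()

    regions = {}
    for gene_id, start, end, strand in cds_records:
        positions = by_strand[strand]
        if strand == "+":
            # First terminator strictly downstream of the CDS end
            i = bisect_right(positions, end)
            if i < len(positions) and positions[i] - end <= max_len:
                regions[gene_id] = (end + 1, positions[i], strand)
        else:
            # On the minus strand, "downstream" means smaller coordinates
            i = bisect_left(positions, start) - 1
            if i >= 0 and start - positions[i] <= max_len:
                regions[gene_id] = (positions[i], start - 1, strand)
    return regions


# Made-up coordinates, for illustration only
cds = [("geneA", 100, 400, "+"), ("geneB", 1000, 1300, "-"), ("geneC", 2000, 2300, "+")]
terms = [(450, "+"), (950, "-"), (3000, "+")]
print(candidate_3p_regions(cds, terms))
```

Genes without a terminator inside the cutoff (geneC here) are simply left out. The intervals are candidates only and still need experimental support, for the reasons given above.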
With AGAT:
agat_sp_extract_sequences.pl --gff infile.gff --fasta infile.fasta -t "3'-utr"
or use -t three_prime_utr instead.
Which one applies depends on how the feature type is named in your file.
To restrict the output to your chosen gene IDs, first filter infile.gff with agat_sp_filter_feature_from_keep_list.pl.
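If you want to see what such a keep-list filter does in principle, here is a minimal Python sketch. It is not AGAT's implementation. The names `parse_attributes` and `filter_gff_by_ids` are made up for this example, and it assumes your gene IDs match the ID attribute in column 9 of the GFF3 file (for example gene-PA0001) and that child features point to the gene directly through Parent. Features nested more than one level deep would be missed.

```python
def parse_attributes(column9):
    """Turn 'ID=x;Parent=y' into {'ID': 'x', 'Parent': 'y'}."""
    pairs = (item.split("=", 1) for item in column9.strip().split(";") if "=" in item)
    return {key: value for key, value in pairs}


def filter_gff_by_ids(gff_lines, keep_ids):
    """Yield header lines plus features whose ID or Parent is in keep_ids."""
    keep_ids = set(keep_ids)
    for line in gff_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # skip malformed or empty lines
        attrs = parse_attributes(fields[8])
        parents = attrs.get("Parent", "").split(",")
        if attrs.get("ID") in keep_ids or keep_ids.intersection(parents):
            yield line


# Tiny made-up GFF3 fragment, for illustration only
example = [
    "##gff-version 3\n",
    "chr\tsrc\tgene\t100\t400\t.\t+\t.\tID=gene-PA0001;Name=dnaA\n",
    "chr\tsrc\tCDS\t100\t400\t.\t+\t0\tID=cds-1;Parent=gene-PA0001\n",
    "chr\tsrc\tgene\t500\t900\t.\t-\t.\tID=gene-PA0002\n",
]
for kept in filter_gff_by_ids(example, {"gene-PA0001"}):
    print(kept, end="")
```

For real files, the AGAT script is the safer choice because it resolves the full parent/child hierarchy.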
This won't work. UTR isn't a concept in bacterial genome annotation (which doesn't mean they do not exist, they are just not regularly annotated by annotation pipelines):
Also, there are no "implicit UTRs" where CDS start/end deviates from gene start/end:
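One way to check this on your own annotation is sketched below in Python. It assumes a GFF3 file in which CDS rows reference their gene through the Parent attribute. The function name `find_gene_cds_mismatches` is made up for this example, and multi-row CDS features are collapsed to their overall span.

```python
def find_gene_cds_mismatches(gff_lines):
    """Return {gene_id: ((gene_start, gene_end), (cds_start, cds_end))}
    for genes whose CDS span differs from the gene span."""
    genes = {}     # gene ID -> (start, end)
    cds_span = {}  # parent gene ID -> (min start, max end) over all CDS rows
    for line in gff_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # skip malformed or empty lines
        ftype, start, end = fields[2], int(fields[3]), int(fields[4])
        attrs = dict(item.split("=", 1) for item in fields[8].split(";") if "=" in item)
        if ftype == "gene" and "ID" in attrs:
            genes[attrs["ID"]] = (start, end)
        elif ftype == "CDS" and "Parent" in attrs:
            parent = attrs["Parent"]
            old = cds_span.get(parent, (start, end))
            cds_span[parent] = (min(old[0], start), max(old[1], end))
    return {
        gene_id: (genes[gene_id], span)
        for gene_id, span in cds_span.items()
        if gene_id in genes and genes[gene_id] != span
    }


# Made-up rows: gene-A matches its CDS, gene-B does not
example = [
    "chr\tsrc\tgene\t100\t400\t.\t+\t.\tID=gene-A\n",
    "chr\tsrc\tCDS\t100\t400\t.\t+\t0\tID=cds-A;Parent=gene-A\n",
    "chr\tsrc\tgene\t500\t900\t.\t+\t.\tID=gene-B\n",
    "chr\tsrc\tCDS\t550\t900\t.\t+\t0\tID=cds-B;Parent=gene-B\n",
]
print(find_gene_cds_mismatches(example))
```

An empty result means gene and CDS coordinates coincide everywhere, so the annotation contains no room for UTRs.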
From Agat output:
Michael, thank you so much for taking the time to generate the AGAT output, but the statement "UTR isn't a concept in bacterial genome annotation" is new to me. Could you point me to some links or papers that support it?
See the review (link) in my answer. It contains more references for further reading.
Juke34, thank you so much for your response. As I'm a student looking for free software, could you suggest an alternative for extracting the sequences?
AGAT is free and open source. However, there likely isn't any software that does what you want out of the box.