5.2 years ago
Kumar
I would like to extract promoter sequences from a draft plant genome. I found many web tools for this, but they accept only small input files (5 MB maximum). My draft plant genome is 350 MB, so please suggest an offline tool for predicting promoter sequences.
Thanks in advance.
I faced the same problem about a year ago while working with the Nicotiana tabacum genome (annotated scaffolds). I did not find a solution at the time, so I wrote my own bash script that uses common software such as samtools and bedtools and outputs a multi-FASTA file of promoter sequences. Here it is: https://github.com/RimGubaev/extract_promoters Hope this helps!
Looking at that code, it mainly extracts the flanking regions of annotated genes. While this broadly addresses the goal, it should not be confused with finding bona fide promoter sequences; it only retrieves the regions where promoters are expected to be found.
I say this just in case someone in future comes to this thread and runs that code as a 'black box'.
Yeah, that's correct, thanks!
Thank you @rimgubaev
Many thanks, dear colleagues!
Do you just want 1kb upstream of genes? That's one line of bedtools.
Yes indeed, @Asaf. Could you please elaborate?
Take a look at bedtools flank
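To elaborate a little on what "1 kb upstream" means in practice: as far as I know, bedtools flank takes a genome sizes file (-g), left and right flank lengths (-l, -r) and a strand-aware switch (-s), and the resulting intervals can then be passed to bedtools getfasta. The Python sketch below is not bedtools itself, only an illustration of the same coordinate logic, assuming BED-style 0-based half-open coordinates. The function names and the toy scaffold are hypothetical, and the output is still just flanking sequence, not a predicted promoter.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical gene record: (chrom, start, end, name, strand),
# using BED-style 0-based, half-open coordinates.
Gene = Tuple[str, int, int, str, str]

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_interval(start: int, end: int, strand: str,
                      chrom_len: int, size: int = 1000) -> Tuple[int, int]:
    """Return the (start, end) of the region `size` bp upstream of a gene.

    For a '+' gene, upstream lies before `start`; for a '-' gene it lies
    after `end`. The interval is clipped to the scaffold boundaries.
    """
    if strand == "-":
        return end, min(chrom_len, end + size)
    return max(0, start - size), start


def extract_upstream(genome: Dict[str, str], genes: List[Gene],
                     size: int = 1000) -> Iterator[Tuple[str, str]]:
    """Yield (gene_name, upstream_sequence) pairs, oriented 5'->3'
    relative to each gene."""
    for chrom, start, end, name, strand in genes:
        scaffold = genome[chrom]
        lo, hi = upstream_interval(start, end, strand, len(scaffold), size)
        if lo >= hi:
            # Gene sits at the very edge of the scaffold: nothing upstream.
            continue
        region = scaffold[lo:hi]
        if strand == "-":
            region = reverse_complement(region)
        yield name, region


if __name__ == "__main__":
    # Toy data only; a real genome would be read from a FASTA file.
    toy_genome = {"scaf1": "ACGTTGCAAGGC"}
    toy_genes = [("scaf1", 4, 8, "geneA", "+"), ("scaf1", 4, 8, "geneB", "-")]
    for gene_name, seq in extract_upstream(toy_genome, toy_genes, size=3):
        print(f">{gene_name}_upstream\n{seq}")
```

The strand check is the part that is easy to get wrong: for minus-strand genes the upstream region is to the right of the gene on the scaffold, and the sequence needs to be reverse-complemented to read in the gene's orientation. Clipping matters on a draft assembly, since many genes sit near the ends of short scaffolds.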