How to extract all promoter regions in multi-fasta format from genome using GFF?
6.8 years ago
rimgubaev ▴ 340

Hi Everyone,

How can I extract promoter sequences (ca. 1000 bp upstream of the TSS) in multi-FASTA format from a genome (itself a multi-FASTA file of scaffolds), using the annotation in the corresponding GFF file? I have already tried the GFF-Ex tool, but it finished with errors. The genome is tobacco (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/715/135/GCF_000715135.1_Ntab-TN90/).

Does anyone know of other tools for this?

Thanks,

promoter genome gff fasta • 5.3k views
6.4 years ago
rimgubaev ▴ 340

Finally, I solved this problem by combining samtools, bedtools, and a custom R script. The pipeline, wrapped in a bash script, is available here.
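The linked script is not reproduced in this thread, so the following is not that pipeline. It is only a minimal pure-Python sketch of the same idea: take each `gene` feature from the GFF, compute the strand-aware window upstream of it, and cut that window out of the scaffold sequence. It assumes the start of the gene feature (its end on the minus strand) is a reasonable stand-in for the TSS, that column 1 of the GFF matches the first word of each FASTA header, and that the genome fits in memory. The function names `read_fasta` and `promoter_records` are made up for illustration.

```python
import io

# Translation table used to reverse-complement minus-strand promoters.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(handle):
    """Parse a multi-FASTA handle into {sequence_id: sequence}.

    The sequence id is the first whitespace-delimited word of the header.
    """
    seqs, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def promoter_records(gff_handle, genome, upstream=1000):
    """Yield (name, sequence) for the window upstream of each gene feature.

    Windows are clipped at scaffold ends; minus-strand windows are
    reverse-complemented so every sequence reads 5'->3' toward the gene.
    """
    for line in gff_handle:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene" or cols[0] not in genome:
            continue
        seqid, strand = cols[0], cols[6]
        start, end = int(cols[3]), int(cols[4])  # GFF is 1-based, inclusive
        seq = genome[seqid]
        if strand == "-":
            # Upstream of a minus-strand gene lies after its end coordinate.
            lo, hi = end, min(len(seq), end + upstream)
            region = seq[lo:hi].translate(COMPLEMENT)[::-1]
        else:
            lo, hi = max(0, start - 1 - upstream), start - 1
            region = seq[lo:hi]
        if not region:
            continue  # gene sits at the very edge of the scaffold
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        gene_id = attrs.get("ID", f"{seqid}_{start}")
        yield f"{gene_id}::{seqid}:{lo}-{hi}({strand})", region


if __name__ == "__main__":
    # Tiny made-up scaffold and annotation, just to show the call pattern.
    genome = read_fasta(io.StringIO(">scaffold1 demo\nAACCGGTTACGTTTGA\n"))
    gff = io.StringIO(
        "##gff-version 3\n"
        "scaffold1\tdemo\tgene\t5\t8\t.\t+\t.\tID=geneA\n"
        "scaffold1\tdemo\tgene\t9\t12\t.\t-\t.\tID=geneB\n"
    )
    for name, seq in promoter_records(gff, genome, upstream=4):
        print(f">{name}\n{seq}")
```

For a real genome, pass open file handles instead of the `io.StringIO` demo data. The coordinates in each header are 0-based, half-open positions on the scaffold, which makes it easy to check a record against the source sequence.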

6.8 years ago
shoujun.gu ▴ 380
  1. Extract the gene IDs from the GFF file.
  2. Fetch the promoter sequences from BioMart using the gene IDs you extracted.
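Step 1 can be sketched in a few lines of Python. This is only an illustration: it assumes GFF3-style `key=value` attributes in column 9 and that the identifiers of interest are the `ID` values of `gene` features; the function name `gene_ids` is hypothetical. Whether the resulting IDs match what a given BioMart instance expects depends on the annotation source.

```python
import io


def gene_ids(gff_handle, feature="gene", key="ID"):
    """Return the `key` attribute of every `feature` row in a GFF3 handle."""
    ids = []
    for line in gff_handle:
        if line.startswith("#"):
            continue  # skip directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue
        # Column 9 holds semicolon-separated key=value pairs.
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        if key in attrs:
            ids.append(attrs[key])
    return ids


if __name__ == "__main__":
    # Made-up annotation lines, just to show the call pattern.
    demo = io.StringIO(
        "##gff-version 3\n"
        "scaffold1\tdemo\tgene\t100\t900\t.\t+\t.\tID=gene0;Name=abc\n"
        "scaffold1\tdemo\tmRNA\t100\t900\t.\t+\t.\tID=rna0;Parent=gene0\n"
        "scaffold2\tdemo\tgene\t50\t400\t.\t-\t.\tID=gene1\n"
    )
    print("\n".join(gene_ids(demo)))
```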

BioMart on Ensembl appears to have only the Nicotiana attenuata genome, not the one the OP likely wants.


genomax is right: there are no Nicotiana tabacum data on Ensembl.
