How to extract promoter sequences from a rice transcriptome.gtf file?
3.2 years ago
isha.lily20 ▴ 10

Hello researchers,

I am stuck in my project and need an effective solution:

  1. How to extract promoter sequences from a rice transcriptome.gtf file?
  2. How to extract 2 kb promoter sequences from a rice transcriptome.gtf file?
  3. How to extract promoter sequences 2 kb downstream from a rice transcriptome.gtf file?

Thank you


Do you have the chromosome/scaffold lengths for rice? Please also post the lines for which you need upstream and downstream elements. You will need the length of each chromosome/scaffold, the genome sequence, and bedtools. Use the bedtools functions flank and getfasta.
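
To illustrate what the extraction step has to take care of, here is a minimal Python sketch. It is not bedtools code: `revcomp` and `extract_region` are hypothetical helper names, and the genome is assumed to be already loaded as a dict mapping chromosome name to sequence. Coordinates are 0-based, half-open (as in BED), are clipped to the chromosome bounds (which is why the chromosome lengths matter), and minus-strand regions are reverse-complemented, comparable to what `bedtools getfasta -s` does.

```python
# Hypothetical helpers for illustration only; not part of bedtools.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_region(genome, chrom, start, end, strand="+"):
    """Extract [start, end) (0-based, half-open) from genome[chrom].

    The interval is clipped to the chromosome bounds, and the sequence
    is reverse-complemented when the strand is '-'.
    """
    chrom_len = len(genome[chrom])
    start = max(0, start)        # a flank may run off the left end
    end = min(chrom_len, end)    # or off the right end
    seq = genome[chrom][start:end]
    return revcomp(seq) if strand == "-" else seq


# Toy genome (assumption: real data would be read from a FASTA file).
genome = {"chr1": "AACCGGTTAC"}
print(extract_region(genome, "chr1", 2, 6, "+"))
print(extract_region(genome, "chr1", -3, 4, "-"))
```

For a real genome you would normally let bedtools do this on an indexed FASTA file rather than hold whole chromosomes in memory.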


Basically, the promoter is the region upstream of the TSS, and the TSS is the annotated start of each transcript. Hence, get the start coordinate of each transcript (on the - strand this is the "end" coordinate), then take the 500 bp upstream, which is a common default for approximating a promoter. Then use the tools mentioned above to get the FASTA sequences.
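
The strand logic described here can be sketched in Python as follows. This is only an illustration under stated assumptions: `promoter_interval` is a hypothetical function name, the input is a single tab-separated GTF line whose third column is `transcript`, and the output is a BED-style 0-based, half-open interval that ends just before the TSS base. The upper bound is not clipped here, so minus-strand promoters near a chromosome end still need to be checked against the chromosome length.

```python
def promoter_interval(gtf_line, upstream=500):
    """Return (chrom, start, end, strand) for the promoter of one transcript.

    GTF coordinates are 1-based and inclusive; the returned interval is
    0-based and half-open (BED style). Non-transcript lines return None.
    """
    fields = gtf_line.rstrip("\n").split("\t")
    chrom, feature = fields[0], fields[2]
    start, end, strand = int(fields[3]), int(fields[4]), fields[6]
    if feature != "transcript":
        return None
    if strand == "+":
        # TSS is the 1-based start; upstream lies at lower coordinates.
        bed_start = start - 1 - upstream
        bed_end = start - 1
    else:
        # TSS is the 1-based end; upstream lies at higher coordinates.
        bed_start = end
        bed_end = end + upstream
    return (chrom, max(0, bed_start), bed_end, strand)


# Made-up example lines (assumption: not taken from a real rice annotation).
plus = "\t".join(["chr1", "src", "transcript", "1001", "2000", ".", "+", ".",
                  'gene_id "g1"; transcript_id "t1";'])
minus = "\t".join(["chr1", "src", "transcript", "3001", "4000", ".", "-", ".",
                   'gene_id "g2"; transcript_id "t2";'])
print(promoter_interval(plus))
print(promoter_interval(minus, upstream=2000))
```

Writing these tuples out as a six-column BED file (with the strand in column 6) lets a strand-aware extraction step return the promoter in the transcript's orientation.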

3.1 years ago
D. Puthier ▴ 350

Hi,

You may try the command-line interface (gtftk) of the Python GTF toolkit. Although it may be slower, it offers additional arguments to transfer transcript information into the 4th column.

gtftk get_example | gtftk select_by_key -k feature -v transcript | gtftk get_5p_3p_coords -n gene_id,transcript_id  -m promoter -s '|'

Best

Disclosure: I'm the pygtftk developer.

3.1 years ago
Juke34 9.0k

You might find useful information here: https://github.com/NBISweden/AGAT/issues/89 and in the post "Extracting genomic feature sequences from GTF/GFF files with AGAT".

To get the 2 kb upstream region from the TSS with AGAT:
agat_sp_extract_sequences.pl --gff input.gff --fasta input.fasta -t transcript --eo --up "2000"

To get the 2 kb downstream region from the TSS with AGAT:
agat_sp_extract_sequences.pl --gff input.gff --fasta input.fasta -t transcript --eo --down "2000"

*Replace transcript with mRNA, depending on how the feature is named in the 3rd column of your file.
