Plastid's metagene generate Warning
7.7 years ago
jlncrnt • 0

Hi,

I have a question about the plastid Python package. In the tutorial, one of the data-preparation steps is to generate a "windows" file so that a metagene analysis can be run on ribosome profiling data. The "metagene generate" command asks for an annotation file in GTF2 format. As explained earlier in the tutorial, that file can be generated from within plastid with the following command:

reformat_transcripts --annotation_files yeast.gff --annotation_format GFF3 --sorted --output_format GTF2 yeast.gtf2

Then we use this plastid-generated GTF2 file as the annotation file for the metagene window-file generation command:

metagene generate --annotation_files yeast.gtf2 --sorted --mask_annotation_files yeast.bb --mask_annotation_format BigBed --downstream yeast_windows

The file is created, but I get the following warning:

DataWarning
All maximal spanning windows lack flanks upstream of reference landmark. This
occurs e.g. for start codons when annotation files don't contain UTR data.
Please check your annotation file.
in /path/to/pyscript/metagene.py, line 709:

707     if (df["alignment_offset"] == flank_upstream).all():
708         warnings.warn("All maximal spanning windows lack flanks upstream of reference landmark. This occurs e.g. for start codons when annotation files don't contain UTR data. Please check your annotation file.",
709                       DataWarning)
710         
711     # N.b. This warning will only be invoked for zero-length landmarks

The warning suggests that my annotation file doesn't contain UTR data.
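Reading the excerpt, the check on line 707 is true only when every window's `alignment_offset` equals the requested upstream flank length. Below is a minimal sketch of that condition in plain Python. The real code operates on a pandas DataFrame; the function name and the sample values here are made up for illustration, and the reading that "offset equal to the flank length" means "no upstream sequence available" is taken from the warning text rather than checked against plastid's source.

```python
def all_windows_lack_upstream(alignment_offsets, flank_upstream):
    """Mirror of the line-707 check: True when every offset equals the upstream flank length."""
    return all(offset == flank_upstream for offset in alignment_offsets)


# Hypothetical values, for illustration only
flank_upstream = 50
cds_only_offsets = [50, 50, 50]   # no window reaches upstream of the landmark
mixed_offsets = [50, 0, 12]       # some windows do have upstream sequence

print(all_windows_lack_upstream(cds_only_offsets, flank_upstream))
print(all_windows_lack_upstream(mixed_offsets, flank_upstream))
```

So a single transcript with annotated sequence upstream of the start codon would be enough to keep the warning from firing.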

The .gff file comes from the .zip file of yeastgenome's latest genome release (31-Jan-2015 15:11). The generated .gtf2 file contains the following feature types (inspected with R):

> handleGTF <- import("saccharomyces_cerevisiae_R64-2-1_20150113.gtf2","gtf")
> levels(handleGTF$type)

[1] "exon"        "CDS"         "start_codon" "stop_codon"

But when I inspect the levels of the original .gff file, I get the following:

> handleGFF <- import("saccharomyces_cerevisiae_R64-2-1_20150113.gff","gff")
> levels(handleGFF$type)

 [1] "chromosome"                         "telomere"                          
 [3] "X_element"                          "X_element_combinatorial_repeat"    
 [5] "telomeric_repeat"                   "gene"                              
 [7] "CDS"                                "mRNA"                              
 [9] "ARS"                                "long_terminal_repeat"              
[11] "region"                             "ARS_consensus_sequence"            
[13] "intron"                             "ncRNA_gene"                        
[15] "noncoding_exon"                     "tRNA_gene"                         
[17] "snoRNA_gene"                        "centromere"                        
[19] "centromere_DNA_Element_I"           "centromere_DNA_Element_II"         
[21] "centromere_DNA_Element_III"         "LTR_retrotransposon"               
[23] "transposable_element_gene"          "pseudogene"                        
[25] "Y_prime_element"                    "plus_1_translational_frameshift"   
[27] "five_prime_UTR_intron"              "telomerase_RNA_gene"               
[29] "matrix_attachment_site"             "snRNA_gene"                        
[31] "silent_mating_type_cassette_array"  "W_region"                          
[33] "X_region"                           "Y_region"                          
[35] "Z1_region"                          "Z2_region"                         
[37] "mating_type_region"                 "intein_encoding_region"            
[39] "blocked_reading_frame"              "rRNA_gene"                         
[41] "external_transcribed_spacer_region" "internal_transcribed_spacer_region"
[43] "non_transcribed_region"             "origin_of_replication"
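The same inspection can be done without R. The sketch below tallies the feature types in column 3 of a GFF3 or GTF2 file using only the Python standard library. The function name `count_feature_types` is hypothetical, and the toy lines stand in for a real file. It stops at a `##FASTA` directive if one is present, since everything after that is sequence rather than annotation.

```python
from collections import Counter


def count_feature_types(lines):
    """Tally column 3 (feature type) of tab-delimited GFF3/GTF2 lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section follows; no more features
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:
            counts[fields[2]] += 1
    return counts


# Toy input, for illustration only
toy_lines = [
    "##gff-version 3\n",
    "chrI\tSGD\tgene\t335\t649\t.\t+\t.\tID=YAL069W\n",
    "chrI\tSGD\tCDS\t335\t649\t.\t+\t0\tParent=YAL069W_mRNA\n",
    "chrI\tSGD\tmRNA\t335\t649\t.\t+\t.\tID=YAL069W_mRNA\n",
    "##FASTA\n",
    ">chrI\n",
]
for feature_type, n in sorted(count_feature_types(toy_lines).items()):
    print(feature_type, n)
```

With a real file, pass the open file handle instead of `toy_lines`. The counts also show how many records of each type there are, which `levels()` does not.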

Why can't plastid see the UTR regions? Is my original GFF lacking the info? Or do I have to add the UTR regions to the GTF2 file myself?

If anyone has experience with the plastid package, I'd be glad of any helpful information or suggestions.

Thank you

plastid metagene RNA-Seq ribosome-profiling • 2.0k views

Answering my own post: this is a known issue. See this GitHub issue.
