8.7 years ago
natsterbug
After running TopHat2 (2.1.0) on single-end 50 bp RNA-seq reads from S. tuberosum, I am now trying to count the reads mapping to each feature with htseq-count, using the following command:
htseq-count -m intersection-nonempty --format=bam \
tophat_Kalkaska_control/tophat_K10C/accepted_hits.bam \
PGSC_DM_V403_genes_strand_filtered.gff
I receive the following error message:
Error occured when processing GFF file (line 3 of file PGSC_DM_V403_genes_strand_filtered.gff):
Feature PGSC0003DME400103709 does not contain a 'gene_id' attribute
[Exception type: ValueError, raised in count.py:53]
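The error follows from htseq-count's defaults: it only uses features whose type is `exon` (`-t`/`--type`), and it names each feature by its `gene_id` attribute (`-i`/`--idattr`). The exons in this GFF3 carry only `ID` and `Parent`, so the first exon it reads (line 3) fails, while the `mRNA` on line 2 is skipped because of its type. A minimal Python sketch of that lookup, assuming tab-separated columns; `parse_gff3_attributes` and `feature_id` are hypothetical helpers for illustration, not HTSeq's own code:

```python
# Hypothetical illustration of why htseq-count's default settings reject this GFF3.
# Assumes tab-separated columns; this is NOT HTSeq's parser.

def parse_gff3_attributes(column9: str) -> dict:
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if not field:
            continue  # tolerate a trailing ';'
        key, _, value = field.partition("=")
        attrs[key] = value
    return attrs


def feature_id(line: str, feature_type: str = "exon", id_attr: str = "gene_id"):
    """Return the name a counter would use for this feature, None if the
    feature type is not selected, or raise ValueError if id_attr is missing."""
    cols = line.rstrip("\n").split("\t")
    if cols[2] != feature_type:
        return None  # only features of the selected type are used
    attrs = parse_gff3_attributes(cols[8])
    if id_attr not in attrs:
        raise ValueError(
            f"Feature {attrs.get('ID', '?')} does not contain a '{id_attr}' attribute"
        )
    return attrs[id_attr]


# The first exon from the sample below, with tabs between columns (assumed).
exon = "\t".join([
    "ST4.03ch01", "Cufflinks", "exon", "153389", "153489", ".", "-", ".",
    "ID=PGSC0003DME400103709;Parent=PGSC0003DMT400039136",
])

try:
    feature_id(exon)  # default id_attr="gene_id"
except ValueError as err:
    print(err)  # Feature PGSC0003DME400103709 does not contain a 'gene_id' attribute

print(feature_id(exon, id_attr="Parent"))  # PGSC0003DMT400039136
```

By the same logic, running htseq-count with `-i Parent` would probably avoid the error without any conversion, but an exon's `Parent` here is a transcript (mRNA) ID, so counts would be per transcript rather than per gene, and reads on overlapping exons of different transcripts would likely be reported as ambiguous.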
My understanding is that htseq-count expects a GTF file rather than the GFF file I supplied. I would like to either convert my GFF file to GTF or modify the 9th column of the GFF. A sample of my GFF file is below:
##gff-version 3
ST4.03ch01 Cufflinks mRNA 152322 153489 . - . ID=PGSC0003DMT400039136;Parent=PGSC0003DMG400015133;Source_id=RNASEQ26.809.0;Mapping_depth=16.192011;Class=4;name="Defensin"
ST4.03ch01 Cufflinks exon 153389 153489 . - . ID=PGSC0003DME400103709;Parent=PGSC0003DMT400039136
ST4.03ch01 Cufflinks exon 152322 152593 . - . ID=PGSC0003DME400103710;Parent=PGSC0003DMT400039136
ST4.03ch01 Cufflinks intron 152594 153388 . - . ID=PGSC0003DMI400065839;Parent=PGSC0003DMT400039136
ST4.03ch01 BestORF CDS 152418 152576 . - 0 ID=PGSC0003DMC400026563;Parent=PGSC0003DMT400039136;name="Defensin"
ST4.03ch01 GLEAN mRNA 160499 160663 . - . ID=PGSC0003DMT400039133;Parent=PGSC0003DMG400015132;Source_id=PGSC0003DMG000019750;Class=2;name="Defensin"
ST4.03ch01 Cufflinks mRNA 160379 161885 . - . ID=PGSC0003DMT400039134;Parent=PGSC0003DMG400015132;Source_id=RNASEQ26.803.0;Mapping_depth=35.840147;Class=2;name="Defensin"
ST4.03ch01 Cufflinks exon 161722 161885 . - . ID=PGSC0003DME400103705;Parent=PGSC0003DMT400039134
ST4.03ch01 GLEAN exon 160499 160663 . - . ID=PGSC0003DME400103707;Parent=PGSC0003DMT400039133
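If you prefer to modify the 9th column yourself rather than use a converter, the idea is to take each exon's transcript ID from its `Parent`, then find that transcript's gene via the `Parent` of the matching `mRNA` line. A rough Python sketch, assuming tab-separated input, exactly one parent per feature, and that every exon's parent mRNA is present in the file; `gff3_to_gtf` is a hypothetical name, and this is not gffread's implementation:

```python
# Hypothetical sketch: rewrite GFF3 exon/CDS lines with GTF-style attributes.
# Assumes tab-separated columns and a single Parent per feature.

def parse_attrs(column9):
    """Split 'key=value;key=value' into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if field:
            key, _, value = field.partition("=")
            attrs[key] = value
    return attrs


def gff3_to_gtf(lines):
    records = [
        l.rstrip("\n").split("\t")
        for l in lines
        if l.strip() and not l.startswith("#")  # skip blanks and ##gff-version
    ]
    # First pass: map each mRNA (transcript) ID to its gene (the mRNA's Parent).
    tx_to_gene = {}
    for cols in records:
        if cols[2] == "mRNA":
            attrs = parse_attrs(cols[8])
            tx_to_gene[attrs["ID"]] = attrs["Parent"]
    # Second pass: keep exon and CDS lines, replacing column 9 with GTF attributes.
    out = []
    for cols in records:
        if cols[2] not in ("exon", "CDS"):
            continue  # mRNA, intron, etc. are dropped
        tx = parse_attrs(cols[8])["Parent"]
        gene = tx_to_gene[tx]  # KeyError if the parent mRNA is missing
        attrs = f'transcript_id "{tx}"; gene_id "{gene}";'
        out.append("\t".join(cols[:8] + [attrs]))
    return out


gff3 = [
    "##gff-version 3",
    "ST4.03ch01\tCufflinks\tmRNA\t152322\t153489\t.\t-\t.\tID=PGSC0003DMT400039136;Parent=PGSC0003DMG400015133",
    "ST4.03ch01\tCufflinks\texon\t153389\t153489\t.\t-\t.\tID=PGSC0003DME400103709;Parent=PGSC0003DMT400039136",
    "ST4.03ch01\tCufflinks\tintron\t152594\t153388\t.\t-\t.\tID=PGSC0003DMI400065839;Parent=PGSC0003DMT400039136",
]
for line in gff3_to_gtf(gff3):
    print(line)  # one exon line, now ending in transcript_id/gene_id
```

Only `exon` and `CDS` lines are kept, which mirrors the kinds of lines in the gffread output shown later in the thread; the new attribute column has the `transcript_id "..."; gene_id "...";` layout that htseq-count's default `-i gene_id` looks for.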
Is the following the appropriate course of action?
gffread PGSC_DM_V403_genes_strand_filtered.gff -T -o PGSC_DM_V403_genes_strand_filtered.gtf
Thanks, Natalie
I don't recall all the features of gffread, but it sounds about right. What do you get as a result?
I apologize for the extremely tardy response. Below is the output:
ST4.03ch00 GLEAN exon 63411 63498 . + . transcript_id "PGSC0003DMT400089830"; gene_id "PGSC0003DMG400039401";
ST4.03ch00 GLEAN exon 66359 66816 . + . transcript_id "PGSC0003DMT400089830"; gene_id "PGSC0003DMG400039401";
ST4.03ch00 GLEAN CDS 63411 63498 . + 0 transcript_id "PGSC0003DMT400089830"; gene_id "PGSC0003DMG400039401";
ST4.03ch00 GLEAN CDS 66359 66816 . + 2 transcript_id "PGSC0003DMT400089830"; gene_id "PGSC0003DMG400039401";
ST4.03ch00 GLEAN exon 70051 70281 . + . transcript_id "PGSC0003DMT400036367"; gene_id "PGSC0003DMG400013996";
ST4.03ch00 GLEAN exon 72021 73032 . + . transcript_id "PGSC0003DMT400036367"; gene_id "PGSC0003DMG400013996";
ST4.03ch00 GLEAN exon 73103 73227 . + . transcript_id "PGSC0003DMT400036367"; gene_id "PGSC0003DMG400013996";
ST4.03ch00 GLEAN CDS 70051 70281 . + 0 transcript_id "PGSC0003DMT400036367"; gene_id "PGSC0003DMG400013996";
ST4.03ch00 GLEAN CDS 72021 73032 . + 0 transcript_id "PGSC0003DMT400036367"; gene_id "PGSC0003DMG400013996";
ST4.03ch00 GLEAN CDS 73103 73227 . + 2 transcript_id "PGSC0003DMT400036367"; gene_id "PGSC0003DMG400013996";
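Before rerunning htseq-count, a quick check that every `exon` line in the converted file carries `gene_id` can save a failed run. A small Python sketch, assuming tab-separated GTF columns and `key "value";` attributes; `parse_gtf_attributes` and `check_gtf` are hypothetical helpers:

```python
# Hypothetical sanity check on a converted GTF: every exon must carry the
# attribute htseq-count uses by default (gene_id). Assumes tab-separated columns.
from collections import Counter


def parse_gtf_attributes(column9):
    """Parse 'key "value"; key "value";' into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        field = field.strip()
        if not field:
            continue
        key, _, value = field.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def check_gtf(lines, id_attr="gene_id"):
    """Raise ValueError on an exon lacking id_attr; return exon counts per ID."""
    exons_per_gene = Counter()
    for n, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if cols[2] != "exon":
            continue  # CDS lines are ignored, as with htseq-count's default -t exon
        attrs = parse_gtf_attributes(cols[8])
        if id_attr not in attrs:
            raise ValueError(f"line {n}: exon without '{id_attr}'")
        exons_per_gene[attrs[id_attr]] += 1
    return exons_per_gene


gtf = [
    'ST4.03ch00\tGLEAN\texon\t63411\t63498\t.\t+\t.\ttranscript_id "PGSC0003DMT400089830"; gene_id "PGSC0003DMG400039401";',
    'ST4.03ch00\tGLEAN\texon\t66359\t66816\t.\t+\t.\ttranscript_id "PGSC0003DMT400089830"; gene_id "PGSC0003DMG400039401";',
    'ST4.03ch00\tGLEAN\tCDS\t63411\t63498\t.\t+\t0\ttranscript_id "PGSC0003DMT400089830"; gene_id "PGSC0003DMG400039401";',
    'ST4.03ch00\tGLEAN\texon\t70051\t70281\t.\t+\t.\ttranscript_id "PGSC0003DMT400036367"; gene_id "PGSC0003DMG400013996";',
]
counts = check_gtf(gtf)
print(dict(counts))  # two exons for the first gene, one for the second
```

If a check like this passes on the full file, pointing the original htseq-count command at the `.gtf` instead of the `.gff` should get past the missing-`gene_id` error, and since each exon is named by its `gene_id`, reads will be counted per gene.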