Hi all,
I am trying to count the number of reads that map to each gene using featureCounts (paired-end RNA-Seq, Linux).
My inputs:
- A GFF file generated by Prokka
- A GTF file generated by NCBI annotation
- Sorted BAM files generated by bowtie2 and samtools
When I used the GTF file generated by NCBI, featureCounts ran without any issue. However, I am interested in the Prokka GFF because it showed more comprehensive features and a higher mapping rate (compared to the NCBI GTF). When I used the prokka.gff file, I received this error:
ERROR: no features were loaded in format GTF. The annotation format can be specified by the '-F' option, and the required feature type can be specified by the '-t' option..
Part of the content of the prokka.gff file is shown below:
##sequence-region JAFJXZ010000052.1 1 250
##sequence-region JAFJXZ010000053.1 1 48755
##sequence-region JAFJXZ010000054.1 1 255
##sequence-region JAFJXZ010000055.1 1 465
##sequence-region JAFJXZ010000056.1 1 355
##sequence-region JAFJXZ010000008.1 1 618
##sequence-region JAFJXZ010000057.1 1 255
##sequence-region JAFJXZ010000058.1 1 271
##sequence-region JAFJXZ010000059.1 1 223
##sequence-region JAFJXZ010000009.1 1 354
JAFJXZ010000010.1 Prodigal:002006 CDS 3477 3851 . - 0 ID=GOHBADNI_00001;inference=ab initio prediction:Prodigal:002006;locus_tag=GOHBADNI_00001;product=hypothetical protein
JAFJXZ010000012.1 Prodigal:002006 CDS 712 1704 . + 0 ID=GOHBADNI_00002;eC_number=2.3.1.180;Name=fabHB;db_xref=COG:COG0332;gene=fabHB;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:UniProtKB:007600;locus_tag=GOHBADNI_00002;product=3-oxoacyl-[acyl-carrier-protein] synthase 3 protein 2
JAFJXZ010000012.1 Prodigal:002006 CDS 1834 3969 . + 0 ID=GOHBADNI_00003;eC_number=5.6.2.1;Name=topB 1;db_xref=COG:COG0550;gene=topB 1;inference=ab initio prediction:Prodigal:002006,similar to AA
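A note on the excerpt above: as far as I know, featureCounts looks for `exon` features and a `gene_id` attribute by default, and neither appears in these lines, which would explain why no features were loaded. Below is a minimal Python sketch (standard library only) that tallies the feature types and attribute keys in a GFF so that suitable `-t` and `-g` values can be picked. The function name `summarize_gff` is made up for illustration, and the sample lines are taken from the excerpt, assuming the real file is tab-separated.

```python
from collections import Counter


def summarize_gff(lines):
    """Tally feature types (column 3) and attribute keys (column 9) of a GFF."""
    feature_types = Counter()
    attribute_keys = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # Prokka appends the genome sequence after this marker
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        feature_types[cols[2]] += 1
        for pair in cols[8].split(";"):
            key, sep, _value = pair.partition("=")
            if sep:
                attribute_keys[key] += 1
    return feature_types, attribute_keys


# Two lines from the excerpt, rebuilt with tabs between the nine columns.
sample = [
    "##sequence-region JAFJXZ010000052.1 1 250",
    "\t".join([
        "JAFJXZ010000010.1", "Prodigal:002006", "CDS", "3477", "3851", ".", "-", "0",
        "ID=GOHBADNI_00001;inference=ab initio prediction:Prodigal:002006;"
        "locus_tag=GOHBADNI_00001;product=hypothetical protein",
    ]),
]

types, keys = summarize_gff(sample)
print("feature types:", dict(types))
print("attribute keys:", dict(keys))
```

To run it on the real file, pass an open file handle instead of `sample`, e.g. `summarize_gff(open("prokka.gff"))`.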
I checked several threads discussing this issue but couldn't find an appropriate answer so far.
I tried these steps to solve the error:
- Used StringTie to generate a merged GTF file (with prokka.gff as input). However, I encountered the same error when I used the merged.gtf file in featureCounts instead of the NCBI GTF or the Prokka GFF.
- Converted the GFF to GTF or GFF3 with gffread. This resulted in 0-byte files, so I couldn't try them.
Kindly provide guidance on resolving this error. Appreciate your help greatly.
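One thing that may be worth trying before any conversion: the error message itself points at the `-t` option, and the Prokka lines shown above carry `CDS` features with an `ID` attribute. Here is a hedged Python sketch that only assembles a featureCounts command using `-t CDS -g ID`. The helper name `featurecounts_command` and the file names `sample1.sorted.bam` and `counts.txt` are placeholders, not taken from the original post, and I have not verified this against this particular GFF.

```python
import shlex


def featurecounts_command(annotation, bams, output,
                          feature_type="CDS", attribute="ID", paired=True):
    """Build (but do not run) a featureCounts command line as a list."""
    cmd = [
        "featureCounts",
        "-F", "GTF",         # annotation format named in the error message
        "-t", feature_type,  # feature type in column 3 (default is exon)
        "-g", attribute,     # attribute used as the gene identifier (default is gene_id)
        "-a", annotation,
        "-o", output,
    ]
    if paired:
        # Assumption: a recent Subread release, where -p marks paired-end input
        # and --countReadPairs counts fragments; older versions only use -p.
        cmd += ["-p", "--countReadPairs"]
    cmd += list(bams)
    return cmd


cmd = featurecounts_command("prokka.gff", ["sample1.sorted.bam"], "counts.txt")
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

If the file still fails to load, the `##FASTA` section that Prokka appends to its GFF is another thing to check.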
Thank you Mensur,
Unfortunately, I could not install AGAT on my Linux system, since it is not a local machine and doesn't support conda or Docker. The other methods ("Using Singularity" and "Old school - Manually") were not successful either.
Is there any alternative approach? For example, can I use R for this task?
Thanks
I can't remember how I installed it - it might have been old school - but there are two ways listed that support conda. The other is the old-school way.
There are many other GFF -> GTF converters, but I don't know if any of them work for this particular purpose.
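If no converter can be installed, one option that needs nothing beyond a stock Python is to write the annotation in SAF, the simple tab-delimited format that featureCounts accepts via `-F SAF`. The sketch below is my own minimal version, not an established tool: the function name `gff_to_saf` is made up, and it assumes a tab-separated Prokka GFF in which every CDS has an `ID` attribute.

```python
def gff_to_saf(gff_lines, feature_type="CDS", id_key="ID"):
    """Convert GFF feature lines into SAF rows (GeneID, Chr, Start, End, Strand)."""
    rows = ["GeneID\tChr\tStart\tEnd\tStrand"]  # SAF header line
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # sequence section appended by Prokka, no features after it
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        # Column 9 holds key=value pairs separated by semicolons.
        attrs = dict(pair.split("=", 1) for pair in cols[8].split(";") if "=" in pair)
        gene_id = attrs.get(id_key)
        if gene_id is None:
            continue  # skip features without the chosen identifier
        rows.append("\t".join([gene_id, cols[0], cols[3], cols[4], cols[6]]))
    return rows


# One CDS line from the Prokka excerpt, rebuilt with tabs between columns.
example = "\t".join([
    "JAFJXZ010000010.1", "Prodigal:002006", "CDS", "3477", "3851", ".", "-", "0",
    "ID=GOHBADNI_00001;inference=ab initio prediction:Prodigal:002006;"
    "locus_tag=GOHBADNI_00001;product=hypothetical protein",
])

for row in gff_to_saf([example]):
    print(row)
```

Writing the returned rows to a file (one per line) should give something usable with `-F SAF`. For R, the Rsubread package provides a `featureCounts` function as well.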
Actually, I cannot use conda on my non-local Linux system, and the GFF -> GTF conversion did not work either.
Thank you,
Could you explain why you can't use conda? It's possible to install conda as a non-root user, and I've done so successfully on different HPC systems in the past.
As mentioned on the supercomputer server website:
conda installs binaries which are not optimized for the processor architecture on our clusters.
As these are Perl scripts, there should be no issue with optimized binaries. Besides, even installing non-optimal binaries may get the job done faster than searching for alternative solutions. Finally, running featureCounts is not a demanding task, and should be doable on any personal Linux computer.
Can we conduct this task using R installed on my local machine?