I am using a GFF file with featureCounts to produce counts for an RNA-Seq analysis of a non-model organism, but I am unable to get proper counts. The assembly is not good, and the GFF looks like this:
#!genome-build RproC3
#!genome-version RproC3
#!genome-date 2015-04
#!genome-build-accession GCA_000181055.3
KQ034291 VectorBase gene 36335 45838 0 + 0 gene_id "RPRC000679";"
KQ034291 VectorBase transcript 36335 45838 0 + 0 gene_id "RPRC000679"; transcript_id "RPRC000679-RA";"
KQ034291 VectorBase exon 36335 36356 0 + 0 gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "1";"
KQ034291 VectorBase CDS 36335 36356 0 + 0 gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "1";"
KQ034291 VectorBase exon 40565 40684 0 + 0 gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "2";"
KQ034291 VectorBase CDS 40565 40684 0 + 2 gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "2";"
KQ034291 VectorBase exon 40763 40941 0 + 0 gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "3";"
KQ034291 VectorBase CDS 40763 40941 0 + 2 gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "3";"
KQ034291 VectorBase exon 45833 45838 0 + 0 gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "4";"
KQ034291 VectorBase CDS 45833 45835 0 + 0 gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "4";"
KQ034291 VectorBase stop_codon 45836 45838 0 + 0 gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "4";"
KQ034291 VectorBase gene 48738 55400 0 - 0 gene_id "RPRC003242";"
KQ034291 VectorBase transcript 48738 55400 0 - 0 gene_id "RPRC003242"; transcript_id "RPRC003242-RA";"
KQ034291 VectorBase exon 55216 55400 0 - 0 gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "1";"
KQ034291 VectorBase CDS 55216 55289 0 - 0 gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "1";"
KQ034291 VectorBase start_codon 55287 55289 0 - 0 gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "1";"
KQ034291 VectorBase exon 53297 53592 0 - 0 gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "2";"
KQ034291 VectorBase CDS 53297 53592 0 - 1 gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "2";"
KQ034291 VectorBase exon 52421 52605 0 - 0 gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "3";"
KQ034291 VectorBase CDS 52421 52605 0 - 2 gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "3";"
KQ034291 VectorBase exon 51858 51907 0 - 0 gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "4";"
KQ034291 VectorBase CDS 51858 51907 0 - 0 gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "4";"
KQ034291 VectorBase exon 51146 51248 0 - 0 gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "5";"
KQ034291 VectorBase CDS 51146 51248 0 - 1 gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "5";"
KQ034291 VectorBase exon 50189 50352 0 - 0 gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "6";"
KQ034291 VectorBase CDS 50189 50352 0 - 0 gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "6";"
KQ034291 VectorBase exon 48738 48965 0 - 0 gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "7";"
KQ034291 VectorBase CDS 48884 48965 0 - 1 gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "7";
"
Here the first-column ID is the same for all the genes, because of which the count file contains the ID "KQ034291" repeatedly and nothing else. However, I want a GTF/GFF file with gene names such as RPRC000679, RPRC003242 and so on, so that I can get unique gene counts. Is there a way to do this?
The first column should refer to the chromosome name, which in your case seems to be KQ034291. I am not sure why you have (line numbers?) before that name. Where did you acquire this file from?
I am also not sure, but it was downloaded from a database. However, I can get rid of them. But can I have the gene name instead of the scaffold ID in the first column?
You can, but then the file will not be in GTF/GFF format. featureCounts should understand the gene_id attribute in the file you posted.
Yes, it will recognise it, as the sequences used for alignment will have the same gene_id. So I want to know how to do that?
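To illustrate the point about the gene_id attribute, here is a minimal Python sketch (not featureCounts itself) showing that the gene identifiers already sit in the ninth column, while the first column only names the scaffold. It assumes the real file is tab-delimited and that attributes follow the key "value"; pattern; the helper names parse_gtf_line and exons_per_gene are made up for this example, and the sample lines are copied from the question with the trailing stray quote left off.

```python
import re
from collections import Counter

# Matches one GTF attribute of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_gtf_line(line):
    """Return (seqname, feature, attributes) for a data line, or None for comments/blank lines."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    attributes = dict(ATTR_RE.findall(fields[8]))
    return fields[0], fields[2], attributes


def exons_per_gene(lines):
    """Count exon records per gene_id, ignoring the scaffold name in column 1."""
    counts = Counter()
    for line in lines:
        record = parse_gtf_line(line)
        if record is None:
            continue
        _seqname, feature, attributes = record
        if feature == "exon" and "gene_id" in attributes:
            counts[attributes["gene_id"]] += 1
    return counts


# A few records from the question, rebuilt with tabs (an assumption about the real file).
sample = [
    "#!genome-build RproC3\n",
    'KQ034291\tVectorBase\tgene\t36335\t45838\t0\t+\t0\tgene_id "RPRC000679";\n',
    'KQ034291\tVectorBase\texon\t36335\t36356\t0\t+\t0\tgene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "1";\n',
    'KQ034291\tVectorBase\texon\t40565\t40684\t0\t+\t0\tgene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "2";\n',
    'KQ034291\tVectorBase\tgene\t48738\t55400\t0\t-\t0\tgene_id "RPRC003242";\n',
    'KQ034291\tVectorBase\texon\t55216\t55400\t0\t-\t0\tgene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "1";\n',
]

for gene_id, n_exons in sorted(exons_per_gene(sample).items()):
    print(gene_id, n_exons)
```

Both genes share the scaffold KQ034291 in column 1 and are still told apart by gene_id, so the first column does not need to be rewritten to get per-gene results.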
Only after you fix the first column (chromosome names need to match your alignment file). Have you looked at the manual/in-line help for featureCounts? The two options you want to pay attention to are
I am aware of the two options you have mentioned. I have edited the GTF file mentioned above, but I am getting the following warning while running featureCounts, with no output file:
According to it, the 9th column has some problem, which is not really the case. I also ran cut -f 9 *.gtf, and here is the output:
So I have no clue what is going wrong here. Any idea?
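The warning text is not shown above, so the following is only a generic sanity check, not a diagnosis. It is a small Python sketch that flags data lines that are not split into nine tab-separated fields, or whose ninth column contains text that does not fit the key "value"; pattern (for instance a stray trailing quote like the one visible in the pasted sample, which may just be a copy-paste artifact). The function names check_gtf_line and check_gtf are hypothetical.

```python
import re

# One well-formed attribute: key "value";
ATTR_RE = re.compile(r'\S+ "[^"]*";')
# A gene_id attribute with a non-empty value
GENE_ID_RE = re.compile(r'(?:^|\s)gene_id "[^"]+";')


def check_gtf_line(line):
    """Return a list of problems found on one GTF data line (empty if none)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 9:
        return [f"found {len(fields)} tab-separated field(s), expected 9"]
    problems = []
    attributes = fields[8]
    if not GENE_ID_RE.search(attributes):
        problems.append("no gene_id attribute in column 9")
    # Whatever remains after removing well-formed attributes is suspicious.
    leftover = ATTR_RE.sub("", attributes).strip()
    if leftover:
        problems.append(f"unparsed text in column 9: {leftover!r}")
    return problems


def check_gtf(lines):
    """Yield (line_number, problem) for every problem in an iterable of GTF lines."""
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        for problem in check_gtf_line(line):
            yield number, problem


good = 'KQ034291\tVectorBase\tgene\t36335\t45838\t0\t+\t0\tgene_id "RPRC000679";'
examples = [
    "#!genome-build RproC3",
    good,
    good + '"',               # stray quote after the last semicolon
    good.replace("\t", " "),  # spaces instead of tabs
]
for number, problem in check_gtf(examples):
    print(f"line {number}: {problem}")
```

To run it on a real file, pass an open file handle to check_gtf, for example check_gtf(open("annotation.gtf")), where the file name is a placeholder.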
Closing a post is not an appropriate action when a question has been answered (generally, mods use that action to close posts deemed inappropriate, duplicate, etc.). You should accept an answer (green check mark) to indicate that this question has been answered (I moved @Devon's post to an answer).