I am working with Camellia sinensis (black tea), and I have found a few differentially expressed genes. I have the genome annotation file for this species, but I cannot work out the Ensembl or Entrez gene IDs of these genes manually. I also tried the PANTHER and DAVID online tools for gene ID conversion, without success, perhaps because this is an uncommon species. Please help me.
The gene IDs look like this:
TEA_016967, TEA_010081, TEA_002547, TEA_015527, TEA_019823
My genome annotation file is in GTF format, and I cannot extract the Ensembl or Entrez IDs from it. Please help me. The GTF file looks like this:
SDRB02000004.1 Genbank gene 6018 10396 . + . gene_id "TEA_012962"; transcript_id ""; gbkey "Gene"; gene_biotype "protein_coding"; locus_tag "TEA_012962";
SDRB02000004.1 Genbank transcript 6018 10396 . + . gene_id "TEA_012962"; transcript_id "gnl|WGS:SDRB|TEA014503.1"; gbkey "mRNA"; locus_tag "TEA_012962";orig_protein_id "gnl|WGS:SDRB|TEA014503.1:cds_7"; orig_transcript_id "gnl|WGS:SDRB|TEA014503.1"; product "hypothetical protein"; transcript_biotype "mRNA";
SDRB02000004.1 Genbank exon 6018 6864 . + . gene_id "TEA_012962"; transcript_id "gnl|WGS:SDRB|TEA014503.1"; locus_tag "TEA_012962"; orig_protein_id "gnl|WGS:SDRB|TEA014503.1:cds_7"; orig_transcript_id "gnl|WGS:SDRB|TEA014503.1"; product "hypothetical protein"; transcript_biotype "mRNA"; exon_number "1";
SDRB02000004.1 Genbank exon 7548 7685 . + . gene_id "TEA_012962"; transcript_id "gnl|WGS:SDRB|TEA014503.1"; locus_tag "TEA_012962"; orig_protein_id "gnl|WGS:SDRB|TEA014503.1:cds_7"; orig_transcript_id "gnl|WGS:SDRB|TEA014503.1"; product "hypothetical protein"; transcript_biotype "mRNA"; exon_number "2";
SDRB02000004.1 Genbank exon 7802 7923 . + . gene_id "TEA_012962"; transcript_id "gnl|WGS:SDRB|TEA014503.1"; locus_tag "TEA_012962"; orig_protein_id "gnl|WGS:SDRB|TEA014503.1:cds_7"; orig_transcript_id "gnl|WGS:SDRB|TEA014503.1"; product "hypothetical protein"; transcript_biotype "mRNA"; exon_number "3";
Thank you for helping. I am having a problem with GO enrichment analysis. I have approximately 20 differentially expressed genes that I want to analyze with the BINGO Cytoscape app, but the custom reference annotation file, which is a GTF file, is not accepted. The gene set is provided as GenBank IDs. The significance level is set to 0.05, and the reference annotation is set to the custom GTF file of Camellia sinensis assembly 2. Thank you!
Hey Abhisek,
I think the reference annotation file you are uploading might be the wrong one. I believe you are uploading something you downloaded directly from here: https://www.ncbi.nlm.nih.gov/assembly/GCA_004153795.2
See a tutorial I wrote regarding the custom annotation file that you have to make:
How to: make Camellia sinensis var. sinensis (black tea) custom annotation files for BINGO Cytoscape
I provided a Biological Process custom annotation file at the bottom of the tutorial as a Google Drive link. If you want to use the Molecular Function and Cellular Component categories, you will have to repeat the tutorial from the TeaCoN step.
Hope this helps and good luck!
Thank you again for your valuable response, but the first command from the tutorial is not working for me. I changed the path to my working directory, where the feature table is located. The original command is:
cat ~/Desktop/biostars/GCA_004153795.2_AHAU_CSS_2_feature_table.txt | cut -f17 | tr -d '_' | awk '(NR>1)' | sort | uniq > ~/Desktop/biostars/geneids.txt
In my case I ran:
cat ~/home/abhisek/Documents/workingdir/GCA_004153795.2_AHAU_CSS_2_feature_table.txt | cut -f17 | tr -d '_' | awk '(NR>1)' | sort | uniq > ~/home/abhisek/Documents/workingdir/geneids.txt
The error is the following:
bash: /home/abhisek/home/abhisek/Documents/workingdir/geneids.txt: No such file or directory
cat: /home/abhisek/home/abhisek/Documents/workingdir/GCA_004153795.2_AHAU_CSS_2_feature_table.txt: No such file or directory
Please let me know why it is not working on my Linux PC.
Hi Abhisek, you typed ~/home/abhisek/Documents/workingdir/ as your directory. Because you typed ~ before your file path, the home directory ended up in the path twice: ~ is a shortcut for your home directory. In other words, in your case ~ = /home/abhisek
This is shown in your error, where /home/abhisek appears twice at the start of the path:
/home/abhisek/home/abhisek/Documents/workingdir/geneids.txt
Your solution would be to either remove ~ or remove /home/abhisek, because they mean the same thing. Hope this helps!
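To show how the tutorial command could be pointed at your own directory, here is a minimal sketch. The function name make_geneids is a hypothetical helper made up for illustration. The file name and the pipeline are the ones from the posts above, and I am assuming the feature table sits directly inside the directory you pass in.

```shell
# Sketch: the tutorial's pipeline wrapped in a function that takes the working
# directory as its only argument (make_geneids is a hypothetical helper name).
make_geneids() {
    dir="$1"
    # Take column 17 of the feature table as in the tutorial command,
    # strip underscores, skip the header line, and keep unique IDs.
    cut -f17 "$dir/GCA_004153795.2_AHAU_CSS_2_feature_table.txt" \
        | tr -d '_' | awk 'NR>1' | sort | uniq > "$dir/geneids.txt"
}

# ~ already expands to your home directory, so these two lines print the same path:
echo ~/Documents/workingdir
echo "$HOME/Documents/workingdir"
```

You would then call it with one spelling or the other, for example make_geneids ~/Documents/workingdir or make_geneids /home/abhisek/Documents/workingdir, but never ~/home/abhisek/... with both together.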
Hello Pratik Sir, I followed your tutorial, but I do not get the expected result in Cytoscape, so I think there may be a minor mistake in the code. Please compare my gene IDs with the annotation file you made: the gene IDs associated with GO IDs in that file do not match the gene IDs I input.
To explain, my input gene IDs are the following: TEA_016967, TEA_010081, TEA_002547, TEA_015527, TEA_019823
But in the tutorial, the gene IDs associated with GO IDs look like this: TEA002763 = 0000028 TEA006828 = 0000028 TEA006848 = 0000028 TEA008332 = 0000028 TEA020472 = 0000028 TEA024267 = 0000028 TEA027695 = 0000028
That is why my input does not show any result. Thanks, Pratik Sir.
I think you might simply have to remove the underscores (_). You could do that manually or in the terminal, and then copy and paste the result as your input.
So instead of gene names like TEA_016967, TEA_010081, TEA_002547, TEA_015527, TEA_019823
you would use TEA016967, TEA010081, and so on.
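For the terminal route, something along these lines should work. It is only a sketch: strip_underscores is a hypothetical function name, and the IDs are the ones from your post.

```shell
# Remove every underscore from whatever is piped in
# (strip_underscores is a hypothetical helper name).
strip_underscores() {
    tr -d '_'
}

# Example with the gene IDs from the question:
echo "TEA_016967, TEA_010081, TEA_002547, TEA_015527, TEA_019823" | strip_underscores
# prints: TEA016967, TEA010081, TEA002547, TEA015527, TEA019823
```

Note that this deletes all underscores in the input, so only feed it the gene ID list and not a file where other underscores matter.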
Respected Sir, you helped a lot. Meanwhile, I found the actual problem. These are the common names of the genes: TEA_016967, TEA_010081, TEA_002547, TEA_015527, TEA_019823. They are not associated with GO IDs, whereas TEA016967, TEA010081 and so on are linked to GO. So, following my annotation data, if you could write a small script that takes the common names and outputs the corresponding GO-linked IDs, it would be very helpful. Thanks again.
You're welcome. Remember to pay it forward by helping others : )
I think I understand what you want. You want to input genes with the underscore, like TEA_016967, into BINGO, instead of changing them to the form without the underscore, like TEA016967.
You can do this easily in a basic text editor. Open your gene annotation file, where the gene IDs are written without the underscore (for example TEA002763 = 0000028), and use Replace All or Find & Replace to change TEA to TEA_.
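If you would rather do the same replacement in the terminal, a sed one-liner is one option. This is a sketch under an assumption: that every gene ID sits at the start of a line and is written as TEA followed directly by digits, as in the TEA002763 = 0000028 example above. The function name add_underscores is hypothetical.

```shell
# Turn TEA002763 into TEA_002763 at the start of each line
# (add_underscores is a hypothetical helper name).
# Only "TEA" followed directly by a digit is changed, so IDs that
# already contain the underscore are left alone.
add_underscores() {
    sed -E 's/^TEA([0-9])/TEA_\1/'
}

# Example on one annotation line:
printf 'TEA002763 = 0000028\n' | add_underscores
# prints: TEA_002763 = 0000028
```

To apply it to a whole file you could redirect the file in and write to a new file, then check a few lines by eye before loading it into BINGO.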