
Tutorial: How to make Camellia sinensis var. sinensis (black tea) custom annotation files for BINGO Cytoscape

3.6 years ago

Pratik ★ 1.1k

How to: make custom Camellia sinensis var. sinensis (black tea) annotation files for BINGO Cytoscape

BINGO wants a format like this for custom annotation files:

(species=Saccharomyces cerevisiae)(type=Biological Process)(curator=GO)

YAL001C = 0006384
YAL002W = 0045324
YAL002W = 0045324
YAL003W = 0006414
YAL004W = 0000004
YAL005C = 0006616
YAL005C = 0006457
YAL005C = 0000060
YAL007C = 0006888
YAL008W = 0000004

...

(See https://www.psb.ugent.be/cbd/papers/BiNGO/Customize.html)
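
Before loading a custom file into BINGO, it can help to check that every line fits the layout shown above. The snippet below is only a sketch: `check_bingo_annotation` is a hypothetical helper name (not part of BINGO), and it assumes the file is exactly one parenthesised header line followed by `ID = digits` lines, with blank lines ignored.

```shell
# check_bingo_annotation FILE
# Hypothetical helper: prints every line that does not fit the layout
# shown above and returns non-zero if at least one such line is found.
check_bingo_annotation() {
  awk '
    BEGIN { bad = 0 }
    NR == 1 {
      # First line must look like (species=...)(type=...)(curator=...)
      if ($0 !~ /^\(species=[^)]+\)\(type=[^)]+\)\(curator=[^)]+\)$/) {
        print "line 1: unexpected header: " $0
        bad = 1
      }
      next
    }
    /^[ \t]*$/ { next }              # blank lines are ignored
    $0 !~ /^[^ \t=]+ = [0-9]+$/ {    # every other line: ID = digits
      print "line " NR ": " $0
      bad = 1
    }
    END { exit bad }
  ' "$1"
}
```

Usage would be something like `check_bingo_annotation teaBPannotation_clean.txt && echo "looks OK"`. Any line it prints is one that still needs fixing.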

BEGINNING OF TUTORIAL:

1. Go to the assembly page: https://www.ncbi.nlm.nih.gov/assembly/GCA_004153795.2
2. Download and extract the Feature_table file from GenBank (see image below).

3. Now in terminal, parse out all tea gene IDs (the directory I am working in is ~/Desktop/biostars/):

cat ~/Desktop/biostars/GCA_004153795.2_AHAU_CSS_2_feature_table.txt | cut -f17 | tr -d '_' | awk '(NR>1)' | sort | uniq > ~/Desktop/biostars/geneids.txt
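
The command above picks the gene IDs by position (column 17). If you would rather not depend on the column order, here is a rough alternative that looks the column up by name. It is only a sketch and makes two assumptions: that the header row of the feature table has a column called exactly `locus_tag`, and that dropping rows with an empty locus tag is acceptable (the original one-liner keeps one empty line instead). `extract_locus_tags` is a made-up function name.

```shell
# extract_locus_tags FEATURE_TABLE
# Hypothetical helper: prints the unique, sorted locus tags with
# underscores removed, finding the column by its header name.
extract_locus_tags() {
  awk -F '\t' '
    NR == 1 {
      # Locate the column whose header is exactly "locus_tag"
      for (i = 1; i <= NF; i++) if ($i == "locus_tag") col = i
      if (!col) { print "no locus_tag column found" > "/dev/stderr"; exit 1 }
      next
    }
    $col != "" {
      gsub(/_/, "", $col)   # same effect as tr -d "_" on that column
      print $col
    }
  ' "$1" | sort -u          # sort -u replaces sort | uniq
}
```

It would be called as `extract_locus_tags ~/Desktop/biostars/GCA_004153795.2_AHAU_CSS_2_feature_table.txt > ~/Desktop/biostars/geneids.txt`.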

Next, go to http://teacon.wchoda.com/GOEnrichment, paste in the contents of geneids.txt, and select Biological Process. (For Molecular Function and Cellular Component you have to repeat this step with the same gene IDs, which gives you three separate files.) Use these settings:

pvalue cut off : 1
padjustmethod: FDR (this might not matter, but just in case)
qvaluecutoff: 1000

(The cutoffs are set this loosely on purpose: we just want all of the GO annotations back, not only the enriched terms.)

Click 'Submit', and once the next page has loaded, click 'Export Data'.
Now in R, use the following script to reshape the data into the format that BINGO wants:

# read in the .csv downloaded from TeaCoN
teaBP <- read.csv("~/Desktop/biostars/GO Enrichmnet - TeaCoN.csv", header = TRUE)

# null out unnecessary columns
teaBP$Description <- NULL
teaBP$GeneRatio <- NULL
teaBP$BgRatio <- NULL
teaBP$pvalue <- NULL
teaBP$p.adjust <- NULL
teaBP$qvalue <- NULL
teaBP$Count <- NULL

# split up genes in bunched value columns to individual columns
install.packages('splitstackshape')
library('splitstackshape')
teaBPsplit <- cSplit(teaBP, "geneID", sep = " ")

# wide to long
# (2:388 is the column range of the split gene columns in my export; adjust it if yours differs)
install.packages('tidyr')
library(tidyr)
teaBPsplit.long <- pivot_longer(teaBPsplit, 2:388, names_to = "colnames", values_to = "genes")

# remove NAs
teaBPsplit.long.noNA <- teaBPsplit.long[!is.na(teaBPsplit.long$genes), ]

# remove unnecessary column
teaBPsplit.long.noNA$colnames <- NULL

# write your file
write.table(teaBPsplit.long.noNA, file = '~/Desktop/biostars/teaBPannotation.txt', col.names = F, row.names = F)

In terminal rearrange your data:

cat teaBPannotation.txt | tr -d '"' | awk  -F " " ' NR>1 { print $3 " " $2 }' > teaBPannotation_clean.txt

Lastly, in TextEdit (or any text editor), replace every ".1 GO:" with " = " so that each line reads like the BINGO example above (gene ID, equals sign, GO number).

Also don't forget to add this line at the very top of the file:

(species=Camellia sinensis)(type=Biological Process)(curator=GO)

Then select the file in BINGO.
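
If you prefer to stay in the terminal, the two manual edits (the find-and-replace and the header line) can be sketched with sed. This assumes each line of teaBPannotation_clean.txt looks like `GENE.1 GO:0006384`, which is what the replacement step implies. `finish_bingo_file` is a hypothetical helper name, not something BINGO provides.

```shell
# finish_bingo_file INPUT TYPE OUTPUT
# Hypothetical helper: writes the BINGO header line for the given GO
# category, then the annotation lines with ".1 GO:" turned into " = ".
finish_bingo_file() {
  {
    printf '(species=Camellia sinensis)(type=%s)(curator=GO)\n' "$2"
    # "GENE.1 GO:0006384" becomes "GENE = 0006384"
    sed 's/\.1 GO:/ = /' "$1"
  } > "$3"
}
```

For example: `finish_bingo_file teaBPannotation_clean.txt "Biological Process" teaBP_bingo.txt` (the output filename here is just an example). Passing a different second argument gives the header for another GO category.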

NOTE: if you want a similar file for Molecular Function or Cellular Component, repeat everything from the TeaCoN step onward, selecting your desired GO category there, and change the header line of the final file to match: (species=Camellia sinensis)(type=CHANGE TO GO CATEGORY)(curator=GO)

Here is the Biological Process annotation file that was generated from this tutorial: https://drive.google.com/file/d/1Gv6M6N_T1e00fTqd4qprNQJgdED_YSt4/view?usp=sharing

