3.0 years ago
Space_Life
Hi, I have hundreds of Prokka-annotated GFF files. From every GFF file I want to extract the ID, product, UniProt ID and gene name. When I open a file in Excel, all of this information is in the last column. I tried converting the files to CSV and extracting the information from there, but saving the files one by one as CSV takes a long time. Also, the CSV file ends up with only 9 columns, and the last column holds all the required information in a single cell. I am new to R and Python. I would like to:
- Extract the above-mentioned information from each GFF file (ideally as a bulk operation on all the files at once)
- Create a CSV file with the extracted information
- Bind all the files into one long CSV file
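A minimal Python sketch of the three steps above, assuming the usual Prokka GFF3 layout: column 9 holds `key=value` pairs separated by `;`, the gene name is stored under `gene`, and the UniProt accession appears inside the `inference` attribute as `UniProtKB:<accession>` (check a few of your own files to confirm this). It also assumes all files end in `.gff` and sit in one directory, and it stops reading at the `##FASTA` line, after which Prokka appends the contig sequences. The function names `parse_attributes`, `rows_from_gff` and `combine` are hypothetical.

```python
import csv
import re
import sys
from pathlib import Path
from urllib.parse import unquote

FIELDS = ["file", "ID", "product", "uniprot_id", "gene"]
# Assumption: Prokka records UniProt hits as "UniProtKB:<accession>" in the inference attribute
UNIPROT_RE = re.compile(r"UniProtKB:([A-Za-z0-9_]+)")


def parse_attributes(column9):
    """Turn GFF3 column 9 ('key=value;key=value') into a dict, undoing %-escapes."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = unquote(value)
    return attrs


def rows_from_gff(path):
    """Yield one dict per feature line with the four wanted fields plus the file name."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # everything after this line is sequence, not annotation
            if line.startswith("#") or not line.strip():
                continue  # skip comment/directive and blank lines
            cols = line.rstrip("\n").split("\t")
            if len(cols) != 9:
                continue  # not a well-formed feature line
            attrs = parse_attributes(cols[8])
            match = UNIPROT_RE.search(attrs.get("inference", ""))
            yield {
                "file": Path(path).name,
                "ID": attrs.get("ID", ""),
                "product": attrs.get("product", ""),
                "uniprot_id": match.group(1) if match else "",
                "gene": attrs.get("gene", ""),
            }


def combine(gff_dir, out_csv):
    """Write the rows from every *.gff file in gff_dir into one long CSV."""
    with open(out_csv, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS)
        writer.writeheader()
        for gff in sorted(Path(gff_dir).glob("*.gff")):
            writer.writerows(rows_from_gff(gff))


if __name__ == "__main__" and len(sys.argv) == 3:
    combine(sys.argv[1], sys.argv[2])
```

Run it as `python combine_gff.py path/to/gff_dir all_annotations.csv` (the script name is arbitrary). Instead of writing one CSV per file and binding them afterwards, this sketch writes every row straight into a single CSV and adds a `file` column recording which GFF each row came from, which gives the same long table in one pass. Features without a gene name or UniProt hit (for example hypothetical proteins) simply get empty cells.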
Could you please suggest some code I could use to do this? Thank you.
In case you are using Unix, use the awk command to extract specific columns. In your case I guess you want to extract the information for genes only (a GFF file can also contain entries for exons, introns, etc.). The feature type is given in the third column.
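If you would rather stay in Python than use awk, the same third-column filter can be sketched as follows. It assumes tab-separated GFF3 input. Note that in Prokka output the protein-coding features are typed `CDS` (separate `gene` features are, as far as I know, only added with Prokka's `--addgenes` option), so `CDS` is used as the default type here. `features_of_type` is a hypothetical helper name.

```python
import sys


def features_of_type(gff_path, feature_type="CDS"):
    """Yield the 9 columns of every feature whose type (column 3) equals feature_type."""
    with open(gff_path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # Prokka appends sequences after this line
            if line.startswith("#") or not line.strip():
                continue  # skip comments and blank lines
            cols = line.rstrip("\n").split("\t")
            if len(cols) == 9 and cols[2] == feature_type:
                yield cols


if __name__ == "__main__" and len(sys.argv) >= 2:
    # Print column 9 (attributes) of matching features, roughly what
    # awk -F'\t' '$3 == "CDS" {print $9}' file.gff does for CDS features
    ftype = sys.argv[2] if len(sys.argv) > 2 else "CDS"
    for cols in features_of_type(sys.argv[1], ftype):
        print(cols[8])
```

For example, `python filter_gff.py genome.gff tRNA` (file names are placeholders) prints the attribute column of every tRNA feature.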
here,