3.2 years ago
devhimd
I have a UniProt GFF file, and I want to extract the lipidation and modified-residue entries, the modification at each site, and the residue number. How can I do that?
I want only the lipidation and modified-residue entries, their modifications, and the residue numbers in a CSV file.
Is there any program or awk script that can do this?
It would help if you posted a sample of the GFF file and showed what you want to extract from it.
This is the GFF file. From it I want to extract only the lipidation, nitrocysteine, and cysteine thioester entries. I want to make a CSV file with the modification in one column and the residue number of each modification in another column. I don't want the other information, such as the evidence and ECO codes.
So how should I proceed: write a program, an awk script, or a Perl script?
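The evidence and ECO codes sit in the ninth (attributes) column of a GFF3 line, next to the modification name. One way to drop them is to parse that column and keep only its `Note` value. This is a minimal Python sketch, assuming UniProt's GFF3 layout stores the modification name as `Note=...`; `parse_attributes` and `modification_name` are illustrative names, not part of any library, and the example string is made up.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Split a GFF3 column-9 string ('key=value;key=value') into a dict,
    decoding GFF3 percent-escapes such as %2C (',') and %3B (';')."""
    attrs = {}
    for field in column9.strip().split(";"):
        if not field:
            continue  # tolerate a trailing ';'
        key, _, value = field.partition("=")
        attrs[key] = unquote(value)
    return attrs


def modification_name(column9):
    """Keep only the Note (the modification name); evidence/ECO keys are ignored."""
    return parse_attributes(column9).get("Note", "")


# Made-up column-9 value in the UniProt style (illustrative only)
example = "Note=S-nitrosocysteine;evidence=ECO:0000269"
print(modification_name(example))  # S-nitrosocysteine
```

Splitting on `;` before decoding is safe because GFF3 requires a literal `;` inside a value to be escaped as `%3B`.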
I suppose you are referring to this GFF file: https://www.uniprot.org/uniprot/P01112.gff
I am not a regular user of awk or Perl, but I use R regularly, so I can suggest a simple R script:
This is how you read the GFF file:
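A rough Python counterpart of this reading step, assuming a standard GFF3 file with nine tab-separated columns. It skips `#` directive lines and loads each feature into a dict. `read_gff` and the sample line are hypothetical and for illustration only.

```python
import io

# Standard GFF3 column order (nine tab-separated fields)
GFF_COLUMNS = ["seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes"]


def read_gff(handle):
    """Read GFF3 feature lines from an open text handle into a list of dicts.

    Blank lines and '#' comment/directive lines are skipped, as is anything
    that does not have exactly nine tab-separated fields.
    """
    records = []
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != len(GFF_COLUMNS):
            continue
        record = dict(zip(GFF_COLUMNS, fields))
        record["start"] = int(record["start"])  # 1-based residue positions
        record["end"] = int(record["end"])
        records.append(record)
    return records


# Made-up input in the UniProt GFF layout (illustrative values only)
sample = io.StringIO(
    "##gff-version 3\n"
    "EXAMPLE1\tUniProtKB\tLipidation\t57\t57\t.\t.\t.\tNote=S-palmitoyl cysteine\n"
)
records = read_gff(sample)
print(records[0]["type"], records[0]["start"])  # Lipidation 57
```

With a downloaded file (the file name here is an assumption), the call would be `with open("P01112.gff") as fh: records = read_gff(fh)`.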
This is how you can subset the "table" of the GFF file on the "type" column to keep the "lipidation", "nitrocysteine" and "cysteine" entries together with their residue numbers:
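A Python sketch of the subsetting step. It assumes that in UniProt's GFF the feature type (column 3) reads `Lipidation` or `Modified residue`, and that the specific modification, such as a nitrosocysteine, appears only in the `Note` attribute. For that reason it filters on type first and then, optionally, on keywords in the Note. `select_modifications`, `WANTED_TYPES`, and the records are illustrative.

```python
from urllib.parse import unquote

# Feature types (GFF column 3) assumed to hold the wanted entries
WANTED_TYPES = {"Lipidation", "Modified residue"}


def note_of(attributes):
    """Return the decoded Note= value from a GFF3 column-9 string, or ''."""
    for field in attributes.split(";"):
        key, _, value = field.partition("=")
        if key == "Note":
            return unquote(value)
    return ""


def select_modifications(records, wanted_types=WANTED_TYPES, keywords=()):
    """Return (modification, residue) pairs for records of a wanted type.

    If keywords are given, only Notes containing one of them
    (case-insensitive) are kept. Single-residue features have
    start == end, so start is used as the residue number.
    """
    rows = []
    for rec in records:
        if rec["type"] not in wanted_types:
            continue
        note = note_of(rec["attributes"])
        if keywords and not any(k.lower() in note.lower() for k in keywords):
            continue
        rows.append((note, rec["start"]))
    return rows


# Made-up records shaped like parsed GFF rows (illustrative values only)
records = [
    {"type": "Chain", "start": 1, "attributes": "ID=PRO_X;Note=Example protein"},
    {"type": "Modified residue", "start": 42,
     "attributes": "Note=S-nitrosocysteine;evidence=ECO:0000269"},
    {"type": "Lipidation", "start": 57, "attributes": "Note=S-palmitoyl cysteine"},
]
print(select_modifications(records))
# [('S-nitrosocysteine', 42), ('S-palmitoyl cysteine', 57)]
print(select_modifications(records, keywords=["nitroso"]))
# [('S-nitrosocysteine', 42)]
```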
Can you write it in Python?
I have been out of touch with Python, but I am sure you can apply similar logic (possibly even a similar package) in Python.
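A compact, standard-library-only Python sketch of the same read-then-subset logic described above, written as a small command-line script. It assumes the UniProt GFF3 layout (type in column 3, start position in column 4, modification name in `Note=`). The script name, `WANTED_TYPES`, and the function names are hypothetical.

```python
import csv
import sys
from urllib.parse import unquote

# Assumed UniProt feature types to keep (GFF column 3); edit to taste
WANTED_TYPES = {"Lipidation", "Modified residue"}


def gff_to_rows(lines):
    """Yield (modification, residue) pairs from UniProt GFF3 lines,
    keeping only the Note text and dropping evidence/ECO attributes."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 9 or fields[2] not in WANTED_TYPES:
            continue
        attrs = {}
        for field in fields[8].split(";"):
            key, _, value = field.partition("=")
            attrs[key] = unquote(value)
        yield attrs.get("Note", ""), int(fields[3])


def write_csv(rows, out):
    """Write a header plus one row per modification; csv quotes commas if needed."""
    writer = csv.writer(out)
    writer.writerow(["modification", "residue"])
    writer.writerows(rows)


def main(args):
    """Convert INPUT.gff to OUTPUT.csv, or print usage if arguments are missing."""
    if len(args) != 2:
        print("usage: python gff_to_csv.py INPUT.gff OUTPUT.csv", file=sys.stderr)
        return
    with open(args[0]) as src, open(args[1], "w", newline="") as dst:
        write_csv(gff_to_rows(src), dst)


if __name__ == "__main__":
    main(sys.argv[1:])
```

After downloading the file from the URL above, a run might look like `python gff_to_csv.py P01112.gff P01112_ptms.csv`. To keep only particular modifications, such as the nitrosocysteine or thioester entries, you can add a substring check on the Note text inside `gff_to_rows`.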