3.9 years ago
dieunelderilus
I have a GTF file as follows:
KB705106 VEuPathDB exon 3645 3767 0 - . gene_id ""; transcript_id "AARA010197-RA";
KB705106 VEuPathDB CDS 3645 3767 0 - 2 gene_id ""; transcript_id "AARA010197-RA";
KB705106 VEuPathDB exon 3975 4065 0 - . gene_id ""; transcript_id "AARA010198-RA";
I want to copy the first 10 characters of each transcript_id and paste them into the corresponding empty gene_id, as follows:
KB705106 VEuPathDB exon 3645 3767 0 - . gene_id "AARA010197"; transcript_id "AARA010197-RA";
KB705106 VEuPathDB CDS 3645 3767 0 - 2 gene_id "AARA010197"; transcript_id "AARA010197-RA";
KB705106 VEuPathDB exon 3975 4065 0 - . gene_id "AARA010198"; transcript_id "AARA010198-RA";
Please, what is the easiest way to do this?
Thank you. ~DD
There are many different ways to parse and reformat text files. The easiest for you will depend on the scripting language you are most familiar with. For instance, I would personally use R (with the read.table(), sapply(), and strsplit() functions), but there are also good options in Python or Perl, and the most efficient way would probably be bash/awk. Which do you prefer?
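If Python is an option, a minimal sketch could look like the one below. It assumes every line that needs fixing contains an empty gene_id "" attribute and a quoted transcript_id, as in the example above. The function name fill_gene_id and the parameter n are made up for illustration. Because it works on the attribute text with a regular expression, it should not matter whether the columns are separated by tabs or spaces.

```python
import re

# Captures the quoted value of the transcript_id attribute.
TRANSCRIPT_RE = re.compile(r'transcript_id "([^"]+)"')


def fill_gene_id(line: str, n: int = 10) -> str:
    """Fill an empty gene_id with the first n characters of transcript_id.

    Lines without a transcript_id (or without an empty gene_id) are
    returned unchanged.
    """
    match = TRANSCRIPT_RE.search(line)
    if match is None:
        return line
    gene_id = match.group(1)[:n]
    # Replace only the first empty gene_id attribute on the line.
    return line.replace('gene_id ""', f'gene_id "{gene_id}"', 1)


# Sample lines taken from the question.
sample = [
    'KB705106 VEuPathDB exon 3645 3767 0 - . gene_id ""; transcript_id "AARA010197-RA";',
    'KB705106 VEuPathDB CDS 3645 3767 0 - 2 gene_id ""; transcript_id "AARA010197-RA";',
    'KB705106 VEuPathDB exon 3975 4065 0 - . gene_id ""; transcript_id "AARA010198-RA";',
]

for record in sample:
    print(fill_gene_id(record))
```

To convert a whole file, the same function can be applied to each line read from the input GTF and the result written to a new file. If the gene ID is not always exactly 10 characters long, splitting the transcript ID on the hyphen may be more robust than a fixed length, though that depends on how the IDs in your annotation are built.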