7.3 years ago
biomagician
Hi,
I have a GTF file with the following head:
head celegans.gtf
CHROMOSOME_I Coding_transcript exon 4119 4358 . - transcript_id "Transcript:Y74C9A.3.1"; gene_id "Gene:Y74C9A.3";
CHROMOSOME_I Coding_transcript exon 5195 5296 . - transcript_id "Transcript:Y74C9A.3.1"; gene_id "Gene:Y74C9A.3";
CHROMOSOME_I Coding_transcript exon 6037 6327 . - transcript_id "Transcript:Y74C9A.3.1"; gene_id "Gene:Y74C9A.3";
However, my FASTA file has the following chromosome names:
grep '>' celegans.fa
>I
>II
>III
>IV
>V
>X
>MtDNA
This discrepancy causes problems in downstream analyses. Does anyone know of a tool or way to rename the chromosome names in my GTF file to correspond to the chromosome names in the FASTA file?
Thanks.
Best, C.
Hi,
works but the command with the '-i' option gives the following error:
Does '-i' mean 'in-place', so that it changes the file directly? I am going to try redirecting the output of the '-e' command to the file itself.
Oops, this erased the content of the file. So the 'return' of the 'sed -e' command is NULL?
Best, C.
Yes, unfortunately so. Before posting, I tried it with example data and it worked (I am on Ubuntu with sed v4.2.2). I guess you are on macOS; the sed -i issue is discussed here, and a workaround is given at the end of the post.
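For anyone hitting the same GNU vs. macOS difference, here is a minimal sketch of an in-place edit that should behave the same with both GNU sed and the BSD sed shipped with macOS. The trick is to attach a backup suffix directly to -i (no space). The file demo.gtf and its single line are made up for illustration and only modelled on the GTF in the question; the leading ^ is my addition so that only the start of each line is touched.

```shell
# Work on a throwaway copy in a temporary directory (hypothetical demo data).
tmpdir=$(mktemp -d)
printf 'CHROMOSOME_I\tCoding_transcript\texon\t4119\t4358\t.\t-\t.\tgene_id "Gene:Y74C9A.3";\n' > "$tmpdir/demo.gtf"

# -i.bak (suffix attached, no space) edits the file in place
# and keeps the untouched original as demo.gtf.bak.
sed -i.bak 's/^CHROMOSOME_//' "$tmpdir/demo.gtf"

# Show the first column after the edit.
first_col=$(cut -f1 "$tmpdir/demo.gtf")
echo "$first_col"
```

Once the result looks right, the .bak file can be deleted.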
Correct guess, thanks. It works now. So you made use of the fact that the GTF file just had 'CHROMOSOME_' prepended to all my FASTA chromosome names, right? Do you mind explaining this: 's/CHROMOSOME_//g'?
Correct. The sed syntax is s/old string/new string/, where / is the delimiter that separates the parts. The g flag means global replacement: every match on each line is replaced. Without it, only the first match on each line is replaced. The entire substitution is in quotes. In the command above, CHROMOSOME_ is the old string and the new string is empty, so in short it is removed.
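A small sketch of what the g flag changes, using a made-up line in which the old string occurs twice. The third variant anchors the pattern with ^ so that only a match at the very start of the line is removed, which may be safer if CHROMOSOME_ could ever appear in the attributes column; that variant is a suggestion, not part of the original command.

```shell
# Hypothetical test line containing the old string twice.
line='CHROMOSOME_I CHROMOSOME_I'

# Without g: only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/CHROMOSOME_//')

# With g: every match on the line is replaced.
all_matches=$(printf '%s\n' "$line" | sed 's/CHROMOSOME_//g')

# Anchored with ^: only a match at the start of the line is replaced.
anchored=$(printf '%s\n' "$line" | sed 's/^CHROMOSOME_//')

echo "$first_only"    # I CHROMOSOME_I
echo "$all_matches"   # I I
echo "$anchored"      # I CHROMOSOME_I
```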
Maybe you can consolidate your comments into an answer and I can accept it?
You could redirect the output to a new file and use that; there is no real need for -i.
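A rough sketch of the redirect approach, with a quick check that the renamed GTF only uses sequence names present in the FASTA. Writing to the same file you are reading from empties it, because the shell truncates the output file before sed starts reading, so the output goes to a different file here. The file names and their contents are hypothetical stand-ins for the real celegans.gtf and celegans.fa.

```shell
# Hypothetical stand-ins for the real files, in a temporary directory.
tmpdir=$(mktemp -d)
printf 'CHROMOSOME_I\tCoding_transcript\texon\t4119\t4358\t.\t-\t.\tgene_id "Gene:Y74C9A.3";\n' > "$tmpdir/celegans.gtf"
printf 'CHROMOSOME_II\tCoding_transcript\texon\t100\t200\t.\t+\t.\tgene_id "Gene:demo";\n' >> "$tmpdir/celegans.gtf"
printf '>I\nACGT\n>II\nACGT\n' > "$tmpdir/celegans.fa"

# Write the renamed annotation to a NEW file; the input is left untouched.
sed 's/^CHROMOSOME_//' "$tmpdir/celegans.gtf" > "$tmpdir/celegans.renamed.gtf"

# Collect the sequence names used by each file.
cut -f1 "$tmpdir/celegans.renamed.gtf" | sort -u > "$tmpdir/gtf_names.txt"
grep '>' "$tmpdir/celegans.fa" | sed 's/^>//' | sort -u > "$tmpdir/fasta_names.txt"

# Names that occur in the GTF but not in the FASTA (empty means all match).
missing=$(comm -23 "$tmpdir/gtf_names.txt" "$tmpdir/fasta_names.txt")
echo "GTF names missing from FASTA: ${missing:-none}"
```

If the list of missing names is empty, the renamed file can replace the original in the downstream analysis.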