How to change the chromosome names in the header lines of a FASTA file?
2.3 years ago
Dan ▴ 180

I want to add "chr" to the chromosome names in the header lines of a FASTA file, changing:

>1 dna:chromosome chromosome:GRCm38:1:1:195471971:1 REF

to

>chr1 dna:chromosome chromosome:GRCm38:chr1:1:195471971:1 REF

I tried

cat Mus_musculus.GRCm38.dna.primary_assembly.fa | sed -e 's/^>\([0-9XY]\)/>chr\1/' -e 's/.*GRCm38:\([0-9XY]\):.*/chr\1/'

but this only changes the first position. How can I change the second one as well?

Thanks

sed • 611 views
2.3 years ago
ATpoint 85k

Since the patterns to replace are so unique, you can just use the sledgehammer method:

awk '{gsub("^>",">chr");gsub(":GRCm38:",":GRCm38:chr");print}' your.fa
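Note that the awk one-liner prefixes every header, including any scaffold records in the file. If only numbered chromosomes plus X and Y should be renamed, a stricter sed variant could look like the sketch below. It assumes headers follow the layout shown in the question (`>NAME ... :GRCm38:NAME:...`) and that your sed supports `-E` (GNU and BSD sed both do). The function name `add_chr_prefix` is made up for illustration. The original attempt likely failed on the second position because `.*` on both sides of the pattern makes the substitution replace the whole line, and `[0-9XY]` matches only a single character, so names like 10 never match before the colon.

```shell
# Hypothetical helper: reads FASTA on stdin, writes the renamed FASTA to stdout.
# Only header lines (starting with ">") whose name is a number, X, or Y are changed;
# scaffold and MT headers, and all sequence lines, pass through untouched.
add_chr_prefix() {
  sed -E '/^>/ s/^>([0-9]+|X|Y)( .*:GRCm38:)([0-9]+|X|Y):/>chr\1\2chr\3:/'
}

# Quick check on the header from the question plus one sequence line.
printf '%s\n' '>1 dna:chromosome chromosome:GRCm38:1:1:195471971:1 REF' 'ACGT' | add_chr_prefix
```

On a real file this would be run as `add_chr_prefix < your.fa > your.chr.fa`. MT is deliberately left alone here, since UCSC-style naming uses chrM rather than chrMT; handle it separately if you need it.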