Remove text flanking .. on fasta-headers
3.8 years ago

I have a file like the one below:

>1 Scaffold001_3a:1-2410 TCONS_00023122 - XLOC_013800 CUFF.2.1
TCTGGAGTCAACTGCAGTGTTCGAATATAATACATTGCGAACAAAGTTGATTGAAGAATTTGCTAA
>2 Scaffold001_3a:26352-26946 TCONS_00019328 + XLOC_011360 CUFF.1.1
TTAATTATCGATATATTGAGTCATTGACGTTGTTCCACCTCTATATTGTACATACTTATATATATTAATAT

I want to remove the leading number from each header, so that the file looks like this:

>Scaffold001_3a:1-2410 TCONS_00023122 - XLOC_013800 CUFF.2.1
TCTGGAGTCAACTGCAGTGTTCGAATATAATACATTGCGAACAAAGTTGATTGAAGAATTTGCTAA
>Scaffold001_3a:26352-26946 TCONS_00019328 + XLOC_011360 CUFF.1.1
TTAATTATCGATATATTGAGTCATTGACGTTGTTCCACCTCTATATTGTACATACTTATATATATTAATAT

Please let me know the command for this.

Thank you.

alignment fasta • 591 views
3.8 years ago
5heikki 11k
 awk 'BEGIN{FS=" "}{if($0~/^>/){$1=""; print ">"substr($0,2)} else{print $0}}' input.fna > output.fna
3.8 years ago

Please post what you have tried.

With sed:

$ sed '/^>/ s/>[0-9]\+\W/>/' test.fa     

>Scaffold001_3a:1-2410 TCONS_00023122 - XLOC_013800 CUFF.2.1
TCTGGAGTCAACTGCAGTGTTCGAATATAATACATTGCGAACAAAGTTGATTGAAGAATTTGCTAA
>Scaffold001_3a:26352-26946 TCONS_00019328 + XLOC_011360 CUFF.1.1
TTAATTATCGATATATTGAGTCATTGACGTTGTTCCACCTCTATATTGTACATACTTATATATATTAATAT
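
Note that `\+` and `\W` are GNU sed extensions, so the command above may not behave the same way with BSD or macOS sed. Here is a minimal portable sketch that uses only POSIX basic regular expressions. It assumes the number is separated from the rest of the header by spaces or tabs; the function name `strip_header_number` is made up for illustration.

```shell
# Hypothetical helper: remove a leading "N<whitespace>" from FASTA headers.
# Sequence lines never start with ">", so they pass through unchanged.
strip_header_number() {
    sed 's/^>[0-9][0-9]*[[:space:]][[:space:]]*/>/' "$@"
}

# Usage (output file name is an assumption):
#   strip_header_number test.fa > renamed.fa
```

Because the pattern is anchored with `^>`, no separate `/^>/` address is needed, and multi-line sequences are left as they are.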

With awk:

$ awk '/^>/ {sub(/^>[0-9]+ /, ">")} {print}' test.fa

>Scaffold001_3a:1-2410 TCONS_00023122 - XLOC_013800 CUFF.2.1
TCTGGAGTCAACTGCAGTGTTCGAATATAATACATTGCGAACAAAGTTGATTGAAGAATTTGCTAA
>Scaffold001_3a:26352-26946 TCONS_00019328 + XLOC_011360 CUFF.1.1
TTAATTATCGATATATTGAGTCATTGACGTTGTTCCACCTCTATATTGTACATACTTATATATATTAATAT
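
Whichever command is used, it may be worth confirming that only the header lines changed. Here is a small sketch of such a check. It assumes the original file and the rewritten file are passed as two arguments; `check_fasta_rename` is a made-up name, not an existing tool.

```shell
# Hypothetical helper: succeed only if both files have the same number of
# records, identical sequence lines, and no numbered headers remain.
check_fasta_rename() {
    orig=$1
    new=$2
    # Same number of header lines in both files.
    [ "$(grep -c '^>' "$orig")" -eq "$(grep -c '^>' "$new")" ] || return 1
    # Sequence (non-header) lines must have identical checksums.
    [ "$(grep -v '^>' "$orig" | cksum)" = "$(grep -v '^>' "$new" | cksum)" ] || return 1
    # No header in the new file should still start with "number + space".
    ! grep -q '^>[0-9][0-9]* ' "$new"
}

# Usage (file names are assumptions):
#   check_fasta_rename test.fa renamed.fa && echo "headers cleaned"
```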