Hello,
I have multiple FASTA sequences that look like this:
>2p__scaffold_2__5799__6580__-__778568__0.00__0.00
GCTGGCGACGGATCTAGGCTCAGCGCAGAAGCAACTGAGAGTCGGCGATGAGCAGCCGGA
GCTGGCGACGGATCTAGGCTCAGCGCAGAAGCAA
>2p__scaffold_2__5799__6580__+__778569__0.00__0.00
GCTGGCGACGGATCTAGGCTCAGCGCAGAAGCAACTGAGAGTCGGCGATGAGC
>1p__scaffold_2__11235__11438__-__830827__0.00__0.00
GCTGGCGACGGATCTAGGCTCAGCGCAGAAGCAACTGAGAGTCGGCGATGAGCAGCCGGA
GCTTCAATCCAGGGGATCGAGGAGATCCAAAGCAGCAGAAGCGGCTCGACGATGGTGAGG
ATTCGGGATCGGATTCAGCGCTCGTCGGGACTGG
>1p__scaffold_2__33129__34129__+__811706__0.00__0.00
GCTGGCGACGGATCTA
I want to keep just ">" plus the ID (the number after __+__ or __-__ and before __0.00__0.00).
So I expect an output like this:
>778568
GCTGGCGACGGATCTAGGCTCAGCGCAGAAGCAACTGAGAGTCGGCGATGAGCAGCCGGA
GCTGGCGACGGATCTAGGCTCAGCGCAGAAGCAA
>778569
GCTGGCGACGGATCTAGGCTCAGCGCAGAAGCAACTGAGAGTCGGCGATGAGC
I searched for it and tried this:
sed 's@.*__-__@@' input.fa > output.fa
That removed __-__ and everything before it, including the ">" that I wanted to keep.
I also tried this to remove everything between ">" and __-__:
sed -e 's/\>//' -e 's/\__-__.*//' input.fa > output.fa
But this removed __-__ and everything after it.
And this, which removed __0.00__0.00:
sed 's/__0.00.*$//' input.fa > output.fa
Thank you for your help.
Now THIS is how you write a "please help me with fasta headers" question!
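One possible approach, as a minimal sketch: it assumes every header contains exactly one __+__ or __-__ separator followed by a numeric ID, and that your sed supports -E (GNU and BSD sed both do). The file names sample.fa and renamed.fa are placeholders, and the sample records are copied from the question.

```shell
# Small sample taken from the question (sample.fa is a placeholder name).
cat > sample.fa <<'EOF'
>2p__scaffold_2__5799__6580__-__778568__0.00__0.00
GCTGGCGACGGATCTAGGCTCAGCGCAGAAGCAACTGAGAGTCGGCGATGAGCAGCCGGA
GCTGGCGACGGATCTAGGCTCAGCGCAGAAGCAA
>2p__scaffold_2__5799__6580__+__778569__0.00__0.00
GCTGGCGACGGATCTAGGCTCAGCGCAGAAGCAACTGAGAGTCGGCGATGAGC
EOF

# Only on header lines (those starting with ">"): capture the digits that
# follow __+__ or __-__ and rewrite the whole line as ">" plus those digits.
# Sequence lines do not match /^>/ and pass through unchanged.
sed -E '/^>/ s/^>.*__[+-]__([0-9]+)__.*$/>\1/' sample.fa > renamed.fa
```

The earlier attempts each deleted one side of the header; capturing the ID in a group and writing back ">\1" keeps the ">" and drops both sides in a single substitution. The [+-] bracket handles both strands, which the __-__ patterns above would miss for "+" headers.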