Hello! I'm trying to figure out how to extract the accession numbers from the headers in my file (there are about 120 headers). I have to use sed, and I can't seem to figure it out. Here is a sample of what my file looks like:
>Ref.49_cpx.GM.03.N26677.HQ385479
ATGAGAGTGATGGAGACATGGATG-------------ATTTGCAAAATTG
G------TGG---------------------------AGAGGGGGTCTC
I need the part after the last period in the header, i.e. the "HQ385479" part. Thanks in advance for the help!
Do you want to keep the rest of the alignments intact? I assume so but please clarify.
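If you do want to keep the alignment intact and only shorten each header to its accession, a sed one-liner along these lines should work. This is a sketch that assumes every header line starts with ">" and contains at least one period; the file name renamed.txt is just a placeholder.

```shell
# Recreate the sample record from the question as input.txt
printf '%s\n' \
  '>Ref.49_cpx.GM.03.N26677.HQ385479' \
  'ATGAGAGTGATGGAGACATGGATG-------------ATTTGCAAAATTG' \
  'G------TGG---------------------------AGAGGGGGTCTC' > input.txt

# On lines starting with ">", replace everything from ">" up to and including
# the LAST period with ">" (".*" is greedy). Sequence lines never match "^>",
# so they pass through unchanged.
sed 's/^>.*\./>/' input.txt > renamed.txt
cat renamed.txt
```

With the sample above, the header becomes ">HQ385479" and the two sequence lines are unchanged.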
Edit: Based on a response below, it looks like you want to keep just the accessions.
If all the header lines start with >Ref, you could do:
grep "^>Ref" input.txt | sed 's/^.*\.//' > accession
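The reason this grabs the part after the LAST period is that ".*" is greedy: it matches as much of the line as it can while still being followed by a literal ".", so the match ends at the final period and deleting it leaves only the accession. A quick check on the sample record from the question (the file names are the ones used above):

```shell
# Recreate the sample record from the question as input.txt
printf '%s\n' \
  '>Ref.49_cpx.GM.03.N26677.HQ385479' \
  'ATGAGAGTGATGGAGACATGGATG-------------ATTTGCAAAATTG' \
  'G------TGG---------------------------AGAGGGGGTCTC' > input.txt

# grep keeps only the header lines; sed deletes everything up to the last "."
grep '^>Ref' input.txt | sed 's/^.*\.//' > accession
cat accession   # HQ385479
```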
Ah yes, I completely forgot I could use grep first. Thanks.
It's not necessary to use grep. Input (I copy/pasted the first sequence and changed the ID at the end to make a second sequence):
Output:
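For reference, one way to do it with sed alone is to suppress default printing with -n and print only the header lines where the substitution succeeded. This is a sketch, not necessarily the exact command from the answer above; it assumes headers start with ">" and contain at least one period (a header without a period would simply be skipped).

```shell
# Recreate the sample record from the question as input.txt
printf '%s\n' \
  '>Ref.49_cpx.GM.03.N26677.HQ385479' \
  'ATGAGAGTGATGGAGACATGGATG-------------ATTTGCAAAATTG' \
  'G------TGG---------------------------AGAGGGGGTCTC' > input.txt

# -n: print nothing by default.
# On lines starting with ">", delete up to the last "." and print the result (p flag).
# Sequence lines never match "^>", so they are never printed.
sed -n 's/^>.*\.//p' input.txt > accession
cat accession   # HQ385479
```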
Thank you so much for this!