Hi all,
this is my first time posting on BioStars! I thought I'd easily find a bash one-liner to take care of this task but I've been searching with no luck.
I have a fasta, samples.fasta, that looks like this:
>sample01/contig002
ATCG
>sample02/contig001
GCTA
>sample11/contig003
CAGT
I have a text file, sample_key.txt, that has samples (always format 'sampleXX') paired with isolate names. Isolate names have a variety of formats, but none of them contain spaces. sample_key.txt looks like this:
sample01 AAA
sample02 def456
sample03 F7
.....
sample11 H-10
I'm trying to do two things: 1) replace the sample name with the isolate name (i.e. the value in the key file) and 2) replace the '/' in the original header with a '_'. I want to keep the second part of the original header, the contig number. My ideal output looks like this:
>AAA_contig002
ATCG
>def456_contig001
GCTA
>H-10_contig003
CAGT
I've tried seqkit replace, but it doesn't seem to work unless the keys match the existing headers exactly, which won't be the case here because sample_key.txt only contains part of each header. Unless you have a really simple way to do it, I don't think simply making a new key file is a useful option.
Thanks!!
FASTA header editing is a widely discussed topic on the forum. Have you searched the forum for existing posts?
Yes, but for some reason none of the suggested answers in the posts I found worked, or they were exclusively for cases with exact matching, which was surprising to me because I usually find awesome solutions on BioStars within 10 mins or so.... I figured out a sort of janky workaround, but after like an hour of trying to get old solutions to work I was ready to give up. Will post said janky workaround in a bit lol
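For reference, the partial-match renaming described in the question can be sketched with plain awk. This is only one possible approach, not the workaround mentioned above. It assumes the key file is whitespace-separated with the sample in column 1 and the isolate in column 2, that the key file is not empty, and that the sample name is everything before the first '/' in the header. The function name rename_headers and the output name renamed.fasta are made up for illustration.

```shell
# Hypothetical helper: $1 = key file (sample, whitespace, isolate), $2 = FASTA file.
rename_headers() {
    awk '
        # First file (the key): build the sample -> isolate lookup table.
        # Lines with fewer than two fields (e.g. a "....." placeholder) are skipped.
        NR == FNR { if (NF >= 2) key[$1] = $2; next }

        # Header lines: split at the first "/" and swap in the isolate name.
        /^>/ {
            header = substr($0, 2)
            slash = index(header, "/")
            sample = substr(header, 1, slash - 1)
            if (slash > 0 && (sample in key)) {
                print ">" key[sample] "_" substr(header, slash + 1)
                next
            }
        }

        # Sequence lines and headers without a matching key pass through unchanged.
        { print }
    ' "$1" "$2"
}

# Example call (file names taken from the question, output name is an assumption):
#   rename_headers sample_key.txt samples.fasta > renamed.fasta
```

Headers whose sample is missing from the key file are left as they are, so comparing the counts of '/' in the input and the output is a quick way to spot samples that were not renamed.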