8 months ago
pablo
Hi,
I would like to replace my FASTA headers according to a matching file.
I have:
head my.fasta
>CM020909.1:14117
aTTTTTGTCCCCAatattaggccctatgttctcacatttcacaatttttt[C/T]ccccaaaattaggccctatgttcccacatttcaagatttattttttccaa
>CM020909.1:148127
TTACTTTGTAGTAGCTTTTACTTTGTAGCTAGAGGCTTGGCTGTGCTGTT[T/A]GGGCTTTTACTGTAGTGGCTTGGGTTGGTAGTGACTTTGAAGGAGGTTAA
>CM020909.1:254785
CGTCCATCTTCTCCAGAGCTCTTTAAAGCCAAAGCGTTTTGGGGGACAGC[A/T]GCACAAAGAGCCAATCAAACGCCACAAAAGGCAGAGAACCGGACACCTGG
>CM020909.1:362180
ccaaaaatgatcGCCGCTGGtcgggagggggaggggacgggggAGGTGGG[T/G]AATTTGGCTTAAACACACAATTCAAAGAGGGGAACGTTGTTAAACAAACG
>CM020909.1:469928
agggtaactaatcattctttacccttctgaaaaagtgtaactacccttct[G/C]aaaaacagtaactaatcattaactacctatttttttgtgtaccttattat
And the corresponding mapping file:
head correspondances.txt
CM020909.1 CHR1
CM020910.1 CHR2
CM020911.1 CHR3
CM020912.1 CHR4
CM020913.1 CHR5
CM020914.1 CHR6
CM020915.1 CHR7
CM020916.1 CHR8
CM020917.1 CHR9
CM020918.1 CHR10
I tried this, but it does not work:
seqkit replace -k correspondances.txt -p "^(.+?) (.+)$" -r "{kv}:\$2" my.fasta
I need to keep the position after the ":" so that the first sequence, for example, becomes: >CHR1:14117
Any help?
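One likely cause, illustrated here with Python's `re` module rather than seqkit itself: the pattern `^(.+?) (.+)$` expects a space between the two capture groups, whereas the headers separate accession and position with a colon, so the pattern never matches. The colon-based pattern below is a suggested fix under that assumption, not something taken from the thread.

```python
import re

header = "CM020909.1:14117"

# Pattern from the question: requires a space between the two groups.
space_pattern = re.compile(r"^(.+?) (.+)$")
# Assumed fix: split on the colon that actually appears in the header.
colon_pattern = re.compile(r"^(.+?):(.+)$")

space_match = space_pattern.match(header)
colon_match = colon_pattern.match(header)

print(space_match)           # None, the header contains no space
print(colon_match.groups())  # ('CM020909.1', '14117')
```

If that is the problem, the seqkit call would presumably become `seqkit replace -k correspondances.txt -p '^(.+?):(.+)$' -r '{kv}:$2' my.fasta`. Note also that seqkit documents the `-k` file as tab-delimited, so a space-separated correspondances.txt may need converting first.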
Thanks a lot, that was pretty easy actually. While I'm here, one more question: my FASTA file contains some headers that do not match any line of correspondances.txt. With the seqkit replace command, those headers end up as ">:643" (rather than >CM020933.1:643). Is there an option to leave those headers unchanged?
Add -K (--keep-key), which keeps the key as the value when no value is found for it in the key-value file. The logic is (code):
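As a rough sketch of what that fallback amounts to (in Python, and not seqkit's actual source): look up the captured accession in the mapping, use the mapped name if there is one, and otherwise either reuse the accession itself (the `-K` behavior) or substitute an empty string, which is what produces `>:643`. The helper name `rename_header` and its `keep_key` parameter are hypothetical and exist only for this illustration.

```python
import re

# Hypothetical helper for illustration; this is not seqkit's implementation.
HEADER_RE = re.compile(r"^(.+?):(.+)$")

def rename_header(name, mapping, keep_key=False):
    """Replace the accession before ':' with its mapped value."""
    match = HEADER_RE.match(name)
    if match is None:
        return name          # header does not fit the pattern, leave it alone
    key, position = match.groups()
    if key in mapping:
        value = mapping[key]
    elif keep_key:
        value = key          # mimics -K: fall back to the key itself
    else:
        value = ""           # no fallback: the accession is lost
    return f"{value}:{position}"

mapping = {"CM020909.1": "CHR1"}
print(rename_header("CM020909.1:14117", mapping))                # CHR1:14117
print(rename_header("CM020933.1:643", mapping))                  # :643
print(rename_header("CM020933.1:643", mapping, keep_key=True))   # CM020933.1:643
```

With `keep_key=True`, headers whose accession is missing from the mapping come out unchanged, which matches the result asked for above.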
Perfect, thanks a lot!