2.5 years ago
genomes_and_MGEs
Hi everyone, I have a list of filenames like the following in file1.txt:
5275_AA_run719_GAGATTCC_S520_L004_R1_001.fastq.gz
5275_A_run720_ATTACTCG_S84_L001_R1_001.fastq.gz
5275_AB_run719_GAGATTCC_S521_L004_R1_001.fastq.gz
5275_B_run720_ATTACTCG_S85_L002_R1_001.fastq.gz
I would like to replace the first two fields (separated by _) of each filename according to the mapping in correspondence.txt:
5275_A MDF3
5275_B MDF6
5275_AA MCO6
5275_AB MCO7
If I run
while read n k; do sed -i "s/$n/$k/g" file1.txt ; done < correspondence.txt
the names come out wrong. For example, the entry
5275_AA_run719_GAGATTCC_S520_L004_R1_001.fastq.gz
is renamed to
MDF3A_run719_GAGATTCC_S520_L004_R1_001.fastq.gz
instead of
MCO6_run719_GAGATTCC_S520_L004_R1_001.fastq.gz
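A minimal reproduction of the problem, sketched under the assumption of GNU sed (`sed -i` with no backup suffix) and a throwaway scratch directory, makes the cause visible: the loop applies the patterns in the order they appear in correspondence.txt, and the bare substring 5275_A, applied first, also matches the start of 5275_AA and 5275_AB.

```shell
#!/usr/bin/env bash
# Reproduce the question's loop in a scratch directory (assumes GNU sed).
set -eu
cd "$(mktemp -d)"

printf '%s\n' \
  5275_AA_run719_GAGATTCC_S520_L004_R1_001.fastq.gz \
  5275_A_run720_ATTACTCG_S84_L001_R1_001.fastq.gz \
  5275_AB_run719_GAGATTCC_S521_L004_R1_001.fastq.gz \
  5275_B_run720_ATTACTCG_S85_L002_R1_001.fastq.gz > file1.txt

# printf reuses its format, writing one "key replacement" pair per line.
printf '%s %s\n' 5275_A MDF3 5275_B MDF6 5275_AA MCO6 5275_AB MCO7 > correspondence.txt

# Same loop as in the question: "5275_A" is applied first and, being a
# bare substring, also rewrites the start of "5275_AA" and "5275_AB".
# By the time 5275_AA and 5275_AB are tried, nothing matches them any more.
while read -r n k; do sed -i "s/$n/$k/g" file1.txt; done < correspondence.txt

cat file1.txt
```

The first and third lines come out as MDF3A_... and MDF3B_..., matching the wrong result described above.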
Is there a way to fix the above code so that each key is replaced correctly?
Thank you.
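Perhaps try appending a _ to both the pattern and the replacement in the sed expression. This makes the regex more specific: 5275_A_ no longer matches the start of 5275_AA_.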
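A minimal sketch of that change, assuming GNU sed and that the key always sits at the very beginning of each line in file1.txt. The `^` anchor is an extra assumption of this sketch, not part of the suggestion above; it simply stops the pattern from matching anywhere other than the start of the filename.

```shell
#!/usr/bin/env bash
# Sketch: underscore-terminated keys (assumes GNU sed for `sed -i`).
set -eu
cd "$(mktemp -d)"

printf '%s\n' \
  5275_AA_run719_GAGATTCC_S520_L004_R1_001.fastq.gz \
  5275_A_run720_ATTACTCG_S84_L001_R1_001.fastq.gz \
  5275_AB_run719_GAGATTCC_S521_L004_R1_001.fastq.gz \
  5275_B_run720_ATTACTCG_S85_L002_R1_001.fastq.gz > file1.txt

printf '%s %s\n' 5275_A MDF3 5275_B MDF6 5275_AA MCO6 5275_AB MCO7 > correspondence.txt

# "^5275_A_" requires an underscore right after the key, so it cannot
# match "5275_AA_..." or "5275_AB_...". The trailing "_" is put back in
# the replacement so the field separator is preserved.
while read -r n k; do
  sed -i "s/^${n}_/${k}_/" file1.txt
done < correspondence.txt

cat file1.txt
```

With this version the order of lines in correspondence.txt no longer matters, because no key is a prefix of another once the underscore is included.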
If that doesn't suffice, you can additionally sort correspondence.txt by key length, putting the longer patterns first. If you then run the original loop on the sorted file, it should process the longest, and thus most specific, patterns first, so those entries are already replaced before the more generic patterns are applied.
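One way to do the length sort, sketched with awk, sort and cut and assuming GNU sed for the loop. The output name correspondence_sorted.txt is hypothetical. Each line is prefixed with the length of its first field, sorted numerically in descending order on that prefix, and the prefix is then stripped again.

```shell
#!/usr/bin/env bash
# Sketch: longest keys first (assumes GNU sed; file names are illustrative).
set -eu
cd "$(mktemp -d)"

printf '%s\n' \
  5275_AA_run719_GAGATTCC_S520_L004_R1_001.fastq.gz \
  5275_A_run720_ATTACTCG_S84_L001_R1_001.fastq.gz \
  5275_AB_run719_GAGATTCC_S521_L004_R1_001.fastq.gz \
  5275_B_run720_ATTACTCG_S85_L002_R1_001.fastq.gz > file1.txt

printf '%s %s\n' 5275_A MDF3 5275_B MDF6 5275_AA MCO6 5275_AB MCO7 > correspondence.txt

# Prepend the key length, sort by it (numeric, descending), drop it again.
awk '{ print length($1), $0 }' correspondence.txt \
  | sort -k1,1nr \
  | cut -d' ' -f2- > correspondence_sorted.txt

# Unchanged loop from the question, now reading the sorted mapping:
# 5275_AA and 5275_AB are replaced before 5275_A gets a chance to match.
while read -r n k; do sed -i "s/$n/$k/g" file1.txt; done < correspondence_sorted.txt

cat file1.txt
```

Sorting by length is a useful safeguard when keys share prefixes, but it only helps if a longer key is always the more specific one; combining it with the underscore-terminated pattern is the more robust option.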