How to rename or trim a bunch of FASTA file names
20 months ago

Hi everyone

I have more than 1000 MAGs with file names like the following:

GCF_029211165.1_ASM2921116v1_genomic.fa

I want to tidy them up before my analysis, to something like GCF_029211165.

Could someone please share a script for doing this on a bunch of files?

Many thanks, Venkat

fasta • 1.2k views

Thank you all for the suggestions.

20 months ago

something like:

find dir1 dir2 -type f -name "*.fa" |\
   awk '{printf("mv \"%s\" \"%s\"\n",$0,$0);}' |\
   sed 's/\/\([^\/\.]*\)\.[^\/]*$/\/\1\.fa"/' > check_this_script_then_execute.sh

Thank you for your quick response, Pierre. Could you please tell me what dir1 dir2 represent in the script? I have used it, but it didn't work.

Thanks and regards, Venkat


If your files are all in one directory (the one you are running this command from), you do not need the dir1 dir2 part. Those are placeholders for the directories to search, so use . (the current directory) instead.
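For the single-directory case, a plain shell loop is another option. The following is only a sketch: rename_fa is a hypothetical helper name, and it assumes every file ends in .fa and that the text before the first dot is the name you want to keep. By default it only prints the mv commands so they can be checked first.

```shell
#!/bin/sh
# Sketch: rename every *.fa file in one directory to "<text before first dot>.fa".
# rename_fa is a hypothetical helper, not part of any tool mentioned in this thread.
rename_fa() {
    dir=$1
    mode=${2:-dry}                    # "dry" prints the mv commands, "run" renames
    for path in "$dir"/*.fa; do
        [ -e "$path" ] || continue    # the glob matched nothing
        name=${path##*/}              # strip the directory part
        new="$dir/${name%%.*}.fa"     # keep only the text before the first dot
        if [ "$path" = "$new" ]; then
            continue                  # already tidy
        fi
        if [ -e "$new" ]; then
            echo "skip: $new already exists" >&2
            continue
        fi
        if [ "$mode" = run ]; then
            mv -- "$path" "$new"
        else
            printf 'mv "%s" "%s"\n' "$path" "$new"
        fi
    done
}

# Demo in a throwaway directory so that no real data is touched.
demo=$(mktemp -d)
touch "$demo/GCF_029211165.1_ASM2921116v1_genomic.fa"
rename_fa "$demo"        # preview only
rename_fa "$demo" run    # actually rename
ls "$demo"               # lists GCF_029211165.fa
```

On real data, call rename_fa . first, read the printed commands, and only then run rename_fa . run. In run mode an existing target is never overwritten; the preview does not detect two source files that would map to the same new name.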

20 months ago

GNU parallel is also an option.

parallel -j1 --dry-run mv {} {= s/\\..+/.fa/ =} ::: *.fa

Remove --dry-run if the commands look good.

You may need to modify the directory structure or regex depending on your actual directory structure and file naming.
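One way to try out a modified regex before touching any file is to run it over a sample name with sed. This sketch is not part of GNU parallel; the second pattern, which keeps the version number, is an assumption about what you might want.

```shell
#!/bin/sh
# Preview how two candidate patterns rewrite a file name; nothing is renamed.
name='GCF_029211165.1_ASM2921116v1_genomic.fa'

# Drop everything after the first dot (same idea as the s/\..+/.fa/ above).
short=$(printf '%s\n' "$name" | sed -E 's/\..+$/.fa/')

# Keep the version number: cut at the first underscore after the version digits.
versioned=$(printf '%s\n' "$name" | sed -E 's/^([^.]+\.[0-9]+)_.*$/\1.fa/')

printf '%s -> %s\n' "$name" "$short"       # ... -> GCF_029211165.fa
printf '%s -> %s\n' "$name" "$versioned"   # ... -> GCF_029211165.1.fa
```

Once a pattern gives the names you expect, the same substitution can go between {= and =} in the parallel command.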

20 months ago

Also try brename. It's handy and safe, as it can check for potential conflicts and errors:

This removes everything after the first dot (-p '\..+' -r "") without changing the file extension (-e). -R renames recursively (including subdirectories), and -d performs a dry run.

$ brename -R -e -p '\..+' -d
[INFO] search paths: ./
[INFO] 
[INFO] checking: [ ok ] 'GCF_029211165.1_ASM2921116v1_genomic.fa' -> 'GCF_029211165.fa'
[INFO] 1 path(s) to be renamed

I usually keep the version (.1):

$ brename -R -p '^(\w{3}_\d{9}\.\d+).+' -r '$1.fa' -d
[INFO] search paths: ./
[INFO] 
[INFO] checking: [ ok ] 'GCF_029211165.1_ASM2921116v1_genomic.fa' -> 'GCF_029211165.1.fa'
[INFO] 1 path(s) to be renamed
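Dropping the version is where name conflicts can appear, because two versions of the same assembly would map to one target name. If brename is not available, a rough check for that can be done by hand. This is only a sketch and not how brename works internally; find_collisions is a hypothetical helper, and the second and third sample names are made up for illustration.

```shell
#!/bin/sh
# Sketch: report target names that more than one source file would map to.
# find_collisions is a hypothetical helper; it reads file names on stdin.
find_collisions() {
    sed -E 's/\..+$/.fa/' | sort | uniq -d
}

# Sample input: the first name is from the question, the other two are invented.
dups=$(printf '%s\n' \
    'GCF_029211165.1_ASM2921116v1_genomic.fa' \
    'GCF_029211165.2_ASM2921116v2_genomic.fa' \
    'GCF_000000001.1_example_genomic.fa' | find_collisions)

echo "colliding targets: $dups"   # colliding targets: GCF_029211165.fa
```

On real files, printf '%s\n' *.fa | find_collisions lists every new name that would be claimed by more than one file; empty output means no conflicts under this pattern.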