Renaming All Files In A Directory With A Perl Script
10.9 years ago
Robert Sicko ▴ 630

Hoping someone can help with what is probably a trivial problem. I receive Ion Torrent data by subject, and the files I get have the barcode name in all of the filenames by default. I want to rename all of the files with a sample ID instead of this barcode ID. I'm new to Perl, so I'm struggling with a script to rename the BAM or VCF files. The start of my script is below, but I feel like I'm overcomplicating things. My plan was to cd into the directory containing all the VCFs or BAMs and run this script with the appropriate flag, but I'm stuck at the point below:

#!/usr/bin/perl
# ion_rename.pl

# rename_script - Script to rename a batch of iontorrent files that contain IonXpress barcodes to their study ID
# will rename all vcf or all bam files and their associated index files

use strict; use warnings; use Getopt::Long; use Cwd; # Cwd provides getcwd

my $VERSION = "1.0"; # it's a good idea to version your programs

my ($vcf,$bam,$help);
my $barcodefile = '';
my $dir = getcwd;

GetOptions( 'vcf'     => \$vcf,
        'bam' => \$bam,
        'barcodefile=s' => \$barcodefile,
        'help' => \$help);

my $usage = "
usage: ion_rename.pl [options] <arguments...>
options:
    --help
    --vcf Use this option for renaming ion vcf files
    --bam Use this option for renaming ion bam files
    --barcodefile <file containing link between barcode and sampleID>
";

my @files;

if ($help) {
    print "version ", $VERSION, "\n";
    print $usage; # it's common to provide a -h to give help
    exit;
}
elsif($barcodefile eq ""){
    print "Must specify barcode file!\n";
    print $usage;
    exit 1; # non-zero exit status signals the error
}
else{
    if($vcf){
        @files = glob '*.vcf*';
    }
    elsif($bam){
        @files = glob '*.bam';
    }

    open(my $fh, "<", $barcodefile)    # read from the barcode file
        or die "error opening $barcodefile for reading: $!";
    while (<$fh>) {    #read line by line
        chomp;            #remove newline at end
        my ($barcodename, $samplename) = split /\t/;
    }
    close($fh);
}

My bam files are all named like this:

Run_1_011314.TAG_RG_N8FU5.IonXpress_054.bam
Run_1_011314.TAG_RG_N8FU5.IonXpress_082.bam

vcf files are all named like this:

TSVC_variants_IonXpress_082.vcf.gz
TSVC_variants_IonXpress_054.vcf.gz 
TSVC_variants_IonXpress_082.vcf.gz.tbi
TSVC_variants_IonXpress_054.vcf.gz.tbi

I've created a tab-delimited file to map barcode to sample ID like this:

IonXpress_082    A13901
IonXpress_054    A21064

and I want to rename the files like this:

A13901.vcf.gz
A13901.vcf.gz.tbi
A21064.vcf.gz
A21064.vcf.gz.tbi


Run_1_011314.A21064.bam
Run_1_011314.A13901.bam
perl linux • 14k views
ADD COMMENT

I'd strongly recommend AGAINST renaming files. Instead, create soft links to the original files using the desired names.

ADD REPLY

I agree, or store the old and new names in a database or text file. You should not rename raw data files; what if you have a query in the future for whoever generated them? They will not know which file you mean.

I often find that people think they need to rename things, for readability, when in fact what they need is more logical code for analysis pipelines.
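
If you do rename local copies, one way to keep that record is to write every rename to a text file as it happens. The following is only a sketch: `rename_with_log` is a made-up helper name, and the log format (old name, tab, new name) is an assumption, not something from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: rename a file and append "old<TAB>new" to a log file,
# so the original name can be looked up later.
sub rename_with_log {
    my ( $old, $new, $log ) = @_;

    die "Refusing to overwrite existing file '$new'\n" if -e $new;

    # Open the log first, so nothing is renamed if the record cannot be written.
    open( my $log_fh, '>>', $log ) or die "Cannot open log '$log': $!";
    rename( $old, $new ) or die "Cannot rename '$old' to '$new': $!";
    print {$log_fh} "$old\t$new\n";
    close($log_fh) or die "Cannot close log '$log': $!";
    return 1;
}
```

A call would look like `rename_with_log( $old_name, $new_name, 'renames.tsv' )`, where the log file name is just a placeholder.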

ADD REPLY

I keep raw data files "as is" on our long-term storage server. And we keep appropriate databases for tracking data and individual sample IDs. It is somewhat easier to rename the local copies of initial data files for when they enter the analysis pipeline but it is more of a convenience thing at that point.

ADD REPLY

Is there any reason other than being able to go back to the original data source with the correct name? I would keep a record (my barcode to sampleID file) that I could use to trace a sampleID back to its IonXpress ID.

ADD REPLY
10.9 years ago
Christian ★ 3.1k

No need for a custom script. Look up the Linux 'rename' command. It allows you to specify a regular expression for the bulk renaming of files.

ADD COMMENT

Agreed. Have a look at the same question, with answers, on Cross Validated.

ADD REPLY

Thanks... If I decide to rename instead of using links, I can probably hack together a bash script that reads the new file names and renames the files with this command.

ADD REPLY

By the way, on Debian systems (and probably Ubuntu too), Perl's rename command is being moved to the rename package. In future releases (Buster, ...), the perl package will no longer provide the rename command.

ADD REPLY
10.9 years ago
Ryan D ★ 3.4k

For renaming to substitute one expression for another, I offer the following perl script.

But it just does simple renaming like so:

ryan@WZLINUX7:~/scripts$ perl renamer.pl
Old pattern: foo
New pattern: bar
File foo2 renamed to bar2
File foo3 renamed to bar3
File foo1 renamed to bar1

#!/usr/bin/perl -w

use strict;

my($dir, $oldpat, $newpat);
$dir=".";
print "Old pattern: ";
chomp($oldpat=<STDIN>);
print "New pattern: ";
chomp($newpat=<STDIN>);

opendir(my $dh, $dir) || die "Can not open $dir: $!";
my @files=readdir $dh;
closedir($dh); # directory handles are closed with closedir, not close

my $oldname;
foreach(@files){
   $oldname=$_;

   s/$oldpat/$newpat/; # change $_ to new pattern

   next if(-e "$dir/$_");
   if(! rename "$dir/$oldname", "$dir/$_"){
      warn "Could not rename $oldname to $_: $!";
   } else {
      print "File $oldname renamed to $_\n";
   }
}
ADD COMMENT

Thanks. Saved this for future use.

ADD REPLY
10.9 years ago
DG 7.3k

Instead of putting barcodename and samplename in temporary variables while you loop through the barcode map file, you should store them as key-value pairs in a hash, with the barcode as the key.

Then you can loop through your files and match the barcode to the filename and rename accordingly using the value from the hash.
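
A minimal sketch of that idea, assuming the map file has one barcode and one sample ID per line separated by whitespace. The helper names `parse_barcode_map` and `find_sample` are made up for this illustration, and the map is read from an in-memory string so the example runs without any files.

```perl
use strict;
use warnings;

# Hypothetical helper: build a barcode => sample ID hash from the map text.
sub parse_barcode_map {
    my ($text) = @_;
    my %sample_for;
    for my $line ( split /\n/, $text ) {
        my ( $barcode, $sample ) = split ' ', $line;
        next unless defined $sample;    # skip blank or incomplete lines
        $sample_for{$barcode} = $sample;
    }
    return \%sample_for;
}

# Hypothetical helper: look up the sample ID for the barcode in a file name.
# Returns undef if the name has no barcode or the barcode is not in the map.
sub find_sample {
    my ( $filename, $sample_for ) = @_;
    my ($barcode) = $filename =~ /(IonXpress_\d+)/;
    return unless defined $barcode;
    return $sample_for->{$barcode};
}

# Demo with the two barcodes and file names from the question.
my $map = parse_barcode_map("IonXpress_082\tA13901\nIonXpress_054\tA21064\n");
for my $file ( 'TSVC_variants_IonXpress_082.vcf.gz',
               'Run_1_011314.TAG_RG_N8FU5.IonXpress_054.bam' ) {
    my $sample = find_sample( $file, $map );
    print "$file -> ", ( defined $sample ? $sample : 'no match' ), "\n";
}
```

In a real script the text would come from the barcode file rather than a string, and the lookup result would feed into rename or symlink.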

ADD COMMENT

Thanks, interesting idea. I think this would make the logic of the program easier... I might be able to pull that off in C++ but in Perl it'd take me some doing.

ADD REPLY

It is even easier to do in Perl than in C++. Declare the hash and in the loop you currently have add:

$hash{$barcodename} = $samplename;

After storing that mapping data, loop through the file array. It looks easy enough to parse the filename if you know whether it is a bam or vcf, as they appear to follow regular patterns.
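
One possible way to parse the two patterns, written as a sketch rather than a tested tool. It assumes the names look exactly like the examples in the question, and that BAM indexes end in `.bam.bai` (the question does not show them). `new_name` is a hypothetical helper that takes a file name and a reference to the barcode hash.

```perl
use strict;
use warnings;

# Hypothetical helper: work out the new name for one file. Returns undef if
# the name fits neither pattern or the barcode is not in the map.
sub new_name {
    my ( $old, $sample_for ) = @_;

    # BAM: Run_1_011314.TAG_RG_N8FU5.IonXpress_054.bam
    # Keep the run prefix, drop the TAG field, swap barcode for sample ID.
    if ( $old =~ /^(.+?)\.TAG_[^.]+\.(IonXpress_\d+)(\.bam(?:\.bai)?)$/ ) {
        my ( $run, $barcode, $ext ) = ( $1, $2, $3 );
        return exists $sample_for->{$barcode}
            ? "$run.$sample_for->{$barcode}$ext"
            : undef;
    }

    # VCF: TSVC_variants_IonXpress_082.vcf.gz, optionally with .tbi
    # Drop the TSVC_variants_ prefix and use the sample ID as the base name.
    if ( $old =~ /^TSVC_variants_(IonXpress_\d+)(\.vcf(?:\.gz(?:\.tbi)?)?)$/ ) {
        my ( $barcode, $ext ) = ( $1, $2 );
        return exists $sample_for->{$barcode}
            ? "$sample_for->{$barcode}$ext"
            : undef;
    }

    return;
}

# Demo: print the proposed names without touching any files.
my %demo_map = ( IonXpress_082 => 'A13901', IonXpress_054 => 'A21064' );
for my $old ( 'Run_1_011314.TAG_RG_N8FU5.IonXpress_054.bam',
              'TSVC_variants_IonXpress_082.vcf.gz.tbi' ) {
    my $new = new_name( $old, \%demo_map );
    print "$old -> ", ( defined $new ? $new : '(skipped)' ), "\n";
}
```

Printing the proposed names first and only then calling rename (or symlink) makes it easy to check the patterns against a real directory listing.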

ADD REPLY

Didn't mean to imply it'd be harder in Perl for someone who knows both languages... I just have far more experience with C++, so Perl is still a struggle for me. With that said, I'm trying to force myself to use Perl as I see the utility of it. Thanks, this will help!

ADD REPLY

No problem. The syntax for doing this in Perl is pretty straightforward. I tend to do all of these things in Python now myself.

ADD REPLY

Just out of curiosity, is there a reason you prefer Python now or just personal preference?

ADD REPLY

Partly personal preference: I find the coding style cleaner. I have also found that, on the genomics end of bioinformatics, there are far more GOOD tools and libraries for Python than for Perl. These include pybedtools, bx-python (ClusterTrees, IntervalTrees), and others. As well, the built-in tools for handling common file types (delimited file formats, etc.) are much better.

ADD REPLY
10.9 years ago

I didn't test it, but this could be a way to do it, if mapfile is the file with the correspondence.

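mapfile=mapfile.txt   # placeholder name: tab-delimited file mapping barcode to sample ID

for i in $(cut -f1 "$mapfile"); do
   j=$(grep "^$i" "$mapfile" | cut -f2)
   # a barcode can match several files (e.g. .vcf.gz and .vcf.gz.tbi)
   for inifile in *"$i"*; do
      [ -e "$inifile" ] || continue
      echo mv "$inifile" "${inifile/$i/$j}"   # swap the barcode for the sample ID
   done
done

If you test it, it will print all the commands. Once they look right, remove the echo in front of mv.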

ADD COMMENT

Clever... this will help if I go with renaming the files, thanks.

ADD REPLY
10.9 years ago
Kenosis ★ 1.3k

In case you're interested in creating renamed symbolic links to the original files, perhaps the following will be helpful:

use strict;
use warnings;
use File::Basename;

eval { symlink( "", "" ); 1 } or die "Symbolic links not supported.\n";

my ( @cols, %replacements );

while (<>) {
    $replacements{ $cols[0] } = $cols[1] if @cols = split;
}

for my $oldFile ( grep /bam$|vcf\.gz/, <./originals/*> ) {
    my $newFile = basename $oldFile;

    if ( my ( $tag, $ion ) = $oldFile =~ /\/Run.+(TAG.+)(IonXpress_\d+)/ ) {
        $newFile =~ s/$tag$ion/$replacements{$ion}/ if $replacements{$ion};
    }

    if ( my ($ion) = $oldFile =~ /\/TSVC_variants_(IonXpress_\d+)/ ) {
        $newFile =~ s/.+$ion/$replacements{$ion}/ if $replacements{$ion};
    }

    symlink $oldFile, $newFile
      or warn "Unable to create a symbolic link for '$oldFile': $!";
}

Usage: perl script.pl mapFile

Just place both script.pl and mapFile in a directory where there's a subdirectory called originals that contains all of your original files. The script will create symbolic links--named as you've specified--to those original files.

ADD COMMENT

Thanks... If I go the link instead of the rename route, this will help!

ADD REPLY
7.3 years ago

This code may work, but you will probably run into problems when you try to rename other types of files. A better way is to use the Batch Rename Files Tool software, which you can find at BatchRenameFiles.org.

ADD COMMENT
