Question

changing the name of files

6.9 years ago

Sam ▴ 150

Dear All

I have about 200 libraries named in the format ALT1_1_clean.fq.gz, but I have to change the names so that they are recognized by a pipeline. Could you guide me on how to do this?

Thanks

    "ALT1_1_clean.fq.gz" change to "ALT_1.R1.fq.gz"
    "ALT1_2_clean.fq.gz" change to "ALT_1.R2.fq.gz"
    "ALT2_1_clean.fq.gz" change to "ALT_2.R1.fq.gz"
    "ALT2_2_clean.fq.gz" change to "ALT_2.R2.fq.gz"
    .
    .
    .

bash awk • 4.1k views

ADD COMMENT • link updated 6.9 years ago by shenwei356 8.7k • written 6.9 years ago by Sam ▴ 150

score 5 · Answer 1 · 2018-05-14

There are countless ways to accomplish this kind of operation in bash, but I always prefer to write simple rules in snakemake.

# mvfq.py
# Default target: request every renamed file in the dotted format asked for in the question.
rule:
    input: expand('{samples}.{reads}.fq.gz', samples=['ALT_1', 'ALT_2'], reads=['R1', 'R2'])

rule move_fqs:
    output: mvto = '{sample}.{read}.fq.gz'
    run:
        # Rebuild the original name, e.g. sample ALT_1 and read R1 give ALT1_1_clean.fq.gz
        mvfrom = '_'.join([wildcards.sample.replace('_',''), wildcards.read.replace('R',''), 'clean.fq.gz'])
        shell('mv {mvfrom} {output.mvto}')

I can do a dry run

snakemake -s mvfq.py --dryrun

or run a specific target to make sure everything is working

snakemake -s mvfq.py ALT_1.R1.fq.gz

or run it all on my laptop

snakemake -s mvfq.py

or run it using 4 cores

snakemake -s mvfq.py -j4

or in a cluster via qsub with 100 independent jobs

snakemake -s mvfq.py -j100 --cluster "qsub"

or using remote files at S3 (or dropbox, google drive, etc) in a cluster

snakemake -s mvfq.py -j100 --cluster "qsub" --default-remote-provider S3 --default-remote-prefix s3/location/

or I can restart from the last failure check points, and many more.

All without changing the underlying code.

score 4 · Answer 2 · 2018-05-14

6.9 years ago

Pierre Lindenbaum 165k

for F in *_clean.fq.gz; do mv -- "$F" "$(echo "$F" | sed 's/_\([12]\)_clean\.fq\.gz$/.R\1.fq.gz/;s/^ALT/ALT_/')"; done

ADD COMMENT • link 6.9 years ago by Pierre Lindenbaum 165k
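
For anyone nervous about running a loop like the one above on 200 files, here is a minimal sketch of the same sed idea wrapped in a dry run. The helper name `new_name` and the `DOIT` variable are made up for this illustration, and it assumes every file looks like letters, digits, `_1` or `_2`, then `_clean.fq.gz`. It also assumes a sed that accepts `-E` and an mv that accepts `-n` (both GNU and BSD do).

```shell
#!/bin/sh
# new_name: hypothetical helper, prints the new name for one file.
# Returns non-zero if the name is not <letters><digits>_<1|2>_clean.fq.gz.
new_name() {
    printf '%s\n' "$1" | grep -Eq '^[A-Za-z]+[0-9]+_[12]_clean\.fq\.gz$' || return 1
    printf '%s\n' "$1" |
        sed -E 's/^([A-Za-z]+)([0-9]+)_([12])_clean\.fq\.gz$/\1_\2.R\3.fq.gz/'
}

# Dry run by default: only print the mv commands. Set DOIT=1 to rename.
for f in *_clean.fq.gz; do
    [ -e "$f" ] || continue                     # the glob matched nothing
    target=$(new_name "$f") || { echo "skipping $f" >&2; continue; }
    if [ "${DOIT:-0}" = 1 ]; then
        mv -n -- "$f" "$target"                 # -n: never overwrite an existing file
    else
        echo "mv -- $f $target"
    fi
done
```

Run it once as is to inspect the printed commands, then again with `DOIT=1` in the environment once the list looks right.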

score 4 · Answer 3 · 2018-05-14

6.9 years ago

igor 13k

The easiest and most readable option (in my opinion):

rename ALT ALT_ *.fq.gz
rename _1_clean .R1 *.fq.gz
rename _2_clean .R2 *.fq.gz

Unfortunately, the rename utility may not be available on all systems.

ADD COMMENT • link 6.9 years ago by igor 13k
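
If no rename utility is installed at all, the same three substitutions can be approximated with plain POSIX parameter expansion, which needs no external tool. This is only a sketch: the helper name `expand_name` is hypothetical, and it assumes every sample name starts with the literal prefix `ALT` as in the question.

```shell
#!/bin/sh
# expand_name: hypothetical helper, prints the new name for one file
# using only shell parameter expansion (no sed, no rename).
expand_name() {
    base=${1%_clean.fq.gz}     # ALT1_1_clean.fq.gz -> ALT1_1
    read_no=${base##*_}        # ALT1_1 -> 1  (text after the last underscore)
    sample=${base%_*}          # ALT1_1 -> ALT1
    num=${sample#ALT}          # ALT1   -> 1  (assumes the ALT prefix)
    printf 'ALT_%s.R%s.fq.gz\n' "$num" "$read_no"
}

# Print the planned renames; nothing is moved here.
for f in ALT*_clean.fq.gz; do
    [ -e "$f" ] || continue    # the glob matched nothing
    echo "$f -> $(expand_name "$f")"
done
```

Replacing the `echo` line with `mv -- "$f" "$(expand_name "$f")"` would perform the rename once the printed list has been checked.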

score 3 · Answer 4 · 2018-05-14

6.9 years ago

h.mon 35k

Honestly, change the source code of the pipeline. If this is not possible, here is a one-liner rename (which, as igor noted, may not be available or installed on some systems):

rename 's/(\d)_(\d)_clean.fq.gz/_$1.R$2.fq.gz/' *.gz

Note the single quotes ': if you use double quotes ", the shell expands $1 and $2 before rename sees them and the capture will not work. As batch renaming can have catastrophic consequences, I suggest you first perform a dry run with -n, check that everything is good to go, then run the command again without -n.

ADD COMMENT • link 6.9 years ago by h.mon 35k
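
The quoting point can be checked without touching any file. The sketch below only stores the expression in two variables (the names `single` and `double` are made up here) and prints what rename would actually receive in each case. It assumes the shell has no positional parameters set, as in a fresh interactive session.

```shell
#!/bin/sh
set --    # clear positional parameters so $1 and $2 are empty

# Single quotes: the shell passes $1 and $2 through untouched.
single='s/(\d)_(\d)_clean.fq.gz/_$1.R$2.fq.gz/'

# Double quotes: the shell replaces $1 and $2 with (empty) positional parameters.
double="s/(\d)_(\d)_clean.fq.gz/_$1.R$2.fq.gz/"

printf '%s\n' "$single"   # prints s/(\d)_(\d)_clean.fq.gz/_$1.R$2.fq.gz/
printf '%s\n' "$double"   # prints s/(\d)_(\d)_clean.fq.gz/_.R.fq.gz/
```

With the double-quoted form, every file would be renamed with an empty sample and read number, which is why the dry run matters.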

And to make things even more complicated, the rename tool linked by igor in another answer is not the same as the rename tool in this answer, which is available at https://metacpan.org/release/File-Rename, and in the rename package on Debian and related systems.

ADD REPLY • link 6.9 years ago by Charles Plessy ★ 2.9k

Indeed, good point, which I overlooked. There are renames and renames around: this one is a Perl script, the other one is a binary executable which, in Debian and relatives, is called rename.ul.

That is a lot of answers for a "how to rename files" question...

ADD REPLY • link 6.9 years ago by h.mon 35k

I guess this can be further shortened (code) and extended (function), since \d+ also handles sample numbers with more than one digit:

$ rename -n 's/(\d+)_(\d+)_clean/_$1.R$2/' *.gz

ADD REPLY • link 6.9 years ago by cpad0112 21k
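
With about 200 libraries, sample numbers will reach two or three digits, so the difference between one digit and one or more digits matters. A small sketch of the effect follows, using `sed -E` as a stand-in for the Perl expression (`[0-9]` in place of `\d`). The function names `single` and `multi` are hypothetical.

```shell
#!/bin/sh
# One digit per group, as in (\d)_(\d)
single() { printf '%s\n' "$1" | sed -E 's/([0-9])_([0-9])_clean/_\1.R\2/'; }

# One or more digits per group, as in (\d+)_(\d+)
multi()  { printf '%s\n' "$1" | sed -E 's/([0-9]+)_([0-9]+)_clean/_\1.R\2/'; }

single ALT10_1_clean.fq.gz   # prints ALT1_0.R1.fq.gz  (sample number split in two)
multi  ALT10_1_clean.fq.gz   # prints ALT_10.R1.fq.gz
```

Both versions give the same result for single-digit samples such as ALT1_1_clean.fq.gz, so the problem only shows up from sample 10 onwards.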

To further complicate things, I don't think every rename has the -n flag. Mine (from util-linux-ng) does not.

ADD REPLY • link 6.9 years ago by igor 13k

score 2 · Answer 5 · 2018-05-14

6.9 years ago

cpad0112 21k

Assuming that all files follow the same pattern (especially the digit_digit part):

$  parallel cp {} '{= s:([0-9]+)_([0-9]+)_clean:_$1\.R$2: =}' ::: *.gz

ADD COMMENT • link 6.9 years ago by cpad0112 21k

score 1 · Answer 6 · 2018-05-15

6.9 years ago

shenwei356 8.7k

---- corrected answer----

Try brename, a practical cross-platform command-line tool for safely batch renaming files/directories via regular expression.

$ brename -p "(\d+)_(\d+)_clean" -r "_\$1.R\$2"
[INFO] checking: [ ok ] 'ALT1_1_clean.fq.gz' -> 'ALT_1.R1.fq.gz'
[INFO] checking: [ ok ] 'ALT1_2_clean.fq.gz' -> 'ALT_1.R2.fq.gz'
[INFO] checking: [ ok ] 'ALT2_1_clean.fq.gz' -> 'ALT_2.R1.fq.gz'
[INFO] checking: [ ok ] 'ALT2_2_clean.fq.gz' -> 'ALT_2.R2.fq.gz'
[INFO] 4 path(s) to be renamed
[INFO] renamed: 'ALT1_1_clean.fq.gz' -> 'ALT_1.R1.fq.gz'
[INFO] renamed: 'ALT1_2_clean.fq.gz' -> 'ALT_1.R2.fq.gz'
[INFO] renamed: 'ALT2_1_clean.fq.gz' -> 'ALT_2.R1.fq.gz'
[INFO] renamed: 'ALT2_2_clean.fq.gz' -> 'ALT_2.R2.fq.gz'
[INFO] 4 path(s) renamed

ADD COMMENT • link 6.9 years ago by shenwei356 8.7k

That is not quite what OP wanted.

ADD REPLY • link 6.9 years ago by GenoMax 150k

Sorry for my carelessness, it's fixed.

ADD REPLY • link 6.9 years ago by shenwei356 8.7k

No worries. Your software is always comprehensive. Nice that you have a sanity check built in before the changes are made. I assume the software will stop if a test fails?

ADD REPLY • link 6.9 years ago by GenoMax 150k

Right, it detects potential conflicts (overwriting existing paths and overwriting newly renamed paths) and errors (blank target).

ADD REPLY • link 6.9 years ago by shenwei356 8.7k
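
For readers using one of the plain shell approaches instead, the three checks described above can be roughly imitated before any mv is run. The sketch below is not how brename works internally; `check_targets` is a hypothetical helper that takes the proposed new names as arguments and returns non-zero if any of them is blank, already exists, or appears more than once.

```shell
#!/bin/sh
# check_targets: hypothetical pre-flight check for a batch rename.
# Arguments: the proposed target names. Returns 0 only if none is a problem.
check_targets() {
    status=0
    for t in "$@"; do
        if [ -z "$t" ]; then
            echo "error: blank target name" >&2
            status=1
        elif [ -e "$t" ]; then
            echo "error: '$t' already exists" >&2
            status=1
        fi
    done
    # Names listed more than once would overwrite a newly renamed file.
    dups=$(printf '%s\n' "$@" | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "error: more than one file would be renamed to: $dups" >&2
        status=1
    fi
    return "$status"
}
```

A call such as `check_targets ALT_1.R1.fq.gz ALT_1.R2.fq.gz && echo "safe to rename"` would then gate the actual renaming loop.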