Dear all,
(This question is more about data/file management in UNIX/Mac OS X, but the data are fastq files anyway, so I'm asking it here.)
I have many fastq files scattered in different directories (compressed directories as bz2 or tar.bz2 and non-compressed directories) and I'd like to collect these fastq files in one directory while changing the filenames according to their original directory names. So, my files look a bit like this:
>parentdirectory
->this
-->data
--->sample1
---->target.fastq.bz2
---->anotherfastq.fastq.bz2
---->anotherfile.bz2
--->sample2
---->target.fastq.bz2
---->anotherfastq.fastq.bz2
---->anotherfile.bz2
->that
-->those
--->data.tar.bz2
>anotherdirectory
->data
-->sample_a
--->target.fastq.bz2
--->anotherfastq.fastq.bz2
--->anotherfile.bz2
-->sample_b
--->target.fastq.bz2
--->anotherfastq.fastq.bz2
--->anotherfile.bz2
Please note that the compressed data.tar.bz2 archive contains the same structure as the other, non-compressed data directories. My goal is to collect those target.fastq files, uncompressed, in one directory while replacing "target" in each filename with the name of its parent directory ("sample1", "sample2", "sample_a", etc.).
Any idea how to do that automatically/programmatically?
Thank you in advance for your kind help!
I suspect that Ram's answer will be the simplest in the long term (it'll probably take a bit of playing to get exactly what you want). The alternative is to just code a short little script in bash/python/perl/whatever to walk the directory structure and extract/rename as needed.
Thanks. Could you please point me in the right direction if I want to write a script in Python to do that? Thanks again.
You'll need to import the os and likely the glob modules in python (these should already be available, so there's nothing to install). The steps would then generally be as follows: [...] subprocess module or, more simply, the sys module and have the fastq file extracted to the target directory with a new name (I would prepend the sample name, but you can use any naming scheme you like). You could also directly use the tar file in python (I think it's the tar module), but that might prove to be a bit more work.
For Python, you'll need to use File IO, basic file ops with os and some tar operations with the tarfile module.
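As a rough illustration of the Python route described above, here is a minimal sketch, not a tested production script. It assumes the layout from the question: every file of interest is literally named target.fastq.bz2 and sits directly inside its sample directory, both on disk and inside data.tar.bz2. The function names collect_loose and collect_from_tar are made up for this example, and it uses pathlib and the standard bz2 module in place of os/glob and an external bunzip2 call, which is a substitution on my part rather than what was suggested.

```python
import bz2
import shutil
import tarfile
from pathlib import Path

# Filename taken from the question; change it if your files are named differently.
TARGET = "target.fastq.bz2"


def collect_loose(root, outdir):
    """Decompress every <sample>/target.fastq.bz2 under root to outdir/<sample>.fastq."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(root).rglob(TARGET)):
        # Name the output after the directory that holds the file.
        dest = outdir / (src.parent.name + ".fastq")
        with bz2.open(src, "rb") as fin, open(dest, "wb") as fout:
            shutil.copyfileobj(fin, fout)  # streams, so large files are fine
        written.append(dest)
    return written


def collect_from_tar(archive, outdir):
    """Do the same for target files stored inside a .tar.bz2 archive."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(archive, "r:bz2") as tar:
        for member in tar:
            name = Path(member.name)
            if not member.isfile() or name.name != TARGET:
                continue
            dest = outdir / (name.parent.name + ".fastq")
            # The member is itself bz2-compressed, so decompress it on the fly.
            with bz2.open(tar.extractfile(member), "rb") as fin, \
                    open(dest, "wb") as fout:
                shutil.copyfileobj(fin, fout)
            written.append(dest)
    return written
```

You would call collect_loose("parentdirectory", "collected") for each top-level directory and collect_from_tar("parentdirectory/that/those/data.tar.bz2", "collected") for each archive. Note that two samples with the same directory name would overwrite each other in the output directory, so you may want to add a check or a longer prefix if that can happen in your data.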
Thank you for the endorsement. I do suspect you meant to say short term, no? Py/Perl scripts are always better for long term :)
Oh, indeed! I need to follow the advice on the coffee mug in my office that translates to, "Don't say anything before the first cup of coffee!".
I'm on my second cup - had an unusually early first cup, guess that helped with the scripting :)