8.8 years ago
umn_bist
I have a set of 10 TCGA files: 5 are from unique tumor tissues and the other 5 are from the matching normal tissues, i.e. {tumor_A, normal_A, tumor_B, normal_B, ..., tumor_E, normal_E}.
Each file sits in a directory named with a random ID. I have kept an Excel spreadsheet tracking which file is a tumor, which is a normal, and which ones belong together:
column A (barcode) | column B (filetype) | column C (file ID, directory name)
14124421 | TP | 1j1iulhkassdalkshdka
12564122 | NT | 900110d109jd109jdasd
64343343 | TP | 01920912i409asdaojoj
85546455 | NT | 901i901i2901i2049i12
46346464 | TP | 0910912091409109klka
46435435 | NT | 091i0dkajakakajkjh2a
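For reference, a minimal Python sketch of reading this sheet, assuming it is exported from Excel as a comma-separated CSV with no header row and the three columns in the order shown. The function name `load_rows` is hypothetical; blank lines are skipped and stray spaces around fields are stripped.

```python
import csv
import io
from typing import Iterable, List, Tuple

# One row of the tracking sheet: (barcode, filetype, file ID / directory name)
Row = Tuple[str, str, str]


def load_rows(lines: Iterable[str], delimiter: str = ",") -> List[Row]:
    """Parse the tracking sheet, assuming no header row and three columns."""
    rows: List[Row] = []
    for record in csv.reader(lines, delimiter=delimiter):
        if not any(field.strip() for field in record):
            continue  # skip blank lines
        # Raises ValueError if a row has fewer than three columns
        barcode, filetype, file_id = (field.strip() for field in record[:3])
        rows.append((barcode, filetype, file_id))
    return rows


if __name__ == "__main__":
    sample = io.StringIO(
        "14124421,TP,1j1iulhkassdalkshdka\n"
        "12564122,NT,900110d109jd109jdasd\n"
    )
    for row in load_rows(sample):
        print(row)
```

With a real file, open it with newline="" as the csv module documentation recommends, e.g. `with open("samples.csv", newline="") as fh: rows = load_rows(fh)`, where samples.csv stands in for your exported sheet. If your export is tab-separated instead, pass delimiter="\t".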
So far, I have found that awk, sed, and grep can work with a CSV file, but how can I build an if/then statement that does the following?
- If the directory name exists in the 3rd column of the CSV file, copy the string in the 2nd column of the same row into a variable.
- If this string is TP, store the file within that directory as $TUMOR. Then take the directory name in the row right below it (the ID of the matching normal), find that directory, and store the file inside it as $NORMAL.
- Going back to the 2nd bullet: if the string is not TP, do nothing and move along.
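The pairing logic in these bullets can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a tested pipeline: it walks the rows of the sheet rather than the directories, and the optional `existing` set (for example, the directory names found under ~/foo/bar) reproduces the "directory name exists in column 3" check. The NT check on the row below is an extra safeguard that is not part of the rule above, and `pair_tumor_normal` is a hypothetical name.

```python
from typing import AbstractSet, List, Optional, Sequence, Tuple

Row = Tuple[str, str, str]  # (barcode, filetype, file ID / directory name)


def pair_tumor_normal(rows: Sequence[Row],
                      existing: Optional[AbstractSet[str]] = None) -> List[Tuple[str, str]]:
    """Return (tumor_id, normal_id) pairs: a TP row's normal is the row directly below it."""
    pairs: List[Tuple[str, str]] = []
    for i, (_barcode, filetype, file_id) in enumerate(rows):
        if filetype != "TP":
            continue  # not a tumor: do nothing and move along
        if existing is not None and file_id not in existing:
            continue  # tumor directory not present on disk: skip it
        # Extra safety check (an assumption, not in the original rule):
        # the row right below must exist and be marked NT.
        if i + 1 >= len(rows) or rows[i + 1][1] != "NT":
            raise ValueError(f"no NT row directly below tumor {file_id}")
        pairs.append((file_id, rows[i + 1][2]))
    return pairs


if __name__ == "__main__":
    sheet = [
        ("14124421", "TP", "1j1iulhkassdalkshdka"),
        ("12564122", "NT", "900110d109jd109jdasd"),
        ("64343343", "TP", "01920912i409asdaojoj"),
        ("85546455", "NT", "901i901i2901i2049i12"),
    ]
    for tumor, normal in pair_tumor_normal(sheet):
        print(tumor, normal)
```

To build `existing` from disk, something like `{p.name for p in Path("~/foo/bar").expanduser().iterdir() if p.is_dir()}` (with `from pathlib import Path`) should work.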
Example
if "1j1iulhkassdalkshdka" exists in column 3 of CSV file
store string of column 2 of the same row
if stored string is TP
store file in ~/foo/bar/1j1iulhkassdalkshdka/ as $TUMOR
store the string in the row right below 1j1iulhkassdalkshdka (which is 900110d109jd109jdasd)
assign the file inside ~/foo/bar/${string} to $NORMAL
run my tool using $TUMOR and $NORMAL
erase $TUMOR and $NORMAL links
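The last steps of the example (resolve $TUMOR and $NORMAL, run the tool, move on) might look like this in Python. This sketch assumes each ID directory contains exactly one file, as the example implies; `single_file` and `run_pair` are hypothetical helper names, and the tool command and its argument order (TUMOR, then NORMAL) are placeholders for your actual program. Because Python variables are simply rebound on each loop iteration, there is no separate "erase the links" step.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def single_file(directory: Path) -> Path:
    """Return the only regular file in `directory` (assumed: exactly one)."""
    files = [p for p in directory.iterdir() if p.is_file()]
    if len(files) != 1:
        raise ValueError(f"expected one file in {directory}, found {len(files)}")
    return files[0]


def run_pair(base: Path, tumor_id: str, normal_id: str,
             tool_cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Resolve TUMOR/NORMAL under `base` and run the (hypothetical) tool.

    TUMOR and NORMAL are appended to `tool_cmd` as its last two arguments.
    """
    tumor = single_file(base / tumor_id)    # plays the role of $TUMOR
    normal = single_file(base / normal_id)  # plays the role of $NORMAL
    # check=True raises CalledProcessError if the tool exits non-zero;
    # stdout/stderr are captured as text so they can be logged or inspected.
    return subprocess.run([*tool_cmd, str(tumor), str(normal)],
                          check=True, capture_output=True, text=True)
```

A loop such as `for tumor_id, normal_id in pairs: run_pair(Path("~/foo/bar").expanduser(), tumor_id, normal_id, ["my_tool"])` would then process every pair, where my_tool is a placeholder. The whole script can still be launched from a SLURM batch script with a line like `python3 pair_and_run.py` (script name hypothetical).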
While you could do this in bash, it would probably be a bit simpler in Python or Perl.
The reason I would like to use bash is that our HPC cluster uses the SLURM scheduler, and our pipelines are run via bash scripts.
I am new to Python, so I am wondering how similar bash and Python scripting are. My one worry about recoding is that the commands in a bash script can also be used directly in the terminal (i.e., it's easy).
Would Python/Perl let me script with similarly easy commands, or is it more like C++? (I am comfortable with C++; I'd just prefer not to use such a heavy-duty solution if it isn't required.)
Python is a more robust language to work with and is generally preferable for scripts longer than about 10 lines. It's similar to bash in that you don't need to compile it beforehand, but it's also a bit closer to C++ in that it supports things like objects and libraries and has a saner structure.
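To give a feel for how close everyday Python is to shell commands, here is a rough sketch of three common commands written with the standard library (`pathlib`, `shutil`). The helper names mirror the shell commands but are hypothetical, and they only cover the simple cases noted in the comments. Lines like these can also be typed one at a time at the interactive `python3` prompt, much as you would in a terminal.

```python
import shutil
from pathlib import Path
from typing import List


def grep(pattern: str, path: Path) -> List[str]:
    """Rough equivalent of `grep -F pattern file`: lines containing a fixed string."""
    with path.open() as fh:
        return [line.rstrip("\n") for line in fh if pattern in line]


def ls(directory: Path) -> List[str]:
    """Rough equivalent of `ls directory`, sorted; unlike plain ls, hidden files are included."""
    return sorted(p.name for p in directory.iterdir())


def cp(src: Path, dst: Path) -> Path:
    """Rough equivalent of `cp src dst`; returns the path of the new copy."""
    return Path(shutil.copy(src, dst))
```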
Anyway, you might also want to look at Snakemake. I use it to run jobs with SLURM on our cluster, and it makes pipelining relatively painless.