Trouble writing an if-then statement: if a string exists in column 3 of a .csv file
8.8 years ago
umn_bist ▴ 390

I have a set of 10 TCGA files: 5 are unique tumor tissues and the other 5 are the matching normal tissues, i.e. {tumor_A, normal_A, tumor_B, normal_B, ..., tumor_E, normal_E}.

Each file has a random ID, which is also the name of its directory. I keep an Excel spreadsheet that tracks which files are tumor, which are normal, and which belong together:

column A (barcode) | column B (filetype) | column C (file ID, directory name)

    14124421                  TP             1j1iulhkassdalkshdka
    12564122                  NT             900110d109jd109jdasd

    64343343                  TP             01920912i409asdaojoj
    85546455                  NT             901i901i2901i2049i12

    46346464                  TP             0910912091409109klka
    46435435                  NT             091i0dkajakakajkjh2a

So far I have found that awk, sed, and grep can work with a CSV file. How can I build an if-then statement that does the following?

  • If the directory name exists in the 3rd column of the CSV file, copy the string in the 2nd column of the same row into a variable.
  • If this string is TP, store the file inside that directory as $TUMOR. Then take the string directly below the directory name (the ID of the matching normal), find that directory, and store the file inside it as $NORMAL.
  • If the string in the 2nd column is not TP, do nothing and move on.

Example

if "1j1iulhkassdalkshdka" exists in column 3 of CSV file
     store string of column 2 of the same row
     if stored string is TP
          store file in ~/foo/bar/1j1iulhkassdalkshdka/ as $TUMOR
          store string right below 1j1iulhkassdalkshdka (which is 900110d109jd109jdasd)
          in ~/foo/bar/${string} assign file inside to $NORMAL
          run my tool using $TUMOR and $NORMAL
          erase $TUMOR and $NORMAL links
TCGA bash • 1.7k views

While you could do this in bash, it'd probably be a bit simpler in python or perl.
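For reference, here is a minimal bash sketch of the logic described in the question. It rests on several assumptions that are not stated in the thread: the spreadsheet has been exported as a comma-separated file with the columns in the order barcode, filetype, file ID; each NT row sits directly below its matching TP row; and each directory holds exactly one file of interest. The function names `first_file` and `find_pairs` are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes a comma-separated export with columns
# barcode,filetype,file_id and each NT row directly below its TP row.

# first_file DIR: print the first regular file in DIR, fail if there is none.
first_file() {
    local f
    for f in "$1"/*; do
        [ -f "$f" ] && { printf '%s\n' "$f"; return 0; }
    done
    return 1
}

# find_pairs CSV BASE_DIR: print "tumor_file<TAB>normal_file", one pair per
# line, for every TP/NT pair whose two directories exist under BASE_DIR.
find_pairs() {
    local csv=$1 base=$2
    local barcode type id tumor_id="" tumor_dir tumor normal cr
    cr=$(printf '\r')

    while IFS=, read -r barcode type id || [ -n "$barcode" ]; do
        id=${id%"$cr"}                 # drop a trailing CR from Excel exports
        [ -z "$id" ] && continue       # skip blank lines
        if [ "$type" = "TP" ]; then
            tumor_id=$id               # remember the tumor, wait for the next row
        elif [ "$type" = "NT" ] && [ -n "$tumor_id" ]; then
            tumor_dir=$base/$tumor_id
            tumor_id=""
            [ -d "$tumor_dir" ] && [ -d "$base/$id" ] || continue
            tumor=$(first_file "$tumor_dir") || continue
            normal=$(first_file "$base/$id") || continue
            printf '%s\t%s\n' "$tumor" "$normal"
        else
            tumor_id=""                # header or unexpected row: reset
        fi
    done < "$csv"
}

# Possible usage (samples.csv and my_tool are placeholders):
#   find_pairs samples.csv ~/foo/bar | while IFS="$(printf '\t')" read -r TUMOR NORMAL; do
#       my_tool "$TUMOR" "$NORMAL"
#   done
```

Printing the pairs rather than running the tool inside the function keeps the CSV parsing separate from the tool invocation, so the pairing can be checked by eye before anything is submitted to the cluster. Quoted fields or commas inside fields are not handled here.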


The reason I would like to use bash is that our HPC cluster uses the SLURM scheduler, and pipelines are run on it via bash scripts.

I am new to Python, so I am wondering how similar bash and Python scripting are. What I like about bash is that the code in a script can be used directly in the terminal (i.e. it's easy), and I am afraid I would have to recode that.

Would Python/Perl allow me to script using similarly easy commands? Or is it more like C++? (I am comfortable with C++; I'd just prefer not to use such a heavy-duty solution if it isn't required.)


Python is a more robust language to work with and is generally preferable for scripts longer than about 10 lines. It's similar to bash in that you don't need to compile it beforehand, but it's also a bit closer to C++ in that it supports things like objects and libraries and has a saner structure.

Anyway, you might also want to look at Snakemake. I use it to run jobs with SLURM on our cluster, and it makes pipelining relatively painless.
