I have a set of files containing logFC values, gene names, etc. I want to create a count matrix of genes: compare the gene column in one file with the gene column in another file, and print the logFC values under the corresponding sample filename, like this:
            Sample name 1   Sample name 2
Gene name   logFC value     logFC value
How can I write this in bash using awk? Thanks in advance.
You're describing a logFC matrix, not a count matrix. I think the easiest way would be Python/R (or another favorite language). Although awk is a valid programming language, it's usually used for short manipulations.
I am not trying to convert fold changes to counts. I need to create a matrix with gene names horizontally and sample names vertically, where each logFC value is assigned to its gene name and sample name. So I need to write a script in bash.
Thank you, but I have already tried these commands. They can't be used for many files, such as 25 or 30; something more efficient is needed. I have tried that as well, but it is not useful for multiple files.
Yeah, but I am new to Python, so I am more comfortable in bash. Can you please give some tips for creating the matrix in bash?
There is no tip that will allow you to convert fold changes to counts.
There's a reason pandas was born. If it were intuitive to represent matrices in bash (or even plain Python), there would have been no need for it.
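For what it's worth, a logFC matrix over many files can be sketched with a single awk call driven from bash. This is only a minimal sketch under assumptions that are not stated anywhere in the thread: each file is tab-separated, has one header line, and holds the gene name in column 1 and the logFC in column 2. The function name `build_logfc_matrix`, the `GENE_COL` and `FC_COL` variables, and the `NA` filler for genes missing from a file are hypothetical choices made for illustration. It prints genes as rows and samples as columns, as in the layout shown in the question.

```bash
#!/usr/bin/env bash
# Hypothetical helper: merge per-sample logFC files into one gene x sample matrix.
# Assumptions (not from the thread): tab-separated input, one header line per file,
# gene name in column 1, logFC in column 2. Override with GENE_COL / FC_COL.
build_logfc_matrix() {
  awk -v gene_col="${GENE_COL:-1}" -v fc_col="${FC_COL:-2}" '
    BEGIN { FS = OFS = "\t" }
    FNR == 1 {                        # first line of each file is its header: skip it
      nfiles++
      sample[nfiles] = FILENAME       # the file name becomes the column label
      next
    }
    {
      gene = $gene_col
      if (!(gene in seen)) {          # remember genes in order of first appearance
        seen[gene] = 1
        order[++ngenes] = gene
      }
      value[gene, nfiles] = $fc_col   # logFC of this gene in the current file
    }
    END {
      line = "Gene"
      for (f = 1; f <= nfiles; f++) line = line OFS sample[f]
      print line
      for (g = 1; g <= ngenes; g++) {
        gene = order[g]
        line = gene
        for (f = 1; f <= nfiles; f++)
          line = line OFS (((gene, f) in value) ? value[gene, f] : "NA")
        print line
      }
    }
  ' "$@"
}
```

It could be called as `build_logfc_matrix sample*.tsv > logfc_matrix.tsv`, with the file names standing in for the sample names. The number of files does not matter, since awk reads them one after another and keeps everything in memory until the end; for very large inputs or messier formats, Python/R as suggested above is probably still the more comfortable route.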