count matrix of genes
4.4 years ago

I have a set of files, each containing logFC values, gene names, etc. I want to build a matrix of genes: match the gene column of one file against the gene column of the other files, and print each logFC value under the name of the sample file it came from. Like:

Genename    Sample name 1    Sample name 2
gene        logFC value      logFC value

How can I write this in bash using awk? Thanks in advance.

microarray • 1.0k views

You're describing a logFC matrix, not a count matrix. I think the easiest way would be Python/R (or another favorite language). Although awk is a valid programming language, it's usually used for short manipulations.


Yeah, but I am new to Python, so I am more comfortable in bash. Can you please give some tips for creating the matrix in bash?


There is no tip that will allow you to convert fold changes to counts.


There's a reason pandas was born. If it were intuitive to represent matrices in bash (or even plain Python), there would have been no need for it.


I don't want to convert fold changes to counts. I need to create a matrix with gene names along one axis (horizontal) and sample names along the other (vertical), where each logFC value is assigned to its gene name and sample name. So I need to write a script in bash.


Thank you, but I have already used these commands. They can't be used for many files, like 25 or 30; something more efficient is needed. I have tried this too, but it was not useful for multiple files.

4.4 years ago
Shalu Jhanwar ▴ 540

You can generate a logFC matrix from different files using the "paste" and "cut" commands. E.g., if the files are in two-column tab-delimited format like below:

File1

g1 0.4

g2 0.6

g3 0.9

File2

g1 2.4

g2 3.0

g3 5.0

The following command generates File3:

paste File1 File2 | cut -f1,2,4 > File3

cat File3

g1 0.4 2.4

g2 0.6 3.0

g3 0.9 5.0

After generating the file, you can insert the header (sample names) using 'sed' (GNU sed syntax):

sed -i 1i"geneName\tFile1\tFile2" File3
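The paste/cut steps above can be sketched for an arbitrary number of files. This is only a minimal sketch under stated assumptions: the file names (s1.txt, s2.txt, s3.txt), their contents, and the output name matrix.tsv are hypothetical, and every file must list the same genes in the same order, because paste joins lines purely by position.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical inputs: same genes, same order, two tab-separated columns each.
workdir=$(mktemp -d)
cd "$workdir"
printf 'g1\t0.4\ng2\t0.6\ng3\t0.9\n' > s1.txt
printf 'g1\t2.4\ng2\t3.0\ng3\t5.0\n' > s2.txt
printf 'g1\t1.1\ng2\t1.2\ng3\t1.3\n' > s3.txt

files=(s1.txt s2.txt s3.txt)   # could also be a glob such as (*.txt)

{
    # Header row: "geneName" followed by one column per file name.
    printf 'geneName'
    printf '\t%s' "${files[@]}"
    printf '\n'
    # paste puts the files side by side (gene, logFC, gene, logFC, ...);
    # keep column 1 and every even-numbered column.
    paste "${files[@]}" |
        awk -F'\t' -v OFS='\t' '{
            line = $1
            for (i = 2; i <= NF; i += 2) line = line OFS $i
            print line
        }'
} > matrix.tsv

cat matrix.tsv
```

The awk loop replaces the hand-written `cut -f1,2,4` field list, so nothing has to be edited when going from 2 files to 25 or 30.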

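paste accepts any number of input files, so you can perform these operations on multiple files, provided every file lists the same genes in the same order.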
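If the files do not all contain the same genes in the same order, a positional paste will misalign rows. One possible alternative, sketched here rather than taken from the answer, is to let awk key every value on the gene name. The file names (sampleA.txt and so on), their contents, the output name logfc_matrix.tsv, and the choice of "NA" for missing values are all assumptions for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical inputs: two tab-separated columns (gene, logFC) per file.
workdir=$(mktemp -d)
cd "$workdir"
printf 'g1\t0.4\ng2\t0.6\ng3\t0.9\n' > sampleA.txt
printf 'g3\t5.0\ng1\t2.4\n'          > sampleB.txt   # different order, g2 missing
printf 'g2\t-1.2\ng4\t0.3\n'         > sampleC.txt   # g4 appears only here

awk -F'\t' -v OFS='\t' '
    # First line of each input file: register a new sample column.
    FNR == 1 { nfiles++; name[nfiles] = FILENAME }
    {
        # Remember genes in the order they are first seen.
        if (!($1 in seen)) { seen[$1] = 1; genes[++ngenes] = $1 }
        logfc[$1, nfiles] = $2
    }
    END {
        # Header: geneName plus one column per file.
        line = "geneName"
        for (j = 1; j <= nfiles; j++) line = line OFS name[j]
        print line
        # One row per gene; "NA" where a file has no value for that gene.
        for (i = 1; i <= ngenes; i++) {
            g = genes[i]
            line = g
            for (j = 1; j <= nfiles; j++)
                line = line OFS (((g, j) in logfc) ? logfc[g, j] : "NA")
            print line
        }
    }
' sampleA.txt sampleB.txt sampleC.txt > logfc_matrix.tsv

cat logfc_matrix.tsv
```

With these sample inputs the result has a header row followed by rows for g1, g2, g3 and g4, for example `g2  0.6  NA  -1.2` (tab-separated). Genes are rows and samples are columns, matching the File3 layout; replacing the explicit file list with a glob such as `*.txt` would cover 25 or 30 files. A gene repeated within one file keeps only its last value in this sketch.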
