How can I extract the most highly expressed genes from a series matrix?

Asked 4.4 years ago by microorganism_001 ▴ 30

I have a gene count series matrix. I used a standard deviation calculation to find which genes are expressed the most, but I cannot extract only those genes from the thousands of other genes into a separate CSV file.

For reference, each gene has 7 samples. I want to extract all highly expressed genes along with their expression values in the different samples.

The dataset looks like this:

Geneid   s1    s2   s3   s4   Standard deviation
TEA001   100   45   86   46   50
TEA000   100   45   86   44   49
TEA001   100   47   86   48   49.1

Please help, I'm a beginner.

Tags: WGCNA, R, RNA-seq

Is the question technical (how would you go about extracting those genes?) or biological (which genes are the highly expressed ones?)?

For the technical part, have a look at the Linux utility awk (plenty of information is available online).

Reply by lieven.sterck 16k, 4.4 years ago

I have studied the awk command, but sorry, I can't do this with awk. Could you see the standard deviation column? I want to filter the series matrix based on this row. How could that be done?

Reply by microorganism_001 ▴ 30, 4.4 years ago

I can't do this with awk

Unless there's some complex computation involved, you most certainly can.

Could you see the standard deviation column? I want to filter the series matrix based on this row

Do you wish to get a subset of rows (based on a column) or a subset of columns (based on a row)?
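To make the distinction concrete: a subset of rows based on a column is a single filter on one field, whereas a subset of columns based on a row needs two passes over the file (one to decide which columns to keep, one to print them). A minimal sketch of the column case, assuming a tab-delimited file laid out like the example in the question; the file name matrix.tsv, the chosen gene TEA000 and the cutoff of 50 are purely hypothetical.

```shell
# Demo input: the example table from the question, assumed tab-delimited.
# matrix.tsv is a hypothetical file name.
printf 'Geneid\ts1\ts2\ts3\ts4\tStandard deviation\n' > matrix.tsv
printf 'TEA001\t100\t45\t86\t46\t50\n' >> matrix.tsv
printf 'TEA000\t100\t45\t86\t44\t49\n' >> matrix.tsv
printf 'TEA001\t100\t47\t86\t48\t49.1\n' >> matrix.tsv

# Subset of COLUMNS based on a row. The file is read twice:
#  pass 1 (NR == FNR): in the row of the hypothetical gene TEA000, remember
#                      every sample column (2 .. NF-1, skipping the SD column)
#                      whose value is above the hypothetical cutoff 50;
#  pass 2:             print the gene ID plus only the remembered columns.
awk -F'\t' -v OFS='\t' -v gene=TEA000 -v min=50 '
NR == FNR {
    if ($1 == gene) for (i = 2; i < NF; i++) if ($i > min) keep[i] = 1
    next
}
{
    line = $1
    for (i = 2; i < NF; i++) if (i in keep) line = line OFS $i
    print line
}' matrix.tsv matrix.tsv > cols_subset.tsv

cat cols_subset.tsv
```

Here TEA000 is above 50 only in s1 and s3, so only those two sample columns survive, for every gene.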

Reply by Ram 45k, 4.4 years ago

Let's say you want to get all genes, across all samples, that have an SD value greater than 49 (assuming your file is tab-delimited):

cat yourmatrixfile | awk '{if($6>49) print}' > SD.greater49.txt

$6 represents the 6th column.
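Since the goal was a separate CSV file, a slightly extended version of the same filter may help. This is a sketch assuming tab-delimited input with a single header line: it keeps the header, takes the SD column and the cutoff as variables (col and min are illustrative names), and writes comma-separated output. With a gene ID column plus 7 sample columns, the SD column would be column 9 rather than 6.

```shell
# Demo input: the example table from the question, assumed tab-delimited.
printf 'Geneid\ts1\ts2\ts3\ts4\tStandard deviation\n' > yourmatrixfile
printf 'TEA001\t100\t45\t86\t46\t50\n' >> yourmatrixfile
printf 'TEA000\t100\t45\t86\t44\t49\n' >> yourmatrixfile
printf 'TEA001\t100\t47\t86\t48\t49.1\n' >> yourmatrixfile

# -F'\t'  : split on tabs only, so "Standard deviation" stays one field
# col     : column holding the SD (6 here; 9 with 7 sample columns)
# min     : rows with SD strictly greater than this are kept
# OFS=',' : output separator; $1 = $1 rebuilds each line with commas
# NR == 1 : always keep the header line
awk -F'\t' -v OFS=',' -v col=6 -v min=49 '
NR == 1 || $col > min { $1 = $1; print }
' yourmatrixfile > SD.greater49.csv

cat SD.greater49.csv
```

On the example table this keeps the header and the two rows with SD 50 and 49.1; the row with SD 49 is dropped because 49 is not strictly greater than 49.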

Reply by Mehmet ▴ 820, 4.4 years ago

See: https://en.wikipedia.org/wiki/Cat_(Unix)#Useless_use_of_cat

awk '{if($6>49) print}' yourmatrixfile > SD.greater49.txt

will suffice

Reply by Ram 45k, 4.4 years ago

awk '$6>49' yourmatrixfile
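A pattern with no action prints every matching line, which is why this short form works. If the number of sample columns changes (the question mentions 7 samples while the example shows 4), a hard-coded $6 can silently filter on the wrong column. A sketch that instead looks the column up by its header name, assuming a tab-delimited file whose header contains a field named exactly "Standard deviation" (the output name SD.filtered.tsv is hypothetical):

```shell
# Demo input: the example table from the question, assumed tab-delimited.
printf 'Geneid\ts1\ts2\ts3\ts4\tStandard deviation\n' > yourmatrixfile
printf 'TEA001\t100\t45\t86\t46\t50\n' >> yourmatrixfile
printf 'TEA000\t100\t45\t86\t44\t49\n' >> yourmatrixfile
printf 'TEA001\t100\t47\t86\t48\t49.1\n' >> yourmatrixfile

# Line 1: find the index of the column whose header equals `name`,
#         stop with an error if it is missing, and print the header.
# Other lines: pattern without action, printed when that column > min.
awk -F'\t' -v name='Standard deviation' -v min=49 '
NR == 1 {
    for (i = 1; i <= NF; i++) if ($i == name) col = i
    if (!col) { print "column not found: " name > "/dev/stderr"; exit 1 }
    print
    next
}
$col > min
' yourmatrixfile > SD.filtered.tsv

cat SD.filtered.tsv
```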

Reply by cpad0112 21k, 4.4 years ago

Thank you, amazing people, for helping me. My problem is now solved with LibreOffice Calc.

Reply by microorganism_001 ▴ 30, 4.4 years ago

That's a bad idea. You should be using tools with which you can replicate your analysis. Replicating work done in a GUI tool is neither easy nor straightforward, and automating it is nearly impossible.
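As one illustration of a rerunnable alternative, the SD can be computed directly from the count columns and the selected genes written to CSV in a single awk step, so the whole analysis is one command that can be saved and rerun. This is a minimal sketch under stated assumptions: tab-delimited input with a header line, gene ID in column 1 and only count columns after it (no precomputed SD column), the sample standard deviation (n - 1 denominator, the same convention as R's sd()), and a hypothetical cutoff of 27 chosen only for the demo rows. Note that SD measures how much a gene varies across samples, not how highly it is expressed. The demo uses only the four sample columns shown in the question, so its SD values differ from that table's SD column.

```shell
# Hypothetical file names: counts.tsv (input), high_sd_genes.csv (output).
# Demo input: count columns from the question's example, tab-delimited,
# without the precomputed SD column.
printf 'Geneid\ts1\ts2\ts3\ts4\n' > counts.tsv
printf 'TEA001\t100\t45\t86\t46\n' >> counts.tsv
printf 'TEA000\t100\t45\t86\t44\n' >> counts.tsv
printf 'TEA001\t100\t47\t86\t48\n' >> counts.tsv

# For each gene: mean of columns 2..NF, then sample SD with n - 1 in the
# denominator; keep the gene if SD > min_sd (hypothetical cutoff of 27).
# Output is CSV with an extra SD column rounded to 2 decimals.
awk -F'\t' -v OFS=',' -v min_sd=27 '
NR == 1 { $1 = $1; print $0, "SD"; next }
{
    n = NF - 1
    sum = 0
    for (i = 2; i <= NF; i++) sum += $i
    mean = sum / n
    ss = 0
    for (i = 2; i <= NF; i++) ss += ($i - mean) ^ 2
    sd = sqrt(ss / (n - 1))
    if (sd > min_sd) { $1 = $1; print $0, sprintf("%.2f", sd) }
}' counts.tsv > high_sd_genes.csv

cat high_sd_genes.csv
```

With these demo rows, the first two genes (SD of about 28.02 and 28.58) pass the cutoff and the third (about 26.89) does not.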