3.6 years ago
microorganism_001
I have a gene count series matrix. I calculated the standard deviation for each gene to find which genes are expressed most, but I cannot extract only those genes, out of thousands of others, into a separate CSV file.
For reference, each gene has 7 samples. I want to extract all highly expressed genes along with their expression values for the different samples.
The dataset looks like this:
Geneid s1 s2 s3 s4 Standard deviation
TEA001 100 45 86 46 50
TEA000 100 45 86 44 49
TEA001 100 47 86 48 49.1
Please help, I'm a beginner.
Is the question technical (i.e. how would you go about extracting those genes) or biological (i.e. which genes are the highly expressed ones)?
For the technical part, have a look at the Linux utility awk (plenty of information is available online).
I have studied the AWK command, but sorry, I can't do this with AWK. Could you look at the standard deviation column? I want to filter the series matrix based on it. How could that be done?
Unless there's some complex computation involved, you most certainly can.
Do you wish to get a subset of rows (based on a column) or a subset of columns (based on a row)?
Let's say you want to get all genes, with all their samples, that have an SD value greater than 49 (assuming your file is tab-delimited).
$6 represents the 6th column.
See: https://en.wikipedia.org/wiki/Cat_(Unix)#Useless_use_of_cat
will suffice
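A minimal sketch of the kind of filter described above, and not necessarily the exact command posted in this thread. It assumes a tab-delimited file with the standard deviation in the 6th column, as in the example rows from the question. The file names counts.tsv and high_sd.tsv are hypothetical. The header is kept explicitly with NR == 1 rather than relying on how awk compares a text field with a number.

```shell
# Build a small tab-delimited sample from the rows in the question
# (counts.tsv and high_sd.tsv are hypothetical file names).
printf 'Geneid\ts1\ts2\ts3\ts4\tStandard deviation\n' > counts.tsv
printf 'TEA001\t100\t45\t86\t46\t50\n' >> counts.tsv
printf 'TEA000\t100\t45\t86\t44\t49\n' >> counts.tsv
printf 'TEA001\t100\t47\t86\t48\t49.1\n' >> counts.tsv

# Keep the header line (NR == 1) plus every row whose 6th column
# (the standard deviation) is greater than 49. awk reads the file
# directly, so no "cat counts.tsv |" is needed in front of it.
awk -F'\t' 'NR == 1 || $6 > 49' counts.tsv > high_sd.tsv
```

On the sample rows this should keep the header and the two rows with SD 50 and 49.1, and drop the row with SD 49. For a comma-separated file the same idea would presumably work with -F',' instead, provided no field contains a quoted comma.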
Thank you, amazing people, for helping me. My problem is now solved with LibreOffice Calc.
That's a bad idea. You should be using tools with which you can replicate your analysis. Replication using GUI tools is not easy/straightforward, and automation is near impossible.
Yes, it is working fine. I will follow your suggestion.