I have 20 files with filenames like ERR260136.genefamilies.csv, ERR276187.genefamilies.csv, etc. Each file has to be multiplied by one constant. The corresponding constant has to be taken from a .csv file named read_count.csv. The read_count.csv file looks like this:
SampleID Read_counts
ERR260136 25636740
ERR260140 19166076
ERR260145 28011856
ERR260147 27916650
ERR260148 21871928
ERR260150 30130062
ERR260152 17949808
So, ERR260136.genefamilies.csv has to be multiplied by 25636740, ERR260140.genefamilies.csv has to be multiplied by 19166076 and so on...
The 20 files to be multiplied are in this format:
#Gene Family ERR260136_Abundance-RPKs
UNMAPPED 0.445035
UniRef90_A0A015P9C8 0.00080211
UniRef90_A0A015P9C8|g__Bacteroides.s__Bacteroides_fragilis 0.00080211
UniRef90_A5ZYU5 0.000787149
How can I do this using a bash command? Can anyone please help?
Hello, your thread looks very close to what might be asked in an assignment. Could you show us what you have tried so far, so we can use it as a starting point? Thanks!
You can read your first file with awk, then for every line (SampleID/Read_counts) pipe it into another awk command that opens the corresponding file (SampleID) and multiplies its value column by the constant (Read_counts).
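A minimal sketch of that approach, not a definitive answer. It uses a shell loop over read_count.csv and one awk call per sample. Several things here are assumptions, since the thread has not settled the file format: fields are separated by whitespace or tabs, the abundance is the last field on each line, lines starting with # are headers, and the function name scale_all and the output name <SampleID>.genefamilies.scaled.csv are made up for illustration.

```shell
#!/bin/sh
# Sketch: multiply the last column of each <SampleID>.genefamilies.csv by the
# matching Read_counts value from read_count.csv.
# ASSUMPTIONS (not confirmed in the thread): whitespace/tab-separated fields,
# abundance in the last field, "#" marks the header line, and the output file
# name <SampleID>.genefamilies.scaled.csv is hypothetical.
scale_all() {
    counts_file=$1
    # Skip the "SampleID Read_counts" header, then read one sample per line.
    tail -n +2 "$counts_file" | while read -r sample count; do
        infile="$sample.genefamilies.csv"
        if [ ! -f "$infile" ]; then
            echo "missing: $infile" >&2
            continue
        fi
        awk -v c="$count" -v OFS='\t' '
            /^#/ { print; next }   # pass the header line through unchanged
            # Explicit format: a plain print would use the default %.6g
            # and switch large products to exponent notation.
            NF   { $NF = sprintf("%.6f", $NF * c); print }
        ' "$infile" > "$sample.genefamilies.scaled.csv"
    done
}
```

Run it from the directory that holds the 20 files, for example `scale_all read_count.csv`. The arithmetic is done inside awk, which works in floating point; the shell only handles the file names.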
It is not an assignment. I need this operation while analyzing genomic data. Being new to programming, I can only look up the read count for a sample from the first file (read_count.csv) by this:
Can you please show me some way? I am stuck.
Is the indentation of your example of 1 of the 20 files correct, or is it supposed to be a fairly simple tabular format? Your extension implies csv, but it doesn't appear to be.
I am also going to strongly advise you not to do this with bash as you requested, as it does not support floating-point arithmetic without a lot of complicated fudging. See the answer to "assigning the values in matrix in bash" for an example of why.
Yes. It is supposed to be a .csv file only. Actually, I modified it so that it would look like a table here in the post. It should be like:
That is still not a csv, you need to be more clear about what these files exactly are.