4.0 years ago
psschlogl
Hi guys, I have a directory with a bunch of subdirectories containing lots of CSVs. Each CSV has two columns (kmer, counts). For each subdirectory's CSVs I want to keep the first column (which is shared across all the files) and merge the second columns (counts). Ex:
cut -d , -f 2 sorted_2.csv | paste -d , sorted_1.csv > combo_2.csv
k1,cnt1,cnt2,cnt3...
k2,cnt1,cnt2,cnt3...
k3,cnt1,cnt2,cnt3...
It works fine with the toy test files. I tried to make a script like this:
input="csv_list.txt"
while IFS= read -r line
do
paste -d, combo_files.csv <(cut -d, -f2 $line)
done < "$input"
But no luck yet, because it pastes only one column.
What can I improve in this script?
Thanks
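For reference, the usual fix for this pattern is to write each `paste` result to a scratch file and then move it over the combined file, so the next iteration sees the previously added columns. A minimal sketch on toy data (the names sorted_*.csv, csv_list.txt, and combo_files.csv follow the question; tmp.csv is an assumed scratch name):

```shell
#!/usr/bin/env bash
set -eu

# Toy data mimicking the question's layout: two-column (kmer,count) CSVs.
printf 'k1,1\nk2,2\nk3,3\n' > sorted_1.csv
printf 'k1,4\nk2,5\nk3,6\n' > sorted_2.csv
printf 'sorted_1.csv\nsorted_2.csv\n' > csv_list.txt

input="csv_list.txt"
first=1
while IFS= read -r line; do
    if [ "$first" -eq 1 ]; then
        # Seed the combined file with both columns of the first CSV.
        cp "$line" combo_files.csv
        first=0
    else
        # Append this file's counts column, then replace the combined file,
        # so the next iteration pastes against the grown file.
        paste -d, combo_files.csv <(cut -d, -f2 "$line") > tmp.csv
        mv tmp.csv combo_files.csv
    fi
done < "$input"

cat combo_files.csv   # k1,1,4 / k2,2,5 / k3,3,6
```

The original loop pasted against a combo_files.csv that never changed, which is why only one extra column ever appeared.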
I'd recommend using R. You can use list.files to generate a list of the CSV files, then lapply the function read.table along this list to get a list of data frames, and at the end use Reduce(merge, list_of_data_frames) to get a single data frame.

I was trying to avoid using a lot of memory by loading all that data on my PC. I can try Python, but I want to do these steps in the shell. I appreciate your time. Thank you very much. paulo
In that case, you can split the list into 3-4 chunks, but I don't think you'll use a lot of memory. If you're still particular about using bash, try join instead of paste.

I will try it. Thank you
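A sketch of that join-based variant, assuming the CSVs are already sorted on the kmer column (the sorted_*.csv names suggest they are); combo.csv and tmp.csv are assumed working-file names:

```shell
#!/usr/bin/env bash
set -eu

# Toy data: kmers sorted lexically, as join requires.
printf 'k1,1\nk2,2\nk3,3\n' > sorted_1.csv
printf 'k1,4\nk2,5\nk3,6\n' > sorted_2.csv
printf 'k1,7\nk2,8\nk3,9\n' > sorted_3.csv

cp sorted_1.csv combo.csv
for f in sorted_2.csv sorted_3.csv; do
    # -t, sets comma as the separator; join matches rows on field 1
    # (the kmer) and concatenates the remaining fields.
    join -t, combo.csv "$f" > tmp.csv
    mv tmp.csv combo.csv
done

cat combo.csv   # k1,1,4,7 / k2,2,5,8 / k3,3,6,9
```

Unlike paste, join matches rows by key instead of by position, so it stays correct even if some files are missing a kmer (those rows are simply dropped unless you pass -a).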