I have a generator containing 5400 Sqlalchemy rows of 1 table(Entity). I would like to group rows by the value of one column (let’s say column 1); for each “group” I want to calculate the mean of the rows based on another column(let’s say column 2).
Something like this:
Step 1: {col1_value_1: [row1.col_2_value, …..rown.col_2_value], col1_value_2:[rowj.col_2_value, rowm.col_2_value, …….rowk.col_2_value], ……col1_value_p:[rowq.col_2_value, …..rowt.col_2_value]}
Step2: {col1_value_1: arreay_mean, col1_value2:array_mean, …..}
I tried something like this:
def group_by_col_1_val(rows):
map_ = {}
for row in rows:
key = row.col_1
value = row.col_2
try:
map_[key] = map_[key].append(value)
except KeyError:
map_[key] = [value]
return map_
I am quite sure there is a better way of doing it.
I found this question Python group by a object key gives errros but I did not understand how to use itertools in my case
Enhance the code by importing defaultdict from the collections module for a more streamlined approach. Initialize map_ as a defaultdict with a list as the default value, eliminating the necessity for try-except blocks when adding to the dictionary. To maintain consistency, return the result as a regular dictionary.
from collections import defaultdict
def group_by_col_1_val(rows):
map_ = defaultdict(list)
for row in rows:
map_[row.col_1].append(row.col_2)
return dict(map_)
Hmm…, as you speak of SQLAlchemy, I assume that this data is contained inside a database table. I am far from a SQLAlchemy expert, but I know that grouping by a column and taking the average of each group is built into the SQL query language. With little effort, you should be able to get the expected result directly from the database…