edited on 10.04.2022
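Looping through a DataFrame row by row is slow and not something you want to do. If you really have to, here are four ways to do it; iterrows() is typically the slowest of them.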
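import pandas as pd

t = pd.DataFrame({'a': range(0, 100), 'b': range(0, 100)})

# 1. iterrows(): yields (index, row-as-Series) pairs
C = []
for i, r in t.iterrows():
    C.append((r['a'], r['b']))

# 2. itertuples(): yields namedtuples; position 0 is the index
C = []
for ir in t.itertuples():
    C.append((ir[1], ir[2]))

# 3. zip() over the columns
C = []
for r in zip(t['a'], t['b']):
    C.append((r[0], r[1]))

# 4. zip() over the column lists produced by to_dict("list")
C = []
for r in zip(*t.to_dict("list").values()):
    C.append((r[0], r[1]))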
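To see how these loops compare on your own data, a small timing harness helps. The sketch below uses the standard-library timeit module; the helper names (with_iterrows and friends) are made up for this example, and the timings will vary with the machine and the pandas version.

```python
import timeit

import pandas as pd

t = pd.DataFrame({'a': range(0, 100), 'b': range(0, 100)})


def with_iterrows():
    return [(r['a'], r['b']) for _, r in t.iterrows()]


def with_itertuples():
    # index=False drops the index, name=None yields plain tuples
    return list(t.itertuples(index=False, name=None))


def with_zip():
    return list(zip(t['a'], t['b']))


def with_to_dict():
    return list(zip(*t.to_dict("list").values()))


candidates = [with_iterrows, with_itertuples, with_zip, with_to_dict]

# All four approaches should build the same list of (a, b) pairs.
expected = with_zip()
all_equal = all(f() == expected for f in candidates)

# Time each approach; number=20 keeps the run short.
timings = {f.__name__: timeit.timeit(f, number=20) for f in candidates}
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds:.4f}s")
```

The output is sorted from fastest to slowest, so the ranking is visible at a glance.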
A list comprehension over zip() is an alternative to the .apply() method, yet I haven’t seen a significant benefit.
result = [query_distance(conn, route, first_visit, last_visit)
          for route, first_visit, last_visit
          in zip(rs.RouteID, rs.FirstVisitTime, rs.LastVisitTime)]
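To try the two styles side by side without a database connection, here is a minimal sketch. The visit_minutes function and the sample data are stand-ins invented for this example (this is not the original query_distance); only the column names come from the snippet above.

```python
import pandas as pd

# Hypothetical sample data; column names follow the snippet above.
rs = pd.DataFrame({
    "RouteID": [1, 2, 3],
    "FirstVisitTime": pd.to_datetime(
        ["2022-04-10 08:00", "2022-04-10 09:00", "2022-04-10 10:00"]),
    "LastVisitTime": pd.to_datetime(
        ["2022-04-10 08:45", "2022-04-10 10:30", "2022-04-10 10:20"]),
})


def visit_minutes(route, first_visit, last_visit):
    # Stand-in for a per-row function; route is accepted only to
    # mirror the call shape of the original snippet.
    return (last_visit - first_visit).total_seconds() / 60


# Row-wise apply: each row arrives as a Series.
via_apply = rs.apply(
    lambda x: visit_minutes(x.RouteID, x.FirstVisitTime, x.LastVisitTime),
    axis=1,
).tolist()

# List comprehension: the columns are zipped and unpacked directly.
via_comprehension = [
    visit_minutes(route, first_visit, last_visit)
    for route, first_visit, last_visit
    in zip(rs.RouteID, rs.FirstVisitTime, rs.LastVisitTime)
]
```

Both variants produce the same list, so the choice comes down to readability and whatever timing you measure on your own data.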
Another example of ‘apply method vs. list comprehensions’. This one uses to_records(), which generates a NumPy record array (numpy.recarray), which is an awesome thing.
# apply version
ra_expanded.apply(lambda x: query_collected_bins(conn, x.RouteID, x.StartTime, x.EndTime), axis=1)

# list comprehension over a record array
records = ra_expanded.loc[:, ["RouteID", "StartTime", "EndTime"]].to_records()
[query_collected_bins(conn, int(x.RouteID), x.StartTime, x.EndTime) for x in records]
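A quick look at what to_records() returns may help here. The sketch below uses a small made-up DataFrame (the Bins column is hypothetical). It shows that the index becomes the first field unless index=False is passed, and that integer fields come back as NumPy integers, which is presumably why the snippet above wraps RouteID in int().

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"RouteID": [7, 8], "Bins": [12.5, 3.0]})

records = df.to_records()               # index is included as the first field
no_index = df.to_records(index=False)   # only the DataFrame columns

print(type(records))           # a numpy.recarray
print(records.dtype.names)     # field names, starting with 'index'
print(no_index.dtype.names)    # field names without the index

first = records[0]
# Fields are reachable by attribute; integers are NumPy scalars,
# so convert them when a plain Python int is required.
route_as_int = int(first.RouteID)
```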
Wanna LAG? Use the shift() function.
df['col_diff'] = df['col'] - df['col'].shift(1)
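As a minimal sketch with made-up numbers, shift(1) behaves like SQL's LAG and shift(-1) like LEAD. The prev and next column names are chosen for this example only. Rows with no neighbour get NaN, so the shifted integer column turns into floats.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"col": [10, 13, 19, 20]})

df["prev"] = df["col"].shift(1)     # LAG: value from the previous row
df["next"] = df["col"].shift(-1)    # LEAD: value from the next row
df["col_diff"] = df["col"] - df["prev"]

print(df)
```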
If you need diffs, you can use .diff() too.
df['col_diff'] = df.col.diff()
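The two spellings should give the same result; a small check on made-up data, which also shows the periods argument of diff() for comparing against rows further back:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"col": [10, 13, 19, 20]})

by_shift = df["col"] - df["col"].shift(1)
by_diff = df["col"].diff()

# equals() treats NaNs in the same position as equal.
same = by_diff.equals(by_shift)

# periods=2 compares each row with the row two positions earlier.
two_back = df["col"].diff(periods=2)
```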