Please, stop looping over a dataframe

published on 17.10.2020
edited on 10.04.2022

Looping through a dataframe row by row is slow, and it is not something you want to do. If you really have to, there are several ways to do it:

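import pandas as pd
t = pd.DataFrame({'a': range(0, 100), 'b': range(0, 100)})

# iterrows(): yields (index, Series) pairs
C = []
for i, r in t.iterrows():
    C.append((r['a'], r['b']))

# itertuples(): yields namedtuples, with the index at position 0
C = []
for ir in t.itertuples():
    C.append((ir[1], ir[2]))

# zip() over the columns you need
C = []
for r in zip(t['a'], t['b']):
    C.append((r[0], r[1]))

# zip() over plain Python lists built by to_dict("list")
C = []
for r in zip(*t.to_dict("list").values()):
    C.append((r[0], r[1]))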
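To see how the four approaches compare on your own machine, a rough timing sketch along these lines may help. The function names and the number of runs are my own choices for illustration, and the numbers will vary with pandas version and dataframe size. iterrows() builds a Series for every row, so it tends to come out slowest.

```python
import timeit

import pandas as pd

t = pd.DataFrame({'a': range(0, 100), 'b': range(0, 100)})


# Each function builds the same list of (a, b) tuples in a different way.
def with_iterrows():
    return [(r['a'], r['b']) for _, r in t.iterrows()]


def with_itertuples():
    return [(r[1], r[2]) for r in t.itertuples()]


def with_zip():
    return list(zip(t['a'], t['b']))


def with_to_dict():
    return list(zip(*t.to_dict("list").values()))


candidates = [with_iterrows, with_itertuples, with_zip, with_to_dict]

# Total time for 20 runs of each approach (20 is an arbitrary choice).
timings = {f.__name__: timeit.timeit(f, number=20) for f in candidates}

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s for 20 runs")
```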

A list comprehension over zipped columns is an alternative to the .apply() method, although I haven’t seen a significant benefit from it.

result = [query_distance(conn, route, first_visit, last_visit)
          for route, first_visit, last_visit in zip(rs.RouteID, rs.FirstVisitTime, rs.LastVisitTime)]
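The snippet above depends on a database connection, so here is a self-contained sketch of the same pattern. The function visit_minutes and the sample data are hypothetical stand-ins for query_distance and the real rs dataframe. It shows that the list comprehension and .apply() with axis=1 give the same values.

```python
import pandas as pd


# Hypothetical stand-in for a per-row function such as query_distance.
def visit_minutes(route, first_visit, last_visit):
    return (last_visit - first_visit).total_seconds() / 60


# Made-up sample data using the column names from the snippet above.
rs = pd.DataFrame({
    "RouteID": [1, 2, 3],
    "FirstVisitTime": pd.to_datetime(
        ["2020-10-17 08:00", "2020-10-17 09:00", "2020-10-17 10:30"]),
    "LastVisitTime": pd.to_datetime(
        ["2020-10-17 08:45", "2020-10-17 11:00", "2020-10-17 10:50"]),
})

# Row-wise apply: every row is turned into a Series first.
via_apply = rs.apply(
    lambda x: visit_minutes(x.RouteID, x.FirstVisitTime, x.LastVisitTime),
    axis=1,
).tolist()

# List comprehension over the zipped columns: no per-row Series.
via_zip = [visit_minutes(route, first_visit, last_visit)
           for route, first_visit, last_visit
           in zip(rs.RouteID, rs.FirstVisitTime, rs.LastVisitTime)]

print(via_apply)
print(via_zip)
```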

Another example of ‘apply method vs. list comprehension’. This one uses to_records(), which returns a NumPy record array (numpy.recarray). That is an awesome thing, because you can access its fields as attributes.

# with apply
ra_expanded.apply(lambda x: query_collected_bins(conn, x.RouteID, x.StartTime, x.EndTime), axis=1)

# with a list comprehension over a record array
records = ra_expanded.loc[:,["RouteID", "StartTime", "EndTime"]].to_records()
[query_collected_bins(conn, int(x.RouteID), x.StartTime, x.EndTime) for x in records]
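A minimal sketch of what to_records() returns, using made-up data in place of the real ra_expanded. Note that to_records() also includes the dataframe index as a field by default. Passing index=False, as assumed here, keeps only the selected columns.

```python
import numpy as np
import pandas as pd

# Made-up sample data using the column names from the snippet above.
ra_expanded = pd.DataFrame({
    "RouteID": [10, 20],
    "StartTime": pd.to_datetime(["2020-10-17 08:00", "2020-10-17 09:00"]),
    "EndTime": pd.to_datetime(["2020-10-17 08:30", "2020-10-17 10:15"]),
})

cols = ["RouteID", "StartTime", "EndTime"]
records = ra_expanded.loc[:, cols].to_records(index=False)

print(type(records))        # a numpy.recarray
print(records.dtype.names)  # field names match the selected columns

# Each element is a record whose fields are available as attributes.
route_ids = [int(x.RouteID) for x in records]
print(route_ids)
```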

Wanna LAG, like in SQL? Use the shift() method.

df['col_diff'] = df['col'] - df['col'].shift(1)

If you need differences, you can use .diff() too.

df['col_diff'] = df.col.diff()
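A small sketch with made-up numbers, showing that both forms give the same column. The extra column names (lag, col_diff_shift) are only for illustration. The first row has no previous value, so it ends up as NaN in both cases.

```python
import pandas as pd

# Made-up sample data.
df = pd.DataFrame({'col': [10, 12, 15, 11]})

# shift(1) moves every value one row down: the "previous row" value.
df['lag'] = df['col'].shift(1)

# Difference from the previous row, written with shift().
df['col_diff_shift'] = df['col'] - df['col'].shift(1)

# The same difference, written with diff().
df['col_diff'] = df['col'].diff()

print(df)
```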

Published on 17.10.2020 by Mert Bakır. Last update on 10.04.2022 with commit c00881b.
