Bike Pattern 2

We use a little machine learning on Divvy data to look for a better division of Chicago. We try to identify patterns among bike stations.

The data

Divvy Data publishes a sample of the data.

We know the stations.

Normalizing, aggregating and merging per start and stop time

We need vectors of equal size, which means filling NaN values with 0 and adding the time slots that are not present.
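As a minimal sketch of this step, the snippet below counts trips per station and time slot, then reindexes onto a complete grid so that every station ends up with a vector of the same length. The trips table, its column names (`station`, `start_time`) and the 10-minute slot width are assumptions made for illustration, not the actual Divvy schema. The same treatment would apply to stop times.

```python
import pandas as pd

# Hypothetical trips: one row per trip, with the station where it started.
trips = pd.DataFrame({
    "station": ["A", "A", "A", "B"],
    "start_time": pd.to_datetime([
        "2016-06-01 08:03", "2016-06-01 08:07",
        "2016-06-01 17:45", "2016-06-01 09:12",
    ]),
})

# Round each start time down to its 10-minute slot, kept as a time of day.
trips["slot"] = trips["start_time"].dt.floor("10min").dt.strftime("%H:%M")

# Count trips per station and slot; slots without any trip are absent here.
counts = trips.groupby(["station", "slot"]).size().unstack("slot")

# Complete grid of 24*6 = 144 slots (assumed 10-minute width).
all_slots = pd.date_range("2000-01-01", periods=24 * 6, freq="10min").strftime("%H:%M")

# Add the missing slots as columns, then replace NaN with 0.
vectors = counts.reindex(columns=all_slots).fillna(0)
print(vectors.shape)
```

Reindexing against an explicit grid, rather than relying on the slots observed in the data, guarantees the same column order and length for every station.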

Clustering (stop and start)

We cluster these distributions to find some patterns. But we need vectors of equal size, which should be equal to 24*6 = 144.
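The clustering step could look like the sketch below. It assumes k-means from scikit-learn, which the text does not name, and it runs on synthetic profiles (a morning peak and an evening peak) rather than on the real Divvy vectors. Each row is normalized to sum to 1 so that the clusters reflect the shape of the usage over the day and not the volume.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_slots = 24 * 6  # one value per slot, as in the text
slots = np.arange(n_slots)

# Synthetic usage profiles: a peak around 8am and a peak around 5pm.
morning = np.exp(-0.5 * ((slots - 8 * 6) / 6.0) ** 2)
evening = np.exp(-0.5 * ((slots - 17 * 6) / 6.0) ** 2)
profiles = np.vstack(
    [morning + 0.05 * rng.random(n_slots) for _ in range(10)]
    + [evening + 0.05 * rng.random(n_slots) for _ in range(10)]
)

# Turn each row into a distribution over the day.
distributions = profiles / profiles.sum(axis=1, keepdims=True)

# Number of clusters chosen for this toy example only.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(distributions)
print(labels)
```

On real data the number of clusters is a choice to explore, and the cluster centers (`model.cluster_centers_`) can be plotted as average daily profiles.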

This is much better.

Let's build the features.

Let's see what it means across days. We need to check whether a cluster is related to the days of the working week or to the weekend.
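One way to check this is a contingency table of clusters against days of the week. The sketch below uses a small hypothetical table with one cluster label per station and date; the column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical clustering output: one label per (station, date) vector.
features = pd.DataFrame({
    "station": ["A", "A", "A", "A", "B", "B"],
    "date": pd.to_datetime([
        "2016-06-06", "2016-06-07", "2016-06-11",
        "2016-06-12", "2016-06-06", "2016-06-11",
    ]),
    "cluster": [0, 0, 1, 1, 0, 1],
})

# Monday is 0 and Sunday is 6.
features["weekday"] = features["date"].dt.dayofweek

# Rows are clusters, columns are weekdays, cells are counts of vectors.
table = pd.crosstab(features["cluster"], features["weekday"])
print(table)
```

A cluster whose counts concentrate in columns 5 and 6 is a weekend pattern; one concentrated in columns 0 to 4 is a working-week pattern.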

Let's draw the clusters.

Four patterns emerge. Small clusters are annoying, but let's show them on a map. The widest one is the one for the weekend.

Graph

We first need to get 7 clusters for each station, one per day of the week.
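A possible way to obtain this layout is to pivot a long table of labels into one row per station and one column per weekday. The sketch assumes that exactly one cluster label already exists for each station and weekday; the data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical labels: station B has no vector for Saturday and Sunday.
labels = pd.DataFrame({
    "station": ["A"] * 7 + ["B"] * 5,
    "weekday": list(range(7)) + [0, 1, 2, 3, 4],
    "cluster": [0, 0, 0, 0, 0, 1, 1] + [2, 2, 2, 2, 2],
})

# One row per station, one column per weekday (0 = Monday, 6 = Sunday).
per_station = labels.pivot(index="station", columns="weekday", values="cluster")
print(per_station)
```

Missing combinations come out as NaN, which is why the cluster columns become floating-point.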

Let's see which stations are classified in more than 4 clusters. NaN means no bikes stopped at this station. These are mostly unused stations.
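Counting distinct clusters per station can be sketched as follows, on a hypothetical table with one row per station and one column per weekday. `DataFrame.nunique` ignores NaN by default, so days without any trip do not count as an extra cluster.

```python
import numpy as np
import pandas as pd

# Hypothetical table: cluster of each station for each weekday (0 to 6).
per_station = pd.DataFrame(
    [
        [0, 1, 2, 3, 4, 1, 1],
        [0, 0, 0, 0, 0, 1, 1],
        [2, 2, np.nan, np.nan, 2, np.nan, np.nan],
    ],
    index=["A", "B", "C"],
    columns=range(7),
)

# Number of distinct clusters per station, NaN excluded.
n_clusters = per_station.nunique(axis=1)

# Stations classified in more than 4 clusters.
unstable = n_clusters[n_clusters > 4]

# Number of weekdays without any data, a hint that a station is rarely used.
n_missing = per_station.isna().sum(axis=1)
print(unstable)
print(n_missing)
```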

Let's draw a map for a weekday.