Quantified Self: Clustering data from my Habit Tracker

Carlos Abiera
4 min readAug 13, 2020

I’ll be contributing to the Quantified Self movement by sharing the result of this clustering project.

The data exported from Loop — Habit Tracker. This is my friend for several months in my 8 to 5 job before. It helped me record the progress of my routine (what gets measured, gets done).

A habit is an action we do subconsciously or when there is no mental effort to it, otherwise it is a routine. These are the routine that I was trying to develop:

Morning Meditation, Morning readings, Midday Sleep, Out on time, Arthour (personal projects), Before sleep readings, Before sleep meditation.

Using the exported data from the app, I have different column and rows. Columns are the habit that I want to track, and rows are the binary values that show whether the routine was fulfilled or not (1-yes , 0-no) .

Raw Data
Bar Plot of fulfilled routine
Data Normalization

Pearson Correlation

Using correlation analysis, I discovered the strengths of relationships among my routines. This is worth exploring to establish a connection between these habits.

Interpreting Correlation Values

From the interpretation above we can see that:

  • Out on time routine (+0.63) has a moderate to strong positive linear relationship to Before sleep reading routine
  • Midday Sleep and Morning Readings routine (+0.73) have a strong positive linear relationship to each other.
  • Arthour and Before Sleep routine (+0.77) have a strong positive linear relationship to each other.
  • Before sleep meditation and Morning Meditation routine (+0.62) have a strong positive linear relationship to each other.

CLUSTERING

Clustering is an unsupervised learning algorithm that finds clusters or group in a set of data. These groups should have similar properties or features. This method is a common technique for statistical data analysis used in many fields.

K-modes Clustering. According to the documentation

K-modes is used for clustering categorical variables. It defines clusters based on the number of matching categories between data points.This is in contrast to the more well-known k-means algorithm, which clusters numerical data based on Euclidean distance.)

For simplicity, I used 3 number of clusters, this is a subjective judgement about the number of really distinctive clusters described here (I tried more K but it hurts my data).

These are different clusters derived using K-modes. I arbitrarily named my routines as: The Idealist, The Supervisor, The Visionary.

CLUSTER 1: The Idealist

This group has no Out on time and has a plenty of Midday Sleep routine. I remember I stay late at work to read or finish something that needed the next day.

CLUSTER 2: The Supervisor

This cluster has the least number of routines. These days could be tiring.

CLUSTER 3: The Visionary

This cluster has the most number of accomplished routines.

Analyzing such routines helps me understand better how my day looks like back then. From this data, I can design more meaningful routine towards a better day.

Originally published at https://www.linkedin.com.

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

Carlos Abiera
Carlos Abiera

Written by Carlos Abiera

Carlos C. Abiera currently manages the operations of Montani Int. Inc. and leads the REV365 data team. He has keen interests in data and behavioral sciences.

No responses yet

Write a response