Exploring My Life with Pandas

Since April 2014, I have used Reporter to collect data about my life—capturing everything from location to sleep quality to how often I am in front of a screen.

A while back, I wrote a post about my data collection process, but haven’t done anything since. And now I have over 3.2k reports spanning 745 days.

Inspired by Nicholas Felton’s talk at ICA Boston this past week, I decided to start wrangling with the data so I could:

  1. Practice using Pandas
  2. And explore what quantified Tom looks like 1

Charts and code follow.

Set Up

After importing my Reporter CSV export into a Pandas DataFrame and cleaning it up, I was left with 20 columns. (Which I later renamed to make them easier to work with.)

Timestamp of Report (GMT) Timestamp of Report (Local Time)
Latitude Longitude
Weather Number of Photos Added
Ambient Audio (dB) Ambient Audio Description
Number of Steps What did you learn today?
Where are you? What are you doing?
Who are you with? How did you sleep?
How many pages did you read tonight? Are you in front of a screen?
What book are you reading? Did you exercise today?
What are you thinking about? Are you wearing your retainer?

They span multiple data types from categorical to numeric, and most everything in between.

Feel the Data

One of my typical first steps is to plot the data to get a feel for it. Are there any obvious trends I can quickly identify?

So that’s what I did.

Did a ton of walking from July to October, traveling around India.

# Steps Per Day
df2 = df[['local_time','num_steps', 'year']].dropna()
table = df2.pivot_table('num_steps', index='local_time', columns='year', aggfunc=np.sum)

Steps Per Day

Also, took a lot of pictures in India. The big spike—the Taj Mahal.

# Photos Taken Per Day
df2 = df[['local_time','num_photos', 'year']].dropna()
table = df2.pivot_table('num_photos', index='local_time', columns='year', aggfunc=np.sum)

Photos Taken Per Day

I read less than I use to.

# Nightly Pages Read
df2 = df[['local_time','num_pages', 'year']].dropna()
table = df2.pivot_table('num_pages', index='local_time', columns='year', aggfunc=np.sum)

Nightly Pages Read

Intrigued by the last graph, I decided to dive deeper into my nightly reading habit.

Page After Page

I try to read every night before I go to sleep. It relaxes me so I fall asleep faster. Why have I read less lately though?

I can probably guess why, but let’s see if the proof is in the data pudding.

First, we zoom in on 2015: df2[df2['year'] == 2015].

Nightly Pages Read (2015)

Now it doesn’t look as bad.

Three hundred sixteen nights of reading with a modest 10.17 pages a night average. There are tons of zeros though. (Looking at you, August and September.)

Let’s zoom in even further, and see what I was reading then.

# Nightly Pages Read (July to October 2015)
df2 = df[['local_time', 'num_pages', 'book']].dropna()
df2 = df2.ix['07-2015':'10-2015']
df2 = df2.pivot(index='local_time', columns='book', values='num_pages')

Nightly Pages Read (July to October 2015)

From July to October, I read zero pages 52 days. And that doesn’t even include the five days in late-September, where I didn’t record any data at all.

I guess The Man Who Smiled and All The Light We Cannot See were boring novels then. False! The data might not capture it, but I enjoyed both of them.

There has to be other factor(s) involved!

Correlations Between Pages Read & Other Things

Using the Pandas df.corr() method, I calculated the Pearson correlation for number of pages read against step count, time of day, ambient audio (in dB), and number of photos taken.

My hypothesis was that pages read was somewhat strongly, inversely correlated with the other factors. Intuitively, it makes sense—the later you go to sleep, more you are walking around outside, etc, the less likely you will read more pages at night.

Perhaps because you are more tired.

However, the closest I got to a strong correlation was pages read versus step count. (Pages read versus photos taken was the second highest.)

_ Pages Read Step Count
Pages Read 1 -0.237353
Step Count -0.237353 1

Unfortuntately, I don’t think there is a satisfying quantitative answer here. Or at least not with the data I have.

Qualitatively though, I think it was because I was in India having the time of my life with my friends.

And that’s good enough for me.


Tweet @tomfme if you want to play around with my data or code.

  1. This is sort of terrifying.