Summary

  • Python and statistical analysis can be used to explore the linear relationship between screen-on time and battery life on mobile devices.
  • pandas loads the data, Seaborn plots the regressions, and Pingouin builds the model.
  • Battery usage was linear for both phone and tablet, demonstrating the practical value of statistical analysis in Python.

Recently, I wanted to find out how phone and tablet screen time was affecting battery drain. You might not think to go to Python for that, but its basic statistics tools can actually provide the answer.

Maybe I’m one of those people who seem to agonize over battery life, though I don’t think I actually need to. I always have more than enough at the end of the day with both my Android phone and tablet, and since I work from home, I’m never very far from an outlet.

It seems obvious that having the screen on would use more battery, but I was curious about how the two were actually related. Could it be a linear relationship? Could it be exponential, where the battery use accelerates quickly?

I’ve been interested in data analysis recently. I’d taken a course in basic statistics and probability back in college, and I’d first encountered the concept of linear regressions, where data points are plotted along two axes and a line is drawn through them, in a physics lab.

When I noticed that both my phone and tablet kept statistics on screen-on time and battery usage, I remembered my earlier experience. The TI graphing calculator I used back in college is somewhere in my house, but I don’t need a calculator anymore since I have Python. I decided to use Python’s array of statistical computing tools to figure out what relationship, if any, screen-on time and battery life had.

Formulating a Hypothesis

One part of statistics is hypothesis testing. This is the key to good science. This might have been overkill for a small project like this, but I wanted to be a good statistician and scientist, even if only for a little while.

In statistics, you form a null hypothesis and an alternative hypothesis, also known as “H0” and “H1.”

The null hypothesis is the one you try to reject. If the evidence, in the form of a statistical test, warrants rejecting it, you accept the alternative hypothesis instead.

In the case of screen-on time and battery life for my phone and tablet:

Null hypothesis (H0)

There is no relationship between screen-on time and battery life.

Alternative Hypothesis (H1)

There is a relationship between screen-on time and battery life.

With the null and alternative hypotheses defined, I could proceed with collecting and analyzing my data.

Getting My Data Into Python

The first step was to organize my data. The modified Android OS on my Samsung phone and tablet keeps tabs on battery usage and screen-on time over the past week.

Phone and tablet battery stats organized in a LibreOffice Calc spreadsheet.

I dutifully flipped open my tablet and phone to their diagnostic pages and copied the data into a LibreOffice Calc spreadsheet, then I saved it as a CSV file.

I created columns for the reported screen-on times for my phone and tablet, with two columns each for both devices. I figured the easiest way to code the screen-on time was to multiply the hours by 60 and add the minutes to that, giving the total number of minutes the phone’s screen was on. (I’d originally coded them as hours and minutes in the spreadsheet, but pandas didn’t seem to like the format, so I eventually recoded it.)
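
The recoding described above is simple arithmetic. As a minimal sketch (the helper name `to_minutes` is my own, not something from the spreadsheet):

```python
def to_minutes(hours, minutes):
    """Convert an hours-and-minutes screen-on reading into total minutes."""
    return hours * 60 + minutes


# Example: a reading of 3 hours 25 minutes
print(to_minutes(3, 25))  # 205
```

Storing a single number per reading means pandas treats the column as numeric, which is what the plots and regressions need.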

I used Python’s pandas library to read in the data. pandas can read many data formats, including CSV files, and it works with data arranged in rows and columns, which makes the jump from spreadsheets to Python easy.

I opened up a Jupyter notebook and ran some Python commands to set up the environment:


import numpy as np
import pandas as pd
import seaborn as sns
import pingouin as pg
%matplotlib inline

These commands set up the libraries I was going to use. NumPy is handy for all sorts of numerical operations, including statistics. It’s kind of like a statistical safety blanket, so I always import it whenever I’m working with data. The second statement obviously imports pandas, but abbreviated as “pd” just so I don’t have to keep typing out “pandas.” Seaborn is a fantastic tool for making statistical plots, and that’s what I used to make my scatter plots and regressions. Pingouin (French for “penguin”) is a library that does statistical tests, and that’s what I used to build my models and see how good of a fit they were to my data.

The last line in that block is to make any plots appear inline in the Jupyter notebook. Otherwise, they will appear in a separate window.

With all of the necessary libraries loaded, I could create a pandas dataframe with my data:


battery = pd.read_csv("data/device_batteries.csv")

I could inspect my new dataframe with the head method:


battery.head()

The output of the battery.head() command using pandas and Python.

This shows the first few lines of the data and lets me see how it’s organized. Of course, I already knew that since I created it. This command is helpful for any datasets I download from places like Kaggle.

The next step was to get some descriptive statistics from my dataset. The describe method of my dataframe does just that:


battery.describe()

Descriptive statistics of the battery data in Python using the pandas describe() function.

This prints basic descriptive statistics like the count, the mean, the standard deviation, the minimum and maximum values, as well as the lower quartile or 25th percentile, the median (50th percentile), and the upper quartile or 75th percentile for all numerical columns in the dataframe. All of these stats help me get a sense of the lay of the land.
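
Because describe() returns a dataframe itself, individual statistics can be looked up by name. Here is a small sketch using an inline CSV with made-up values (the column names match the ones used in this article, but the numbers are hypothetical, not my real measurements):

```python
import io

import pandas as pd

# Hypothetical sample rows, NOT the real measurements from the article.
sample_csv = io.StringIO(
    "phone_screen_on,phone_battery,tablet_screen_on,tablet_battery\n"
    "100,25,60,12\n"
    "200,45,120,18\n"
    "300,65,180,25\n"
)
sample = pd.read_csv(sample_csv)
stats = sample.describe()

# Rows of the result are statistics, columns are the numeric variables.
phone_mean = stats.loc["mean", "phone_screen_on"]
phone_median = stats.loc["50%", "phone_screen_on"]
print(phone_mean, phone_median)
```

The same lookups work on the real dataframe once it has been loaded from the CSV file.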

Plotting the Regression

Now with my data entered and imported, it was time to explore the relationships.

I started by making a scatter plot of my phone’s screen-on time vs. the battery drain in percentage points:


sns.relplot(x='phone_screen_on',y='phone_battery',data=battery)

Phone screen-on time vs. battery drain scatter plot in Seaborn.

This tells Seaborn to make the plot using the screen-on time as the x-axis and the battery drain as the y-axis. If you look closely, the data points seem to align in nearly a straight line. I decided to plot the regression and see how well a line would fit:


sns.regplot(x='phone_screen_on',y='phone_battery',data=battery)

Phone screen-on time vs battery drain regression plot in Seaborn.

The line fit pretty well.

But how would I verify the fit, and how could I reconstruct the equation used to produce the line? That’s where Pingouin comes in. Several other libraries let you do statistical tests and make regressions, but Pingouin is my favorite because I find it the easiest to use.

I used Pingouin’s linear regression function on screen-on time vs. battery drain:


pg.linear_regression(battery['phone_screen_on'],battery['phone_battery'])

Battery regression results from Pingouin.

Pingouin produces a table. Here’s what it means. The most important numbers are in the leftmost columns. If you remember your algebra, the equation of a straight line is y = mx + b. With linear regression, the convention is to flip this around and write y = b + mx, or rather y = a + bx. The y-intercept a, where the line crosses the y-axis, is labeled “intercept” and is 5.339232. The coefficient of x, the screen-on time, is 0.201630, and it determines the slope or steepness of the line. So the equation of our model is y = 5.339232 + 0.201630x.
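
To show where an intercept and slope like these come from, here is a minimal sketch of the ordinary least-squares formulas in plain Python. The function name `fit_line` and the data points are hypothetical, chosen so the points sit exactly on a known line. This is not Pingouin’s internal code, just the textbook calculation:

```python
def fit_line(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Hypothetical points that lie exactly on y = 5 + 0.2x
minutes = [50, 100, 150, 200]
drain = [15, 25, 35, 45]
a, b = fit_line(minutes, drain)
print(a, b)
```

On this toy data the function recovers an intercept of about 5 and a slope of about 0.2, the line the points were built from. Real data is noisier, which is why the rest of the table matters.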

The other numbers tell us how good of a fit this line is. “se” stands for “standard error” and measures how uncertain each estimated coefficient is. The lower the standard error, the better the fit. For the screen-on time, the value is around .20, which means the line is a good fit for the data points. “T” is the t-statistic from Student’s t-test, which tests the hypothesis that the correlation coefficient, mentioned below, is 0, meaning no correlation. As a rule of thumb, a value over 2 or under -2 means that a result is statistically significant. The p-value is the probability that the sample statistic, in this case the t-value, would be as extreme or more extreme if the null hypothesis were true. Most statisticians pick a threshold, usually no higher than .05, and reject the null hypothesis when the p-value falls below it. Because the t-statistic is greater than 2 and the p-value is below both thresholds, we can reject the null hypothesis that there’s no relationship between screen-on time and battery drain at both the .05 and .01 significance levels.
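
The decision rule can be written out in a few lines. This is a sketch with helper names of my own (`t_statistic`, `reject_null`). The coefficient and standard error in the first example are hypothetical, while the p-value in the second is the one reported for the tablet model later in this article:

```python
def t_statistic(coef, se):
    """A coefficient's t-statistic is the estimate divided by its standard error."""
    return coef / se


def reject_null(p_value, alpha=0.05):
    """Reject the null hypothesis when the p-value falls below the threshold."""
    return p_value < alpha


# Hypothetical coefficient and standard error, for illustration only
t = t_statistic(0.2, 0.04)
print(abs(t) > 2)

# p-value reported for the tablet regression
print(reject_null(0.001591, alpha=0.05))
print(reject_null(0.001591, alpha=0.01))
```

All three checks print True, which is the situation where rejecting the null hypothesis is justified at both levels.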

“r2” is the square of the correlation coefficient, and “adj_r2” is the same value adjusted for the number of predictors in the model. A high value here also means the line is a good fit. The last couple of columns are the lower (2.5%) and upper (97.5%) bounds of the 95% confidence interval for each coefficient, the range where the true values of the equation could plausibly land. The shaded areas in the regression plots also represent confidence intervals.
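
The adjusted r² follows from the plain r² with a standard formula. In this sketch the function name is mine, the r² value is the one reported for the tablet model later in this article, and the sample size of seven is an assumption (one reading per day for a week), so treat the result as illustrative:

```python
def adjusted_r2(r2, n_obs, n_predictors=1):
    """Standard adjustment of r-squared for sample size and predictor count."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# r2 from the tablet model; n_obs=7 is an ASSUMPTION, not a stated figure
adj = adjusted_r2(0.884979, n_obs=7)
print(round(adj, 4))
```

The adjustment always pulls the value down a little, and the penalty shrinks as the number of observations grows.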

Now that we have a model, we can plug values into the screen-on time variable to predict how much the battery will drain.

I can define a Python function for this:


def phone_battery_usage(minutes):
    return 5.339232 + 0.201630 * minutes

To calculate usage for three hours or 180 minutes:


phone_battery_usage(180)
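
The same equation can be turned around to estimate how much screen-on time produces a given drain. This is a sketch with a helper name of my own (`minutes_until_drain`), and it assumes the linear model still holds at the value you ask about, which is only reasonable near the range of the measured data:

```python
INTERCEPT = 5.339232  # a, from the phone regression
SLOPE = 0.201630      # b, from the phone regression


def phone_battery_usage(minutes):
    """Predicted battery drain in percentage points."""
    return INTERCEPT + SLOPE * minutes


def minutes_until_drain(percent):
    """Solve y = a + b*x for x: screen-on minutes for a given drain."""
    return (percent - INTERCEPT) / SLOPE


print(round(phone_battery_usage(180), 1))  # 41.6
print(round(minutes_until_drain(50), 1))   # 221.5
```

So three hours of screen time predicts about 41.6 percentage points of drain, and a 50-point drain corresponds to roughly 221 minutes of screen-on time under this model.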

Let’s do the same thing for my tablet. First the scatter plot:


sns.relplot(x='tablet_screen_on',y='tablet_battery',data=battery)

Tablet screen-on time vs battery drain scatter plot

Again, there seems to be a linear correlation. Let’s plot the regression:


sns.regplot(x='tablet_screen_on',y='tablet_battery',data=battery)

Seaborn tablet screen-on time vs battery drain.

Another good fit. Let’s try a Pingouin linear regression:


pg.linear_regression(battery['tablet_screen_on'],battery['tablet_battery'])

Pingouin regression table of tablet screen-on time vs. tablet battery drain.

The model is battery usage = 5.017013 + 0.112511(screen-on time). It’s a good fit, with T = 6.202436, p = 0.001591, and r² = 0.884979.
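
As with the phone, the tablet model can be wrapped in a small function. The function name is my own; the coefficients are the ones from the regression table:

```python
def tablet_battery_usage(minutes):
    """Predicted tablet battery drain in percentage points."""
    return 5.017013 + 0.112511 * minutes


# Three hours of screen-on time
print(round(tablet_battery_usage(180), 1))  # 25.3
```

The smaller slope means the tablet loses fewer percentage points per minute of screen time than the phone does.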

What I Learned From My Little Science Project

One thing that surprised me was how well linear relationships held up in the real world. Battery usage was linear for both my phone and tablet. This might line up with some research I did on the discharge of lithium-ion batteries: according to Ufine Battery, they drain faster when you start using them and toward the end of the charge, but the discharge curve remains linear in between. This project shows the value of statistical analysis.

It offers a more rigorous way of answering questions, and with modern software like Python, Seaborn, and Pingouin, it’s easier than ever for researchers and ordinary people like me to explore.