Basic Statistics

Jun 9, 2021

Describing Data

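Just like in Excel, Python offers quick ways to pull basic (and even some complex) statistics. Within the pandas library, one shortcut command returns the following for every numeric column:

Total number of non-missing observations (count)
Mean
Standard deviation
Min and max
25th, 50th, and 75th percentiles (the 50th percentile is the median)

The mode is not part of this numeric summary. For text columns, the same command instead reports the count, the number of unique values, the most frequent value, and how often it appears.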

To write it out, use the syntax .describe():

# Combined syntax:
df.describe()

To obtain the high-level statistics of the entire data set:

import pandas as pd
df = pd.read_csv('filename.csv')
df.describe()
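
To see what the summary contains without downloading a file, here is a minimal sketch on a small made-up table. The column names `age` and `income` and all of the values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy data (not from any real file), used only to show the output shape.
df = pd.DataFrame({
    'age': [25, 35, 45, 55],
    'income': [40000, 52000, 61000, 75000],
})

# describe() returns a new DataFrame: one column per numeric input column,
# one row per statistic (count, mean, std, min, 25%, 50%, 75%, max).
summary = df.describe()
print(summary)

# Individual values can be looked up by row label and column name.
print(summary.loc['mean', 'age'])  # average age
print(summary.loc['50%', 'age'])   # median age (the 50th percentile)
```

Because the result is itself a DataFrame, it can be sliced, rounded, or saved like any other table.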

To obtain the high-level statistics of one column in the data set:

import pandas as pd
df = pd.read_csv('filename.csv')
df.column_name.describe()
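
Two details are worth a quick sketch here: what `.describe()` returns for a text column, and how to reach a column whose name is not a valid Python identifier. The table and the column names `department` and `employee name` below are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; both column names are invented for this example.
df = pd.DataFrame({
    'department': ['Sales', 'IT', 'Sales', 'Sales'],
    'employee name': ['Ana', 'Ben', 'Cy', 'Dee'],
})

# For a text column, describe() reports count, unique, top, and freq
# instead of the mean and standard deviation.
dept_summary = df.department.describe()
print(dept_summary)

# Dot access only works when the column name is a valid Python identifier.
# A name containing a space needs bracket access instead.
name_summary = df['employee name'].describe()
print(name_summary)
```

Bracket access works for every column name, so it is the safer habit when a data set's headers contain spaces or punctuation.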

Alternatively, we could pull each descriptive statistic with its own command. In the table below, df_name and column_name are placeholders for your own DataFrame and column:

Descriptive Statistics	Code
Mean	`df_name.column_name.mean()`
Median	`df_name.column_name.median()`
Mode	`df_name.column_name.mode()`
Maximum Value	`df_name.column_name.max()`
Minimum Value	`df_name.column_name.min()`
Standard Deviation	`df_name.column_name.std()`
Count (Non-Missing Values)	`df_name.column_name.count()`
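
A small sketch of how a few of these commands behave, assuming a made-up `score` column with one missing value. It shows two things that can surprise newcomers: `.mode()` returns a Series rather than a single number, and `.count()` ignores missing entries.

```python
import pandas as pd

# Hypothetical toy column with one missing value (None becomes NaN).
df = pd.DataFrame({'score': [3, 5, 5, None, 8]})

mean_score = df.score.mean()      # missing values are skipped by default
median_score = df.score.median()  # middle of the non-missing values
modes = df.score.mode()           # a Series, because several values can tie
non_missing = df.score.count()    # counts only the non-missing entries
total_rows = len(df)              # counts every row, missing or not

print(mean_score, median_score, modes.tolist(), non_missing, total_rows)
```

Comparing `.count()` with `len(df)` is a quick way to spot how many values are missing from a column.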

For example, to obtain the max value of one column in the data set:

import pandas as pd
df = pd.read_csv('filename.csv')
df.column_name.max()

Or, to obtain the count of non-missing values in one column in the data set:

import pandas as pd
df = pd.read_csv('filename.csv')
df.column_name.count()

Tutorial

To practice what we just learned, let’s use sample data to pull these statistics. Download the following HR data set from Kaggle: https://www.kaggle.com/rhuebner/human-resources-data-set

Part One: Describe Command

Take a look at the following code and analyze the output. What observations can you make from this initial pull of the data?
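
One way this step might look is sketched below. So that it runs without the download, it uses a few made-up stand-in rows; the column names `Salary` and `Department` are assumptions about the file, so check `df.columns` on the real data and swap in `pd.read_csv` with the path where you saved it.

```python
import pandas as pd

# Made-up stand-in rows so the sketch runs on its own.
# With the real download, replace this with pd.read_csv(<your saved file>).
hr = pd.DataFrame({
    'Salary': [50000, 60000, 60000, 90000],           # assumed column name
    'Department': ['Production', 'IT/IS',
                   'Production', 'Production'],        # assumed column name
})

# By default, describe() summarizes only the numeric columns.
numeric_summary = hr.describe()
print(numeric_summary)

# include='all' adds the text columns, with NaN where a statistic does not apply.
full_summary = hr.describe(include='all')
print(full_summary)
```

When reading the output, compare the count row across columns (lower counts mean missing values) and look at how far the min and max sit from the quartiles.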

Part Two: Alternative Commands

What insights can you gather from looking at the data using these commands?
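
As a rough sketch of this part, the individual commands can be collected for a single column. The `Salary` column name and the values are again made-up stand-ins, not figures from the Kaggle file.

```python
import pandas as pd

# Made-up stand-in column; replace with the real DataFrame loaded from the CSV.
hr = pd.DataFrame({'Salary': [50000, 60000, 60000, 90000]})

# Gather the individual statistics for one column into a dictionary.
salary_stats = {
    'mean': hr.Salary.mean(),
    'median': hr.Salary.median(),
    'mode': hr.Salary.mode().tolist(),  # mode() returns a Series, so convert it
    'max': hr.Salary.max(),
    'min': hr.Salary.min(),
    'count': hr.Salary.count(),
}

for name, value in salary_stats.items():
    print(name, value)
```

One insight to look for: when the mean sits well above the median, as it does in this stand-in data, a small number of high values may be pulling the average up.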

Mariah Norell

Data Scientist & Lecturer