# Basic Statistics

## Describing Data

Just like in Excel, there are some quick ways to pull basic (and even some complex) statistics in Python. Within the `pandas` library, there is one shortcut command that pulls:

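- Total number of non-null observations (count)
- Mean
- Standard deviation
- Min and max
- 25th, 50th, and 75th percentiles (the 50th percentile is the median)

For numeric columns, this command does not report the mode; use `.mode()` for that.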

To write it out, use the syntax `.describe()`:

```
# Combined syntax:
df.describe()
```

To obtain the high-level statistics of the entire data set:

```
import pandas as pd
df = pd.read_csv('filename.csv')
df.describe()
```
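As a rough illustration of what the summary table contains, here is a minimal sketch. The data frame and its column names (`age`, `salary`, `department`) are made up for this example and stand in for whatever `pd.read_csv` would load.

```python
import pandas as pd

# Made-up stand-in data; in practice df would come from pd.read_csv(...)
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38],
    "salary": [50000, 64000, 120000, 98000, 72000],
    "department": ["Sales", "IT", "IT", "HR", "IT"],
})

# By default only the numeric columns (age, salary) are summarized
summary = df.describe()
print(summary)

# Row labels of the summary: count, mean, std, min, 25%, 50%, 75%, max
print(list(summary.index))

# include="all" also summarizes non-numeric columns,
# adding the rows unique, top (most frequent value), and freq
full_summary = df.describe(include="all")
print(full_summary.loc["top", "department"])
```

Note that `department` is left out of the default summary because it is not numeric.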

To obtain the high-level statistics of one column in the data set:

```
import pandas as pd
df = pd.read_csv('filename.csv')
df.column_name.describe()
```
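The dot notation above only works when the column name is a valid Python identifier. A small sketch of the bracket alternative follows; the columns `salary` and `years employed` are invented for illustration.

```python
import pandas as pd

# Made-up stand-in data; one column name contains a space on purpose
df = pd.DataFrame({
    "salary": [50000, 64000, 120000, 98000, 72000],
    "years employed": [1, 3, 12, 9, 5],
})

# Dot notation and bracket notation select the same column
salary_stats = df.salary.describe()
same_stats = df["salary"].describe()

# Names with spaces (or other special characters) need bracket notation
tenure_stats = df["years employed"].describe()

# Individual values can be read from the result by label
print(salary_stats["50%"])   # the median of the salary column
print(tenure_stats["max"])
```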

Alternatively, we could pull the descriptive statistics with the following commands:

Descriptive Statistics | Code |
---|---|
Mean | `df_name.column_name.mean()` |
Median | `df_name.column_name.median()` |
Mode | `df_name.column_name.mode()` |
Maximum Value | `df_name.column_name.max()` |
Minimum Value | `df_name.column_name.min()` |
Standard Deviation | `df_name.column_name.std()` |
Count Instances | `df_name.column_name.count()` |

For example, to obtain the max value of one column in the data set:

```
import pandas as pd
df = pd.read_csv('filename.csv')
df.column_name.max()
```

Or, to count the non-null values in one column of the data set:

```
import pandas as pd
df = pd.read_csv('filename.csv')
df.column_name.count()
```
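Two details of these commands are easy to miss: missing values are skipped, and `.mode()` returns a Series rather than a single number, because a column can have several modes. The sketch below uses a made-up `score` column with one missing value to show both.

```python
import pandas as pd

# Made-up stand-in data with one missing value (NaN)
df = pd.DataFrame({"score": [3.0, 5.0, 5.0, 8.0, float("nan")]})

mean_score = df.score.mean()      # NaN is skipped: (3 + 5 + 5 + 8) / 4
median_score = df.score.median()  # NaN is skipped here too
modes = df.score.mode()           # a Series, since ties are possible
n_values = df.score.count()       # counts non-null values only
n_rows = len(df)                  # counts every row, including the NaN one

print(mean_score)     # 5.25
print(median_score)   # 5.0
print(list(modes))    # [5.0]
print(n_values, n_rows)
```

Comparing `.count()` with `len(df)` is a quick way to see how many values are missing from a column.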

## Tutorial

To practice what we just learned, let’s use sample data to pull these statistics. Download the following HR data set from Kaggle: https://www.kaggle.com/rhuebner/human-resources-data-set

### Part One: Describe Command

Load the data set into a data frame, run `.describe()` on it, and analyze the output. What observations can you make from this initial pull of the data?
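One possible way to start is sketched below. So that the snippet runs without the download, it reads a three-row stand-in from an in-memory buffer; the rows are made up, and the column names (`Employee`, `Department`, `Salary`) are assumptions rather than the confirmed schema of the Kaggle file.

```python
import io
import pandas as pd

# Stand-in for the downloaded file: made-up rows with assumed column names.
# With the real data, pass the path of the downloaded CSV to pd.read_csv
# instead of this buffer.
sample_csv = io.StringIO(
    "Employee,Department,Salary\n"
    "A,Sales,55000\n"
    "B,IT,82000\n"
    "C,IT,67000\n"
)
hr = pd.read_csv(sample_csv)

print(hr.shape)    # (number of rows, number of columns)
print(hr.dtypes)   # shows which columns are numeric

# Summary statistics for the numeric columns
hr_summary = hr.describe()
print(hr_summary)
```

Checking `shape` and `dtypes` first helps explain why some columns are missing from the `.describe()` output.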

### Part Two: Alternative Commands

What insights can you gather from looking at the data using these commands?
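As a hedged starting point, the sketch below applies a few of the individual commands to made-up stand-in rows. The column names `Department` and `Salary` are assumptions; swap in the actual columns of the downloaded data set.

```python
import pandas as pd

# Made-up stand-in rows; column names are assumptions, not the real schema
hr = pd.DataFrame({
    "Department": ["Sales", "IT", "IT", "HR", "IT"],
    "Salary": [55000, 82000, 67000, 60000, 150000],
})

mean_salary = hr.Salary.mean()
median_salary = hr.Salary.median()
salary_range = hr.Salary.max() - hr.Salary.min()

# mode() returns a Series; [0] takes the first (here the only) mode
top_department = hr.Department.mode()[0]

print(mean_salary)      # 82800.0
print(median_salary)    # 67000.0
print(salary_range)     # 95000
print(top_department)   # IT
```

In this toy data the mean sits well above the median because one salary is much larger than the rest, which is the kind of observation these commands can surface.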