Unlocking Insights: A Beginner's Guide to Statistics with Python

Chapter 1: Understanding Tabular Data

In our data-driven world, we frequently encounter tabular data, such as that found in Excel spreadsheets. By grasping fundamental statistical concepts, we can extract valuable insights from these datasets.

Statistics is the branch of mathematics concerned with collecting, analyzing, interpreting, and presenting numerical data. This introductory series explores how basic statistical techniques can be applied to datasets to improve our understanding of them.

Section 1.1: Key Statistical Measures

Arithmetic Mean

Definition: The arithmetic mean is the total sum of all values divided by the count of observations.

Uses: The mean is the natural choice when a straightforward average is needed. It spreads the total of all observations evenly across their count, giving a single per-observation figure.

Limitations: The mean is highly sensitive to outliers. For instance, the mean of 5, 10, and 60 is (5 + 10 + 60) / 3 = 25, a figure larger than two of the three values. Because of this sensitivity, the median is often preferred for datasets that may contain extreme values, such as salaries.

To compute the arithmetic mean in Python, we can use either the statistics or numpy library:

import statistics as st

data = [5, 10, 15, 20, 25]
x = st.mean(data)  # sum of the values divided by their count
print(x)  # Output: 15

Or alternatively:

import numpy as np

data = [5, 10, 15, 20, 25]
x = np.mean(data)  # NumPy returns a floating-point result
print(x)  # Output: 15.0
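To see the outlier sensitivity described under Limitations, the short sketch below compares the mean and the median for the values 5, 10, and 60 from that example. The variable names are illustrative only.

```python
import statistics as st

# Values from the Limitations example: one extreme value (60)
values = [5, 10, 60]

mean_value = st.mean(values)      # (5 + 10 + 60) / 3
median_value = st.median(values)  # middle value once the data is sorted

print(mean_value)    # 25
print(median_value)  # 10
```

The single large value pulls the mean up to 25, while the median stays at 10.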

Median

Definition: The median is the middle value in an ordered dataset. For datasets with an even number of entries, the median is calculated as the average of the two middle values.

Example: For the dataset [1, 2, 3, 4] (where n = 4):

The two middle values sit at positions n/2 = 2 and n/2 + 1 = 3. These positions hold the values 2 and 3, so Median = (2 + 3)/2 = 2.5.

For an odd-numbered dataset like [5, 6, 7, 8, 100]:

Median position = (5 + 1)/2 = 3, so the median is the third ordered value, which is 7.

Uses: The median is particularly useful for datasets that contain extreme values, because it depends only on the middle of the ordered data and is not pulled toward outliers.

To find the median in Python:

import statistics as st

data = [5, 6, 9, 20]  # even number of values
x = st.median(data)   # average of the two middle values, 6 and 9
print(x)  # Output: 7.5

Or for an odd-numbered dataset:

import numpy as np

data = [5, 6, 7, 8, 100]  # odd number of values
x = np.median(data)       # middle value; the outlier 100 has no effect
print(x)  # Output: 7.0
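The position rule in the definition can also be written out by hand. The following is a minimal sketch, not how the libraries implement it; the function name median_by_position is hypothetical.

```python
def median_by_position(values):
    """Return the median by locating the middle position(s) of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: 1-based position (n + 1) / 2 is 0-based index n // 2
        return ordered[mid]
    # Even count: average the values at 1-based positions n/2 and n/2 + 1
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_by_position([1, 2, 3, 4]))       # 2.5
print(median_by_position([5, 6, 7, 8, 100]))  # 7
```

Sorting first matters: the positions refer to the ordered data, not to the order in which the values were recorded.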

Mode

Definition: The mode is the value that appears most frequently within a dataset. A dataset may have multiple modes.

Uses: This measure helps identify the most common values or categories within a dataset and is unaffected by outliers.

To calculate the mode in Python:

import statistics as st

data = [1, 2, 3, 4, 5, 6, 7, 8, 8, 8, 9, 10]
x = st.mode(data)  # returns a single most frequent value
print(x)  # Output: 8
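Since a dataset may have multiple modes, it can help to list all of them. The sketch below uses statistics.multimode, which is available from Python 3.8 onward; the sample data is made up for illustration.

```python
import statistics as st

# Illustrative data in which 2 and 3 both appear twice
data = [1, 2, 2, 3, 3, 4]

# multimode returns every value tied for the highest count,
# in the order each was first encountered
modes = st.multimode(data)
print(modes)  # [2, 3]

# mode returns a single value; on Python 3.8+ it is the first mode encountered
print(st.mode(data))  # 2
```

On Python versions before 3.8, st.mode raises StatisticsError when there is more than one mode.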

Weighted Mean

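Definition: Unlike the arithmetic mean, which treats every value equally, the weighted mean assigns a weight to each value in the dataset.

Uses: This allows for a mean calculation that reflects the importance of each observation based on its weight. When the weights are expressed as fractions, they should total 1 (100%); otherwise, the weighted sum must be divided by the total of the weights.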

Example: In an investment portfolio where each asset has a weight and a return (in %), the weighted mean return is the sum of each weight multiplied by its return:

Weighted mean = (0.1 * 10) + (0.15 * 15) + (0.2 * 20) + (0.25 * 25) + (0.3 * 30) = 22.5%.

In Python, we can create a DataFrame for this:

import pandas as pd

# Each row: asset id, portfolio weight (as a fraction), and return in percent
df = pd.DataFrame({
    'Asset': [1, 2, 3, 4, 5],
    'Weight': [0.10, 0.15, 0.20, 0.25, 0.30],
    'Return %': [10, 15, 20, 25, 30],
})

And then calculate the weighted mean:

# Multiply each return by its weight and add up the products
x = (df['Weight'] * df['Return %']).sum()
print(round(x, 2))

# Output: 22.5
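As an alternative sketch, NumPy's np.average accepts a weights argument and computes the same figure without a DataFrame. The list names below are illustrative; the numbers are the portfolio weights and returns from the example above.

```python
import numpy as np

weights = [0.10, 0.15, 0.20, 0.25, 0.30]  # portfolio weights as fractions (sum to 1)
returns = [10, 15, 20, 25, 30]            # returns in percent

# np.average divides the weighted sum by the sum of the weights
weighted_mean = np.average(returns, weights=weights)
print(round(weighted_mean, 2))  # 22.5
```

Because np.average divides by the total weight, it gives the same result if the weights are passed as 10, 15, 20, 25, 30 rather than as fractions.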

Geometric Mean

Definition: The geometric mean of n positive values is the n-th root of their product.

Uses: Frequently used for values that are multiplied together or that compound over time, such as growth rates or compound interest rates.

For instance, to find the geometric mean for Asset 1:

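import statistics as st

# Per-period growth rates for Asset 1: +5%, +2%, -6%
asset_1_growth_rate = [0.05, 0.02, -0.06]

# Convert rates to growth factors (1.05, 1.02, 0.94); the geometric mean requires positive values
growth_factors = [1 + r for r in asset_1_growth_rate]

asset_1_gm = st.geometric_mean(growth_factors)

# Convert the mean growth factor back to a percentage rate
print(round((asset_1_gm - 1) * 100, 2))

# Output: 0.22 (percent)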
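To make the calculation concrete, here is a minimal sketch that computes the same geometric mean by hand as the n-th root of the product of the growth factors. It uses math.prod, available from Python 3.8 onward, and the variable names are illustrative.

```python
import math

growth_rates = [0.05, 0.02, -0.06]       # per-period growth rates for Asset 1
factors = [1 + r for r in growth_rates]  # growth factors: 1.05, 1.02, 0.94

# Geometric mean: n-th root of the product of the n growth factors
total_growth = math.prod(factors)
gm = total_growth ** (1 / len(factors))

# Compounding the geometric mean n times reproduces the total growth
assert math.isclose(gm ** len(factors), total_growth)

print(round((gm - 1) * 100, 2))  # 0.22
```

For comparison, the arithmetic mean of the three rates is about 0.33%, which overstates the growth actually achieved once the rates are compounded.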

A lower geometric mean may indicate greater variability or inconsistency compared to oth
