Mean, median, and mode are basic statistical tools that describe the central tendency of a dataset, that is, where most of its values are centered. Range complements them by describing how spread out the values are. In AI Engineering, these measures help us summarize data quickly and can also help detect outliers (unusual values that don’t fit the overall pattern). In the following sections, we’ll explore what each measure means and how to calculate it using Python.

Mean

The mean is the average of a set of numbers. To find it, you simply add up all the values in the dataset and then divide the total by the number of values. The mean gives us a quick idea of the general size of the numbers in the data. However, it can be affected by very high or very low values (called outliers), which might pull the mean away from where most of the data lies.
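As a minimal sketch, the definition above (sum divided by count) can be written directly and checked against NumPy’s `np.mean`. The two small lists are hypothetical data: the second is the first with one value replaced by an outlier, so you can see the mean get pulled upward.

```python
import numpy as np

def mean_by_definition(values):
    """Add up all the values, then divide by how many there are."""
    return sum(values) / len(values)

# Hypothetical data: five values, then the same values with 5 replaced by an outlier (50).
data = [2, 3, 3, 4, 5]
data_with_outlier = [2, 3, 3, 4, 50]

print(mean_by_definition(data))    # 17 / 5 = 3.4
print(np.mean(data))               # same result via NumPy
print(np.mean(data_with_outlier))  # 62 / 5 = 12.4, pulled far above most values
```

A single extreme value moves the mean from 3.4 to 12.4, even though four of the five values are unchanged.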

Median

The median is the middle value in a dataset. To find it, you first arrange all the numbers in order from smallest to largest. If the number of values is odd, the median is the one right in the middle. If the number is even, the median is the average of the two middle values.
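The odd/even rule above can be sketched in plain Python. The function name `median_by_definition` is hypothetical and only illustrates the steps: sort, then take the middle value or the average of the two middle values.

```python
def median_by_definition(values):
    """Return the middle value of the sorted data (average of the two middle values if the count is even)."""
    if not values:
        raise ValueError("the median of an empty dataset is undefined")
    ordered = sorted(values)  # arrange from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of the two middle values

print(median_by_definition([7, 1, 3]))     # sorted [1, 3, 7] -> 3
print(median_by_definition([4, 1, 3, 2]))  # sorted [1, 2, 3, 4] -> (2 + 3) / 2 = 2.5
```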

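The median is a helpful way to understand the center of the data, especially when there are outliers (very large or very small values). Because the median depends only on the order of the values, not on how far the extremes lie from the rest, a single very large or very small value has little effect on it, whereas the same value can shift the mean substantially. This makes the median a more reliable measure of center for skewed data. In Python, you can calculate the median with the median() function from the NumPy library (numpy.median).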
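To illustrate the robustness claim, this sketch compares `np.median` with `np.mean` on hypothetical data with and without one extreme value. The mean jumps from 3.4 to 12.4, while the median stays at 3.0 in both cases.

```python
import numpy as np

# Hypothetical data, without and with one extreme value.
data = np.array([2, 3, 3, 4, 5])
data_with_outlier = np.array([2, 3, 3, 4, 50])

for label, values in [("original", data), ("with outlier", data_with_outlier)]:
    # np.median sorts internally and returns the middle value (as a float).
    print(label, "mean:", np.mean(values), "median:", np.median(values))
```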

Mode

The mode is the value that appears most often in a dataset. To find it, you count how many times each value occurs and pick the one with the highest count. If two values tie for the highest count, the dataset is called bimodal. If three values tie, it’s trimodal, and if more than three tie, it’s called multimodal.

The mode tells us which value is the most common, which can be useful in some situations, for example with categorical data. However, it may not be very helpful if the data has no repeating values or if the values are very spread out. In Python, we can calculate the mode with the mode() function from SciPy’s scipy.stats module.
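SciPy’s `scipy.stats.mode` returns a single value and its count, so when several values tie it reports only one of them. To show the unimodal/bimodal/trimodal/multimodal distinction described above, this sketch instead counts values with the standard library’s `collections.Counter`. The function `describe_modes` is hypothetical, and treating "no value repeats" as "no mode" is a convention chosen here, not a universal rule.

```python
from collections import Counter

def describe_modes(values):
    """Return the most frequent value(s), sorted, and a label for how many there are."""
    counts = Counter(values)  # value -> number of occurrences
    if not counts:
        return [], "empty"
    top = max(counts.values())
    if top == 1:
        # Convention used in this sketch: if nothing repeats, report no mode.
        return [], "no mode (no value repeats)"
    modes = sorted(v for v, c in counts.items() if c == top)
    labels = {1: "unimodal", 2: "bimodal", 3: "trimodal"}
    return modes, labels.get(len(modes), "multimodal")

print(describe_modes([1, 2, 2, 3]))        # ([2], 'unimodal')
print(describe_modes([3, 1, 1, 3, 2]))     # ([1, 3], 'bimodal')
print(describe_modes([5, 5, 6, 6, 7, 7]))  # ([5, 6, 7], 'trimodal')
print(describe_modes([1, 2, 3]))           # ([], 'no mode (no value repeats)')
```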

Range

The range is the difference between the highest and lowest values in the dataset: the largest number minus the smallest number.

To calculate the range, find the largest and smallest values and subtract the smallest from the largest. Sorting the numbers in ascending order first makes this easy, because the smallest value is then first and the largest is last. In this example, the largest number in the dataset is 8 and the smallest is 1, so the range is 8 - 1 = 7.
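A minimal sketch of both approaches in Python. The article’s own example data is not shown, so `values` below is a hypothetical list chosen only so that its extremes (8 and 1) match the numbers mentioned above.

```python
import numpy as np

# Hypothetical values; only the extremes (8 and 1) match the example in the text.
values = [4, 8, 1, 6, 3]

# Approach 1: sort ascending, then subtract the first (smallest) from the last (largest).
ordered = sorted(values)
range_sorted = ordered[-1] - ordered[0]

# Approach 2: take the maximum and minimum directly; no sorting needed.
range_numpy = np.max(values) - np.min(values)

print(ordered)       # [1, 3, 4, 6, 8]
print(range_sorted)  # 7
print(range_numpy)   # 7
```

Sorting is convenient when you also want to inspect the ordered data, but taking the maximum and minimum directly avoids the extra work of a full sort.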
