What is Central Tendency and why do we use it?
Let’s say you sold 1,400 ice creams in one year and, being smart, you charged different prices depending on the season. At the end of the year, you want to know how much you earned on average per sale, so that you can work out figures such as your earnings per month and how much you would need to increase them to hit your goal. If you plot all the sales figures against the money earned, the value around which the data clusters is what you are looking for: its central value.
Technically, central tendency is a single summary value that describes the center of a data set, and it serves as the starting point for more advanced analysis. It tells us where most values fall in the distribution of the data points, for instance on the plot of sales figures from the example above. Statistics uses three measures of central tendency: mean, median and mode. Each has its own formula and suits a different purpose.
Different measures of Central Tendency
Mean: The mean is the simple average from basic mathematics: the sum of all data points divided by the total number of data points. In other words, it is the arithmetic average of all the values in the data set.
Although it is the most widely used measure, the mean has a weakness: it does not always land on the central value of a data set. For example, (10 + 1 + 1 + 1 + 1) / 5 = 2.8, which is neither near the middle of the range (about 5, given that the highest value is 10) nor near 1, the value that four of the five data points take.
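As a minimal sketch, the calculation in this example can be reproduced in Python. The variable names are only illustrative; `statistics.mean` is part of the standard library.

```python
from statistics import mean

# The five values from the example: one large value and four small ones.
values = [10, 1, 1, 1, 1]

# Arithmetic mean: the sum of the data points divided by how many there are.
manual_mean = sum(values) / len(values)

print(manual_mean)    # 2.8
print(mean(values))   # 2.8
```

Both approaches give the same number; the manual version simply spells out the formula.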
Note: There are other kinds of mean as well, such as the geometric mean. The geometric mean is used when values are multiplied together rather than added, for example rates of interest, or data that follows a lognormal distribution. It is calculated by multiplying the n values together and then taking the nth root of the product.
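A short sketch of the geometric mean, assuming three made-up yearly growth factors (they are not figures from this article). It multiplies the values and takes the nth root, then compares the result with the standard library's `statistics.geometric_mean` (available in Python 3.8 and later).

```python
from math import prod
from statistics import geometric_mean

# Hypothetical yearly growth factors: +10%, +20% and -10%.
growth_factors = [1.10, 1.20, 0.90]

# Multiply all values, then take the n-th root of the product.
n = len(growth_factors)
manual_gm = prod(growth_factors) ** (1 / n)

print(round(manual_gm, 4))                       # 1.0591
print(round(geometric_mean(growth_factors), 4))  # 1.0591
```

Read this way, the three years behave like a steady growth of roughly 5.9% per year.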
Median: The median is the measure of central tendency that splits the data into two equal halves. Put simply, think of it as a weighing scale with the same number of data points on each side. To find it, arrange the data in ascending or descending order and take the value that lies in the middle. If the number of data points is even, take the average of the two middle values.
Difference between mean and median
Unlike the mean, the median is not sensitive to extreme values. For example, take the median of five numbers (10, 20, 30, 40, 50), which is the third number, 30. Now if we change the two numbers on either side to extreme values, as in (10, 20, 30, 100, 1000) or (-1000, 1, 30, 40, 50), the median remains the same. Hence, the median is a better measure than the mean when the data contains outliers or follows a skewed distribution.
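The comparison can be checked with a few lines of Python. This is only a sketch using the three lists from the example; the variable names are illustrative. The last line also shows that, for an even number of values, `statistics.median` averages the two middle ones.

```python
from statistics import mean, median

original = [10, 20, 30, 40, 50]
large_high = [10, 20, 30, 100, 1000]   # two values replaced by large ones
large_low = [-1000, 1, 30, 40, 50]     # two values replaced on the low side

for data in (original, large_high, large_low):
    # The median stays at 30 while the mean moves with the extreme values.
    print(data, "mean:", mean(data), "median:", median(data))

# With an even count there is no single middle value.
even_median = median([10, 20, 30, 40])
print(even_median)  # 25.0
```

The mean moves from 30 to 232 and to -175.8, while the median stays at 30 in all three cases.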
Mode: The mode is another commonly used measure of central tendency; it is the most frequently occurring value in the data set. It is found by counting how often each value occurs, in effect grouping the data set by its repeated values. Hence, the mode is the most popular measure of central tendency for categorical data and, for categories with no natural order, the only one that applies.
The mode is easy to spot on a bar chart since it is the tallest bar. If several values share the highest frequency, all of them count as modes, in what is called a multi-modal distribution. Conversely, if no value occurs more often than the others, the data set does not have a mode.
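Here is a small sketch of counting frequencies to find the mode, using made-up ice cream flavors. The helper `modes_or_none` is hypothetical and exists only to illustrate the "no mode" convention described here; note that Python's own `statistics.multimode` instead returns every value when all of them occur equally often.

```python
from collections import Counter
from statistics import multimode

# Hypothetical categorical data: flavors sold.
flavors = ["vanilla", "chocolate", "vanilla", "mango", "chocolate", "vanilla"]
top_flavor = Counter(flavors).most_common(1)
print(top_flavor)  # [('vanilla', 3)]

# Two flavors tie for the highest frequency: a multi-modal data set.
tied = ["vanilla", "chocolate", "vanilla", "chocolate", "mango"]
print(multimode(tied))  # ['vanilla', 'chocolate']

def modes_or_none(data):
    """Return the list of modes, or [] when all distinct values tie."""
    counts = Counter(data)
    # More than one distinct value, all with the same count: treat as no mode.
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    return multimode(data)

print(modes_or_none(["vanilla", "chocolate", "mango"]))  # []
```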
Conclusion: Different kinds of distributions call for different measures of central tendency. For a symmetrical distribution, the mean, median and mode all sit at the center of the distribution. However, for a data set skewed to the right or left, the mean is pulled toward the longer tail, so it no longer describes the center well. This is where the median comes in handy.
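To illustrate, here is a sketch with two small made-up data sets: one symmetrical, and one where a single large value creates a right skew. The numbers are assumptions chosen for easy arithmetic, not data from this article.

```python
from statistics import mean, median, mode

symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 50]  # last value replaced by 50

for name, data in (("symmetric", symmetric), ("right-skewed", right_skewed)):
    print(name, "mean:", mean(data), "median:", median(data), "mode:", mode(data))
```

For the symmetrical data all three measures equal 3. For the skewed data the mean rises to 8, while the median and the mode remain at 3.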
As mentioned earlier, the mode is uniquely suited to categorical data, such as different flavors of ice cream. A special case arises with continuous data, where exact values rarely repeat and so no mode can be found by counting. We can still locate the most likely value by finding the peak of the probability distribution plot.
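One rough way to approximate that peak is to group continuous values into bins and pick the fullest bin, which is what a histogram does. The sketch below assumes simulated prices drawn from a normal distribution centered at 50 and a bin width of 5; both are illustrative choices, and a different bin width can shift the estimate.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the simulated data is reproducible

# Hypothetical continuous data: 10,000 prices scattered around 50.
prices = [random.gauss(50, 10) for _ in range(10_000)]

# Group values into bins of width 5 and count how many fall into each bin.
bin_width = 5
bin_counts = Counter(int(p // bin_width) for p in prices)

# The fullest bin approximates the peak; report its midpoint.
modal_bin, _ = bin_counts.most_common(1)[0]
estimated_mode = (modal_bin + 0.5) * bin_width
print(estimated_mode)
```

The estimate is the midpoint of the fullest bin, so it should land within one bin width of the true peak at 50.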
Each type of data set or distribution has a measure of central tendency that suits it best. We’ll sum these up quickly below.
Mean: Symmetrical distribution, Continuous data set
Median: Skewed distribution, Continuous data set, Ordinal data set
Mode: Categorical data, Ordinal data set, Probability distribution
Measures of central tendency are the foundation for measures of variability and for deeper statistical analysis, which forms the core of data analysis. If you’re interested in picking up these subjects or developing an aptitude for them, you can enroll in the Data Science and AI course at Skillslash, which also offers a unique opportunity for real work experience at top MNCs. Get in touch with one of our counselors today by visiting https://skillslash.com/data-science-course-in-delhi