{"id":1689,"date":"2021-01-19T02:16:39","date_gmt":"2021-01-19T02:16:39","guid":{"rendered":"http:\/\/192.168.31.181\/muthu\/?p=1689"},"modified":"2021-01-19T02:16:39","modified_gmt":"2021-01-19T02:16:39","slug":"understanding-interquartile-range-iqr-and-outliers","status":"publish","type":"post","link":"http:\/\/write.muthu.co\/understanding-interquartile-range-iqr-and-outliers\/","title":{"rendered":"Understanding Interquartile Range (IQR) and Outliers"},"content":{"rendered":"\n
When dealing with a large amount of data, it’s good practice to remove outliers before further processing unless there is a good reason to keep them. Outliers, in simple words, are data points that lie unusually far from the rest of the dataset.<\/p>\n\n\n\n
For the lazy, here is the code to find outliers using IQR in a 1D array.<\/p>\n\n\n\n
import numpy as np\n\n# pass a 1D NumPy array; returns the indices of the outliers\ndef outliers_iqr(ys):\n    quartile_1, quartile_3 = np.percentile(ys, [25, 75])\n    iqr = quartile_3 - quartile_1\n    # anything more than 1.5 * IQR beyond the quartiles is an outlier\n    lower_bound = quartile_1 - (iqr * 1.5)\n    upper_bound = quartile_3 + (iqr * 1.5)\n    return np.where((ys > upper_bound) | (ys < lower_bound))\n\n# pass a 1D NumPy array; returns a copy with the outliers removed\ndef remove_outliers(array):\n    index_outliers = outliers_iqr(array)\n    outliers_removed = np.delete(array, index_outliers)\n    return outliers_removed<\/code><\/pre>\n\n\n\nConsider a hypothetical situation where a teacher is given the task of checking whether the BMI of the students in her class is within a healthy range. After collecting the height and weight of all her students, it’s important to check for any outliers, since they can distort the mean. An outlier could also be an experimental error; leaving it in may drag the mean BMI to a value that does not reflect the actual health of her students.<\/p>\n\n\n\n
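To make the effect on the mean concrete, here is a minimal sketch. The BMI values are made up for illustration, and 95.0 stands in for a hypothetical data-entry error; the same 1.5 × IQR rule is used to flag it.

```python
import numpy as np

# Hypothetical BMI values for ten students (illustrative data only).
# 95.0 plays the role of an experimental or data-entry error.
bmi = np.array([18.5, 19.2, 20.1, 20.8, 21.4, 21.9, 22.3, 23.0, 24.1, 95.0])

# First and third quartiles, and the interquartile range between them.
q1, q3 = np.percentile(bmi, [25, 75])
iqr = q3 - q1

# Values more than 1.5 * IQR outside the quartiles are flagged as outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (bmi < lower) | (bmi > upper)
cleaned = bmi[~is_outlier]

print("flagged values:      ", bmi[is_outlier])
print("mean with outlier:   ", round(bmi.mean(), 2))
print("mean without outlier:", round(cleaned.mean(), 2))
```

With this sample, a single bad reading is enough to pull the class mean well above every genuine value in the list, which is the kind of distortion the teacher wants to avoid.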
The values removed from the full set are what we call outliers. There are many ways to detect and remove outliers; one of them is the IQR (interquartile range) method. The IQR is the range spanned by the middle 50% of the values, that is, the difference between the third quartile (Q3) and the first quartile (Q1). The best way to visualize the IQR is through a box plot. Take a look at the box plot below to get an understanding of the IQR.<\/p>\n\n\n\n