in Artificial Intelligence, Data science

K-Means on Iris Dataset

Read my previous post to understand how K-Means algorithm works. In this post I will try to run the K-Means on Iris dataset to classify our 3 classes of flowers, Iris setosa, Iris versicolor, Iris virginica (our classess) using the flowers sepal-length, sepal-width, petal-length and petal-width (our features)

Getting data:

import numpy as np  
import matplotlib.pyplot as plt  
import pandas as pd

url = ""
#the imported dataset does not have the required column names so lets add it
colnames = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
irisdata = pd.read_csv(url, names=colnames)

describe the data:


Converting the class names into numerical categories for analysis.

irisdata['Class'] = pd.Categorical(irisdata["Class"])
irisdata["Class"] = irisdata["Class"]

Preparing our dataset:

X = irisdata.values[:, 0:4]
y = irisdata.values[:, 4]

Running K-Means on it:

from sklearn.cluster import KMeans

# Number of clusters
kmeans = KMeans(n_clusters=3)
# Fitting the input data
kmeans =
# Getting the cluster labels
labels = kmeans.predict(X)
# Centroid values
centroids = kmeans.cluster_centers_

Checking the clustering accuracy of our program:

from sklearn.metrics import classification_report

target_names = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']


The output of classification looks like this:

You can see in the classification report that, 91% of our data was predicted accurately. Thats pretty good for an unsupervised algorithm.




Write a Comment



  • Understanding the Classification report in sklearn – Muthukrishnan

    […] of our predictions for classification algorithms. A sample report is shown below generated in our previous post where we ran K-Means on Iris […]