k-means clustering | Python Unsupervised Learning -1

In this series of articles, I will explain the topic of Unsupervised Learning and work through examples of it.

Unsupervised learning is a class of machine learning techniques for discovering patterns in data.

For instance, finding the natural “clusters” of customers based on their purchase histories, or searching for patterns and correlations among the purchases and using these patterns to express the data in compressed form.

  • Unsupervised learning finds patterns in data
  • E.g., clustering customers by their purchases
  • Compressing the data using purchase patterns (dimension reduction)

 

Supervised vs. Unsupervised learning

  • Supervised learning finds patterns for a prediction task
  • E.g., classify tumors as benign or cancerous (labels)
  • Unsupervised learning finds patterns in data
  • … but without a specific prediction task in mind

In this part we’ll cluster some sample data using k-means clustering.

  • Finds clusters of samples
  • Number of clusters must be specified
  • Implemented in sklearn (scikit-learn); a minimal sketch follows below
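
As a quick preview of the sklearn API (a minimal sketch; the points below are made-up toy data, not the data used later in this article):

import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1, 2], [1, 4], [1, 0],
                   [10, 2], [10, 4], [10, 0]])

model = KMeans(n_clusters=2)          # the number of clusters must be given up front
labels = model.fit_predict(points)    # cluster index (0 or 1) for each point
print(labels)
print(model.cluster_centers_)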

 

 

Let’s work through a simple example: we will generate some 2D sample points and cluster them.

First, we import the necessary libraries; the sample data will be generated with make_blobs.

import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

 

Generate sample data

np.random.seed(0)

# Three cluster centers in 2D; make_blobs draws Gaussian points around each one
centers = [[1, 1], [-1, -1], [1, -1]]
n_clusters = len(centers)
X, labels_true = make_blobs(n_samples=3000, centers=centers, cluster_std=0.7)
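
Before clustering, we can optionally sanity-check the generated data (a small extra sketch, not part of the original example):

# X contains 3000 two-dimensional points; labels_true holds the blob each point came from
print(X.shape)                  # (3000, 2)
print(np.unique(labels_true))   # [0 1 2]

plt.scatter(X[:, 0], X[:, 1], s=5)
plt.title('Raw sample data (before clustering)')
plt.show()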

 

Compute clustering with k-means

k_means = KMeans(n_clusters=3)
k_means.fit(X)
k_means_labels = k_means.labels_
k_means_cluster_centers = k_means.cluster_centers_
k_means_labels_unique = np.unique(k_means_labels)
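
The fitted model can be inspected directly, and it can assign new points to the learned clusters (an optional sketch; the two new points below are made-up):

print(k_means_cluster_centers)   # estimated centers, roughly near [1, 1], [-1, -1] and [1, -1]
print(k_means.inertia_)          # sum of squared distances from each point to its nearest center

new_points = np.array([[0.9, 1.1], [-1.2, -0.8]])
print(k_means.predict(new_points))   # cluster index assigned to each new point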

 

Plot result

 

colors = ['#4EACC5', '#FF9C34', '#4E9A06']
plt.figure()

for k, col in zip(range(n_clusters), colors):
    my_members = k_means_labels == k
    cluster_center = k_means_cluster_centers[k]
    plt.plot(X[my_members, 0], X[my_members, 1], 'w',
             markerfacecolor=col, marker='.')
    plt.plot(cluster_center[0], cluster_center[1], 'o', markerfacecolor=col,
             markeredgecolor='k', markersize=6)

plt.title('KMeans')
plt.grid(True)
plt.show()

As a result, you will see a scatter plot showing the three clusters in different colors, with each cluster center marked as a larger dot.
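
Since make_blobs also returned the ground-truth assignments in labels_true, we can optionally compare them with the clusters found by k-means, for example with the adjusted Rand index, where 1.0 means a perfect match (evaluating a clustering is covered in more detail in the next part):

from sklearn.metrics import adjusted_rand_score

print(adjusted_rand_score(labels_true, k_means_labels))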

Now that we have briefly introduced the k-means algorithm and the topic of Unsupervised Learning, see you in the next article.

Thank you.

 

Next: Evaluating a Clustering | Python Unsupervised Learning -2

