TensorFlow Tutorial | Iris Classification with SGD

TensorFlow is an open-source library for symbolic mathematical programming, released and used by Google to build machine learning applications such as neural networks. It is one of the most popular frameworks for machine learning. The Iris dataset is a commonly used dataset for learning classification algorithms. In this article, we will create a neural network in TensorFlow to classify Iris species and will train the network using stochastic gradient descent.

Get the Data

First, let’s download the Iris dataset from the UCI Machine Learning Repository using Python, saving it to a file named raw.csv.

import pandas as pd
import numpy as np
import requests
import re
import seaborn
import matplotlib.pyplot as plt
import tensorflow as tf

#Download the dataset
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
r = requests.get(url, allow_redirects=True)
filename = "raw.csv"
open(filename, 'wb').write(r.content)

Once the file has been downloaded, load it into memory.

#load the dataset into memory
dataset = pd.read_csv('raw.csv', header=None, names=['sepal_length','sepal_width','petal_length','petal_width','species'])

The Iris dataset contains 3 species of iris along with 4 attributes for each sample that we will use to train our neural network.
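To get a feel for the frame’s layout, here is a quick inspection sketch on a miniature stand-in (the three rows below are illustrative; the real file has 150 rows, 50 per species):

```python
import pandas as pd

# A miniature stand-in for the Iris frame (values are illustrative only)
dataset = pd.DataFrame({
    'sepal_length': [5.1, 7.0, 6.3],
    'sepal_width':  [3.5, 3.2, 3.3],
    'petal_length': [1.4, 4.7, 6.0],
    'petal_width':  [0.2, 1.4, 2.5],
    'species': ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'],
})

print(dataset['species'].unique())  # the three species labels
print(dataset.shape)                # (rows, 5 columns)
```

The same calls on the real frame would report 150 rows and the same three species.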

To get a better idea of this dataset, let’s visualize it using the seaborn visualization library. With the pairplot function we can visualize every pairwise combination of features, colored by species.

#Plot the dataset
seaborn.pairplot(dataset, hue="species", height=2, diag_kind="kde")

Iris Dataset Seaborn PairPlot

Our task will be to classify the species of Iris using these 4 features.

One Hot Encoding

We must now one-hot encode the species column, converting the text labels into vectors our machine learning algorithm can work with. To do so, use the LabelBinarizer class from scikit-learn. After calling the fit_transform method, Y will hold the one-hot encoded labels.

from sklearn.preprocessing import LabelBinarizer
species_lb = LabelBinarizer()
Y = species_lb.fit_transform(dataset.species.values)
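To see what fit_transform produces, here is a minimal sketch on a few hand-written labels (illustrative, not the full dataset):

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

labels = np.array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica', 'Iris-setosa'])
lb = LabelBinarizer()
Y = lb.fit_transform(labels)

print(lb.classes_)  # classes, sorted alphabetically
print(Y)            # one row per label, one column per class
```

Each row contains a single 1 in the column of that sample’s class and 0 elsewhere.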

Prepare the Input Features

The rest of the columns are our input features: sepal_length, sepal_width, petal_length, petal_width.

To improve gradient descent convergence, we will normalize the values using the normalize function from scikit-learn. The X_data variable will contain the normalized features we will use to train our neural network.

from sklearn.preprocessing import normalize
FEATURES = dataset.columns[0:4]
X_data = dataset[FEATURES].values
X_data = normalize(X_data)
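By default, scikit-learn’s normalize rescales each row to unit L2 norm. A small sketch with made-up numbers:

```python
import numpy as np
from sklearn.preprocessing import normalize

X = np.array([[3.0, 4.0],
              [1.0, 0.0]])
X_norm = normalize(X)  # default: norm='l2', applied per row
print(X_norm)          # each row now has length 1
```

The first row [3, 4] has length 5, so it becomes [0.6, 0.8]; the second is already unit length.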

Train/Test Split

The two arrays, X_data and Y, contain the data needed to train a neural network with TensorFlow. Our next step is to split this data into a training set and a test set, in order to prevent overfitting and obtain a better benchmark of our network’s performance.

To split the data we will use the train_test_split function from sklearn’s model_selection module.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_data, Y, test_size=0.3, random_state=1)
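With test_size=0.3, the 150 Iris rows split into 105 training and 45 test samples. A quick sketch on stand-in arrays of the same shape:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_demo = np.arange(150 * 4, dtype=float).reshape(150, 4)  # stand-in features
Y_demo = np.zeros((150, 3))                               # stand-in one-hot labels

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, Y_demo, test_size=0.3, random_state=1)
print(X_tr.shape, X_te.shape)  # (105, 4) (45, 4)
```

Fixing random_state makes the shuffle reproducible across runs.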

TensorFlow Neural Network

It’s now time to define our model in TensorFlow. The first step is to set the hyperparameters used throughout the network: the learning rate for gradient descent and the number of epochs the model will be trained for.

import tensorflow as tf

# Parameters
learning_rate = 0.01
training_epochs = 100

Our network will have two fully connected hidden layers (256 and 128 neurons respectively) and one fully connected output layer.

We also define the number of input features and the number of classes, which are required to define the shapes of our tensors. In our case, n_input will be 4 and n_classes will be 3.

# Neural Network Parameters
n_hidden_1 = 256 # 1st layer number of neurons
n_hidden_2 = 128 # 2nd layer number of neurons
n_input = X_train.shape[1] # input shape (105, 4)
n_classes = y_train.shape[1] # classes to predict

We can now define our tensors, starting with the input X and the target variable y. The weights of the hidden layers are stored in a dictionary, where h1 holds the weight matrix of the first hidden layer and h2 that of the second. The out entry holds the weights of the output layer. Each hidden layer and the output layer also has a bias term, defined in the biases dictionary.

# Inputs
X = tf.placeholder("float", shape=[None, n_input])
y = tf.placeholder("float", shape=[None, n_classes])

# Dictionary of Weights and Biases
weights = {
  'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
  'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
  'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}

biases = {
  'b1': tf.Variable(tf.random_normal([n_hidden_1])),
  'b2': tf.Variable(tf.random_normal([n_hidden_2])),
  'out': tf.Variable(tf.random_normal([n_classes]))
}

Forward Propagation

Forward propagation is defined in a function that takes the input x, computes the matrix product of x and h1, adds the bias term, and applies the ReLU activation. The result, layer_1, is fed to the second hidden layer, which in turn feeds the output layer, where the same matrix-multiply-and-bias step is applied (without an activation, since the softmax is applied later in the cost function).

# Model Forward Propagation step
def forward_propagation(x):
    # Hidden fully connected layer 1 with ReLU activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    # Hidden fully connected layer 2 with ReLU activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)

    # Output fully connected layer
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out'] 
    return out_layer

# Model Outputs
yhat = forward_propagation(X)
ypredict = tf.argmax(yhat, axis=1)

The result of forward_propagation (the logits) is stored in the yhat variable. A second variable, ypredict, holds the prediction: TensorFlow’s argmax function returns the index of the largest value along the given axis of a tensor, so ypredict contains the index of the predicted class for each sample.
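To see what argmax does to a batch of logits, here is a small NumPy sketch with made-up values:

```python
import numpy as np

# Two samples, three classes; values are illustrative logits
logits = np.array([[2.1, -0.5, 0.3],
                   [0.0,  1.7, 1.9]])

preds = np.argmax(logits, axis=1)  # index of the largest logit per row
print(preds)  # [0 2]
```

tf.argmax behaves the same way on the yhat tensor, one class index per sample.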

Backward Propagation

To implement backward propagation, we need to define our cost function. TensorFlow provides a function called softmax_cross_entropy_with_logits, which applies the softmax to our yhat logits and then calculates the cross entropy between the result and the actual labels. Our aim is to minimize this cost.
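Conceptually, the cost fuses a softmax with the cross-entropy loss. A minimal NumPy sketch of the same computation (illustrative only; TensorFlow’s fused implementation is more numerically careful):

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    # Row-wise softmax (shifted by the row max for numerical stability)
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)
    # Cross entropy against one-hot labels: -sum(label * log(prob)) per row
    return -np.sum(labels * np.log(probs), axis=1)

logits = np.array([[4.0, 1.0, 1.0]])   # network strongly favors class 0
labels = np.array([[1.0, 0.0, 0.0]])   # true class is index 0
loss = softmax_cross_entropy(logits, labels)
print(loss)  # small, since the prediction agrees with the label
```

A confident correct prediction gives a loss near zero, while a confident wrong prediction gives a large loss, which is exactly the signal gradient descent needs.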

To reduce the cost, we will use the gradient descent optimizer with the learning_rate parameter defined previously. The train_op operation minimizes this cost.

# Backward propagation
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=yhat))
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
#optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)

train_op = optimizer.minimize(cost)

Train our Neural Network

With our graph defined, we now run a TensorFlow session to train the neural network.

The code snippet for this can be seen below. We loop over training_epochs and, within each epoch, apply a forward and backward propagation step to each training sample individually (stochastic gradient descent with a batch size of 1).

After each epoch completes, the training-set and test-set accuracy are calculated and displayed.

# Initializing the variables
init = tf.global_variables_initializer()

from datetime import datetime
startTime = datetime.now()

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(training_epochs):
        # Stochastic Gradient Descent: one sample per update
        for i in range(len(X_train)):
            sess.run(train_op, feed_dict={X: X_train[i: i + 1], y: y_train[i: i + 1]})
        train_accuracy = np.mean(np.argmax(y_train, axis=1) == sess.run(ypredict, feed_dict={X: X_train, y: y_train}))
        test_accuracy  = np.mean(np.argmax(y_test, axis=1) == sess.run(ypredict, feed_dict={X: X_test, y: y_test}))
        print("Epoch = %d, train accuracy = %.2f%%, test accuracy = %.2f%%" % (epoch + 1, 100. * train_accuracy, 100. * test_accuracy))

print("Time taken:", datetime.now() - startTime)


Congratulations! You have now trained a neural network with TensorFlow. If you have followed tutorials using higher-level libraries such as Keras, you’ll notice that TensorFlow gives you much more control at the price of added complexity. TensorFlow is very powerful, and this article only touches on its basic components.