
Iris Data Set Classification using TensorFlow Multilayer Perceptron
https://github.com/nikhilroxtomar/Iris-Data-Set-Classification-using-TensorFlow-MLP

The Iris Data Set is one of the classic data sets for beginning your path into neural networks. Neural networks are among the most popular machine learning algorithms today, as they often perform better than other algorithms in the field. So everyone should start learning about them, and if you want to, this is the right place for you.

Iris Dataset

In a neural network the dataset is really important, as it's the dataset that determines what the network is going to learn.

The Iris Dataset is a multivariate data set consisting of three species of iris flowers. Each record contains four measurements of a flower along with the name of its species, like this:

Sepal Length | Sepal Width | Petal Length | Petal Width | Species
5.1          | 3.5         | 1.4          | 0.2         | Iris-setosa
4.9          | 3.0         | 1.4          | 0.2         | Iris-setosa
7.0          | 3.2         | 4.7          | 1.4         | Iris-versicolor

We are going to divide the dataset into two categories – training data and testing data.

The training data is used to train the neural network, while the testing data is used to measure its accuracy.

We have a single file containing all the data. For convenience, I have divided the data into training and testing sets and put them into two separate files. The iris.train file contains the data for training the neural network, and the other file, iris.test, contains the data for testing.
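
If you are starting from the original single file, the short script below is one way to produce the two files. This is only a sketch under my own assumptions – the iris.data file name, the shuffle, and the 80/20 split ratio are not from the original post:

import random

# Hypothetical split script: iris.data -> iris.train + iris.test
with open("iris.data") as f:
  lines = [l for l in f.read().strip().split("\n") if l]

random.shuffle(lines)          # mix the three species together
split = int(len(lines) * 0.8)  # 80% of the rows go to training

with open("iris.train", "w") as f:
  f.write("\n".join(lines[:split]))
with open("iris.test", "w") as f:
  f.write("\n".join(lines[split:]))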

 

Converting the Dataset to a Useful Format

First we are going to get the dataset into a format that can be used for training and testing the neural network. We are going to convert the dataset into multidimensional arrays.

We are going to have four arrays –

 

Training – train_X, train_Y

Testing – test_X, test_Y

 

Data in the file is present in the following format –

 


5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa

 

We are going to put the first four values into one array (train_X or test_X) and the species name into a separate array (train_Y or test_Y). This procedure is really simple: just split each line and place the first four values into the appropriate array. The problem is the species name – it is a string, so we can't store it in the same numeric format. Instead we are going to one-hot encode it into an array.

 


Iris-setosa - [1,0,0]
Iris-versicolor - [0,1,0]
Iris-virginica - [0,0,1]

 

The label_encode function converts the name of the species into the appropriate array.

 


def label_encode(label):
  # Map each species name to its one-hot array
  val = []
  if label == "Iris-setosa":
    val = [1,0,0]
  elif label == "Iris-versicolor":
    val = [0,1,0]
  elif label == "Iris-virginica":
    val = [0,0,1]
  return val
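
For example, calling it on a species name returns the matching array:

print(label_encode("Iris-setosa"))  # prints [1, 0, 0]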

 

Now the data_encode function gives you the X and Y arrays for the training or the testing dataset. It takes a single parameter – the name of the file.

 


def data_encode(file):
  X = []
  Y = []
  train_file = open(file, 'r')
  for line in train_file.read().strip().split('\n'):
    line = line.split(',')
    # First four comma-separated values are the flower measurements
    X.append([float(line[0]), float(line[1]), float(line[2]), float(line[3])])
    # The fifth value is the species name, one-hot encoded
    Y.append(label_encode(line[4]))
  train_file.close()
  return X, Y
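
Using data_encode we can now load the training and testing data into the four arrays described earlier:

#Training and Testing Data
train_X, train_Y = data_encode('iris.train')
test_X, test_Y = data_encode('iris.test')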

 

Parameters of the Neural Network

Now we have both the training and the testing data. We will start our next step – defining the various parameters of the neural network.

 


#Hyperparameters
learning_rate = 0.01
training_epochs = 2000
display_steps = 200

 

As we are using an MLP (Multilayer Perceptron), we have three layers – input, hidden, and output – each with a different number of neurons.

 


#Network parameters
n_input = 4    # four measurements per flower
n_hidden = 10  # neurons in the hidden layer
n_output = 3   # three species classes
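
The script also needs the TensorFlow import (at the top of the file) and the two placeholders – the graph nodes through which the data is fed into the network. The code below will not run without them:

import tensorflow as tf

#Graph Nodes
X = tf.placeholder("float", [None, n_input])
Y = tf.placeholder("float", [None, n_output])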

 

Defining the Weights and Biases for the Neural Network

 


#Weights and Biases
weights = {
  "hidden" : tf.Variable(tf.random_normal([n_input, n_hidden]), name="weight_hidden"),
  "output" : tf.Variable(tf.random_normal([n_hidden, n_output]), name="weight_output")
}

bias = {
  "hidden" : tf.Variable(tf.random_normal([n_hidden]), name="bias_hidden"),
  "output" : tf.Variable(tf.random_normal([n_output]), name="bias_output")
}

 

Neural Network Model

 


def model(x, weights, bias):
  # Hidden layer: linear transform followed by ReLU activation
  layer_1 = tf.add(tf.matmul(x, weights["hidden"]), bias["hidden"])
  layer_1 = tf.nn.relu(layer_1)

  # Output layer returns raw logits; the softmax is applied inside the loss
  output_layer = tf.matmul(layer_1, weights["output"]) + bias["output"]
  return output_layer

 

Training of the Neural Network

As the dataset is really small, we can train the network on either a CPU or a GPU. On my i5-7400 CPU it takes barely 2 seconds and reaches about 97% accuracy.

First of all we define the model for the neural network, providing it with the input placeholder, weights, and biases. To calculate the loss we use softmax cross entropy, and to minimize the cost we use the AdamOptimizer. We then train for 2000 epochs. After the optimization finishes, we calculate the accuracy of the neural network by testing it on the testing dataset.

 


#Define model
pred = model(X, weights, bias)

#Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)

#Initializing global variables
init = tf.global_variables_initializer()

with tf.Session() as sess:
  sess.run(init)

  for epoch in range(training_epochs):
    _, c = sess.run([optimizer, cost], feed_dict={X: train_X, Y: train_Y})
    if (epoch + 1) % display_steps == 0:
      print("Epoch:", (epoch + 1), "Cost:", c)

  print("Optimization Finished!")

  # Accuracy: fraction of samples whose predicted class matches the label,
  # evaluated on the testing dataset
  correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(Y, 1))
  accuracy = tf.reduce_mean(tf.cast(correct_pred, "float"))
  print("Accuracy:", accuracy.eval({X: test_X, Y: test_Y}))

 


Comments on “Iris Data Set Classification using TensorFlow Multilayer Perceptron”

  • Fabiana Bacon says:

    Hi! Thanks for your beautiful code!
    Do you know how I can do it without TensorFlow, using Anaconda/Python?
    []’s

  • Alexey says:

    Very nice and simple intro with this popular dataset, thanks for sharing.

  • Jon Reade says:

    Really nice, clear example code Nikhil, great to see 🙂

  • skptricks says:

    Great work… Thanks for this post.

  • Md Nahiduzzaman says:

    Hello, here is my code and I have found this error:

    ValueError: Cannot feed value of shape (208, 3) for Tensor 'Placeholder_1:0', which has shape '(?, 5)'

    the code:

    import tensorflow as tf
    import numpy as np
    import time

    start_time = time.time()

    def label_encode(label):
      val = []
      if label == "0":
        val = [0,0,0]
      elif label == "1":
        val = [0,0,1]
      elif label == "2":
        val = [0,1,0]
      elif label == "3":
        val = [0,1,1]
      elif label == "4":
        val = [1,0,0]
      return val

    def data_encode(file):
      X = []
      Y = []
      train_file = open(file, 'r')
      for line in train_file.read().strip().split('\n'):
        line = line.split(',')
        X.append([line[0], line[1], line[2], line[3], line[4], line[5], line[6], line[7], line[8], line[9], line[10], line[11], line[12]])
        Y.append(label_encode(line[13]))
      return X, Y

    #Defining a Multilayer Perceptron Model
    def model(x, weights, bias):
      layer_1 = tf.add(tf.matmul(x, weights["hidden"]), bias["hidden"])
      layer_1 = tf.nn.relu(layer_1)

      output_layer = tf.matmul(layer_1, weights["output"]) + bias["output"]
      return output_layer

    #Training and Testing Data
    train_X, train_Y = data_encode('hd.train')
    test_X, test_Y = data_encode('hd.test')

    #hyperparameter
    learning_rate = 0.01
    training_epochs = 2000
    display_steps = 200

    #Network parameters
    n_input = 13
    n_hidden = 10
    n_output = 5

    #Graph Nodes
    X = tf.placeholder("float", [None, n_input])
    Y = tf.placeholder("float", [None, n_output])

    #Weights and Biases
    weights = {
      "hidden" : tf.Variable(tf.random_normal([n_input, n_hidden]), name="weight_hidden"),
      "output" : tf.Variable(tf.random_normal([n_hidden, n_output]), name="weight_output")
    }

    bias = {
      "hidden" : tf.Variable(tf.random_normal([n_hidden]), name="bias_hidden"),
      "output" : tf.Variable(tf.random_normal([n_output]), name="bias_output")
    }

    #Define model
    pred = model(X, weights, bias)

    #Define loss and optimizer
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=Y))
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)

    #Initializing global variables
    init = tf.global_variables_initializer()

    with tf.Session() as sess:
      sess.run(init)

      for epoch in range(training_epochs):
        _, c = sess.run([optimizer, cost], feed_dict={X: train_X, Y: train_Y})
        if (epoch + 1) % display_steps == 0:
          print("Epoch: ", (epoch+1), "Cost: ", c)

      print("Optimization Finished!")

      test_result = sess.run(pred, feed_dict={X: train_X})
      correct_pred = tf.equal(tf.argmax(test_result, 1), tf.argmax(train_Y, 1))

      accuracy = tf.reduce_mean(tf.cast(correct_pred, "float"))
      print("Accuracy:", accuracy.eval({X: test_X, Y: test_Y}))

    end_time = time.time()

    print("Completed in ", end_time - start_time, " seconds")

    Please solve this?

    • Nikhil Tomar says:

      You need to change n_output to 3:
      n_output = 3
      as your labels have a shape of 3.

      OR

      you can make your labels have a shape of 5:
      label = [0, 0, 0, 0, 0]

      This problem is caused because your label size and your placeholder size don't match.
