Assignment Description
Last updated on Wednesday, September 27, 2017
2017 IFN680 - Assignment Two (Siamese network)
Assessment information
• Code and report submission due on Monday 30th October, 08.30am
• Use Blackboard to submit your work
• Group size: three people per submission. Smaller groups (1 or 2 people) are allowed.
Overview
• You will implement a deep neural network classifier to predict whether two images belong to the same class. The dataset you will use is a set of images of handwritten digits.
• The approach you are asked to follow is quite generic and can be applied to problems where we seek to determine whether two inputs belong to the same equivalence class.
• You are provided with scaffolding code that you will need to complete with your own functions.
• You will also perform experiments and report on your results.
Introduction
Despite impressive results in object classification, verification and recognition, most deep neural network based recognition systems become brittle when the viewpoint of the camera changes dramatically. Robustness to geometric transformations is highly desirable for applications like wildlife monitoring, where there is no control over the pose of the objects of interest. The images of different objects viewed from various observation points define equivalence classes where, by definition, two images are said to be equivalent if they are views of the same object.
These equivalence classes can be learned via embeddings that map the input images to vectors of real numbers. During training, equivalent images are mapped to vectors that get pulled closer together, whereas if the images are not equivalent their associated vectors get pulled apart.
Background
Common machine learning tasks like classification and recognition involve learning an appearance model. These tasks can be interpreted as, and even reduced to, the problem of learning manifolds from a training set. Useful appearance models create an invariant representation of the objects of interest under a range of conditions. A good representation should combine invariance and discriminability. For example, in facial recognition, where the task is to compare two images and determine whether they show the same person, the output of the system should be invariant to the pose of the head. More generally, the category of an object contained in an image should be invariant to viewpoint changes.
This assignment borrows ideas we developed for a manta ray recognition system. The motivation for our research work is the lack of fully automated identification systems for manta rays. The techniques developed for such systems can also potentially be applied to other marine species that bear a unique pattern on their surface. The task of recognizing manta rays is challenging because of the heterogeneity of photographic conditions and equipment used in acquiring manta ray photo ID images like those in the figures below.
Figure: Two images of the same manta ray.
Many of those pictures are submitted by recreational divers. For those pictures, the camera parameters are generally not known. The state of the art for manta ray recognition is a system that requires user input to manually align and normalize the 2D orientation of the ray within the image. Moreover, the user has to select a rectangular region of interest containing the spot pattern. The images also have to be of high quality. In practice, marine biologists still prefer to use their own decision tree, which they apply manually.
In order to develop robust algorithms for recognizing manta spot patterns, a research student and I have considered the problem of recognizing artificially generated patterns subjected to projective transformations that simulate changes in the camera viewpoint.
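A projective transformation (homography) maps a point (x, y) through a 3×3 matrix H in homogeneous coordinates: (x', y', w') = H·(x, y, 1), and the result is then divided by w'. That division is what allows a homography to mimic perspective, for example making the far side of a pattern look smaller. The NumPy sketch below is illustrative only. The function name `apply_homography` and the matrix values are made up for this example and are not taken from the assignment code.

```python
import numpy as np

def apply_homography(H, points):
    """Map an (N, 2) array of (x, y) points through the 3x3 homography H."""
    points = np.asarray(points, dtype=float)
    ones = np.ones((points.shape[0], 1))
    homogeneous = np.hstack([points, ones]) @ H.T     # each row is (x', y', w')
    return homogeneous[:, :2] / homogeneous[:, 2:3]    # perspective division by w'

# Corners of a 28x28 image (the size of an MNIST digit)
corners = np.array([[0, 0], [0, 28], [28, 28], [28, 0]], dtype=float)

# Made-up homography: the bottom row makes w' grow with x,
# so the right-hand side of the square shrinks, like a surface seen at an angle
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.01, 0.0, 1.0]])

print(apply_homography(H, corners))
```

The left edge stays in place. The right edge (x = 28) moves in to x ≈ 21.9 and is shortened by the same factor of 1.28, so the square becomes a trapezoid. An affine map, whose bottom row is (0, 0, 1), cannot produce this effect because it keeps parallel lines parallel.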
Artificial data allowed us to experiment with a large number of patterns and to compare different network architectures in order to select the one most suitable for learning geometric equivalence. Our experiments have demonstrated that Siamese[1] convolutional neural networks are able to discriminate between patterns subjected to large homographic transformations.
In this assignment, you will work with a simpler dataset, namely the MNIST dataset. You will build a classifier to predict whether two images are warped views of digits from the same class or not.
Learning equivalence classes
A Siamese network consists of two identical subnetworks that share the same weights, followed by a distance calculation layer. The input of a Siamese network is a pair of images Pi and Pj. If the two images are deemed to be from the same equivalence class, the pair is called a positive pair, whereas a pair of images from different equivalence classes is called a negative pair.
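Weight sharing means that both branches compute literally the same function f. The NumPy sketch below stands in for the subnetwork with a single hypothetical ReLU layer. `W` and `b` are random placeholders rather than trained weights, and `embed` and `pair_distance` are illustrative names. The distance is clipped at a small epsilon, as `euclidean_distance` does in the scaffolding code (1e-7 is assumed here as the value of Keras' default `K.epsilon()`).

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, embedding_dim = 784, 128                            # flattened 28x28 images, 128-d embeddings
W = rng.normal(scale=0.01, size=(input_dim, embedding_dim))    # shared weights
b = np.zeros(embedding_dim)                                    # shared bias

def embed(images):
    """Hypothetical one-layer subnetwork f: (N, 784) -> (N, 128) with ReLU."""
    return np.maximum(images @ W + b, 0.0)

def pair_distance(images_a, images_b, eps=1e-7):
    """Euclidean distance between f(P_i) and f(P_j), one value per pair, shape (N, 1)."""
    # Both branches call the same embed(), i.e. the weights are shared
    squared = np.sum((embed(images_a) - embed(images_b)) ** 2, axis=1, keepdims=True)
    return np.sqrt(np.maximum(squared, eps))                   # clip before sqrt, as in the scaffolding

p_i = rng.random((4, input_dim))
p_j = rng.random((4, input_dim))
print(pair_distance(p_i, p_j).ravel())
```

Because both images pass through the same `embed`, swapping them gives the same distance. Two identical images get a distance close to zero (exactly `sqrt(eps)` here).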
The input images Pi and Pj are fed to the twin subnetworks to produce two vector representations f(Pi) and f(Pj) that are used to calculate a proxy distance. A Siamese network is trained on a collection of positive and negative pairs. Learning is performed by optimizing a contrastive loss function (see code documentation). The aim is to minimize the distance between a pair of images from the same equivalence class while maximizing the distance between a pair of images from different equivalence classes.
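For a label y (1 for a positive pair, 0 for a negative pair) and a distance d = ||f(Pi) − f(Pj)||, the contrastive loss in the scaffolding code averages y·d² + (1 − y)·max(margin − d, 0)² over the batch, with margin = 1. The NumPy version below mirrors that Keras expression so the two terms can be inspected on small numbers. `contrastive_loss_np` is a hypothetical name used only for this illustration.

```python
import numpy as np

def contrastive_loss_np(y_true, distance, margin=1.0):
    """NumPy mirror of the scaffolding's contrastive loss.

    y_true   : labels, 1 for a positive pair, 0 for a negative pair
    distance : distances d between the two embeddings, same shape as y_true
    margin   : negative pairs closer than this are penalised
    """
    y_true = np.asarray(y_true, dtype=float)
    distance = np.asarray(distance, dtype=float)
    positive_term = y_true * distance ** 2                                # pull positive pairs together
    negative_term = (1 - y_true) * np.maximum(margin - distance, 0) ** 2  # push negatives out to the margin
    return np.mean(positive_term + negative_term)

# A positive and a negative pair, both at distance 0.2:
# the losses are 0.2**2 = 0.04 and (1 - 0.2)**2 = 0.64, so the mean is about 0.34
print(contrastive_loss_np([1, 0], [0.2, 0.2]))
# A negative pair already beyond the margin contributes nothing
print(contrastive_loss_np([0], [1.5]))   # 0.0
```

The hinge max(margin − d, 0) is what keeps negative pairs from being pushed apart indefinitely. Once a negative pair is farther apart than the margin, it no longer affects the gradient.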
Your tasks
• You are provided with scaffolding code and an example of a Siamese network based on a multi-layer perceptron network. You need to familiarize yourself with this code, then adapt it to your needs.
• Write code to create a Siamese network based on a convolutional network with at least 3 convolutional layers, and 2 fully connected layers. No need to use more than 5 convolutional layers!
• Create a dataset of 100000 warped images using the provided function random_deform. Use the call im2 = assign2_utils.random_deform(im1, 45, 0.3) to warp the image im1 with a rotation of at most 45 degrees and a projective transformation with a “strength” of 0.3.
• Find a suitable architecture for the base convolutional neural network of your Siamese network.
• Train your Siamese network on the original dataset (without warping) and report the performance of your network.
• Train your Siamese network on the warped dataset and report the performance of your network.
• Investigate whether starting to train the network on easier pairs improves the classifier. That is, start training the network on images that have a rotation significantly smaller than 45 degrees and a strength parameter smaller than 0.3.
• Describe your findings in the report. Use tables and figures to support your arguments.
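The warped-dataset task and the "easier pairs first" experiment can share one helper that warps images for a given (rotation, variation) setting. The sketch below shows one possible design, not a required one. `deformation_schedule` and `warp_images` are hypothetical names. The deformation function is passed in as a parameter so that the same code works with `assign2_utils.random_deform` (called with the same arguments as in the task list) or with a stand-in for testing. Keeping 45 and 0.3 as named parameters also avoids the magic numbers that the marking guide penalises.

```python
import numpy as np

def deformation_schedule(n_stages, max_rotation=45.0, max_variation=0.3):
    """Return n_stages (rotation, variation) settings growing linearly
    from (max_rotation, max_variation) / n_stages up to the maxima."""
    fractions = np.linspace(1.0 / n_stages, 1.0, n_stages)
    return [(float(f * max_rotation), float(f * max_variation)) for f in fractions]

def warp_images(images, rotation, variation, deform):
    """Apply deform(image, rotation, variation) to each image.

    images : array of shape (N, side, side)
    deform : callable with the same signature as assign2_utils.random_deform
    returns: float32 array of shape (N, side, side)
    """
    return np.stack([deform(im, rotation, variation) for im in images]).astype(np.float32)

schedule = deformation_schedule(3)
print(schedule)   # rotation about 15 -> 30 -> 45, variation about 0.1 -> 0.2 -> 0.3

# Hypothetical curriculum loop for the assignment (not run here):
#   for rotation, variation in schedule:
#       x_stage = warp_images(x_train, rotation, variation, assign2_utils.random_deform)
#       ...build pairs from x_stage and keep training the same model...
```

Because training continues on the same model from one stage to the next, later stages start from weights that already handle mild warps. That is the comparison the experiment asks you to make against training directly on the full deformation.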
Submission
You should submit via Blackboard a zip file containing:
• A report in PDF format, strictly limited to 8 pages in total. The report should
  • clearly explain the methodology of your experiments
  • present your experimental results using tables and figures
• Your Python file my_submission.py
Marking Guide Focus
• Report:
  • Structure (sections, page numbers), grammar, no typos.
  • Clarity of explanations.
  • Figures and tables (use for explanations and to report performance).
• Code quality:
  • Readability, meaningful variable names.
  • Proper use of Python constructs like numpy arrays, dictionaries and list comprehension.
  • Header comments in classes and functions.
  • Function parameter documentation.
  • In-line comments.
• Experiments:
  • Soundness of the methodology
  • Evidence-based discussion/conclusion
Final Remarks
• Do not underestimate the workload. Start early. You are strongly encouraged to ask questions during the practical sessions.
• Email questions to ...
[1] Siamese networks are defined in the next section.
Marking Guide and Criterion List
Assignment Two 2017 IFN680
• Report: 10 marks
  • Structure (sections, page numbers), grammar, no typos.
  • Clarity of explanations.
  • Figures and tables (use for explanations and to report performance).
Levels of Achievement
10 Marks | 7 Marks | 5 Marks | 3 Marks | 1 Mark |
Report written at the highest professional standard with respect to spelling, grammar, formatting, structure, and language terminology. | Report is very well written and understandable throughout, with only a few insignificant presentation errors. | The report is generally well written and understandable but with a few small presentation errors that make one or two points unclear. Clear figures and tables. | Large parts of the report are poorly written, making many parts difficult to understand. Use of sections with proper section titles. No figures or tables. | The entire report is poorly written and/or incomplete and/or impossible to understand. The report is in pdf format. |
To get “i Marks”, the report needs to satisfy all the positive items and none of the negative items of the columns “j Marks” for all j<i. For example, if your report is not in pdf format, you will not be awarded more than 1 mark.
• Code quality: 10 marks
  • Readability, meaningful variable names.
  • Proper use of Python constructs like numpy arrays, dictionaries and list comprehension.
  • Header comments in classes and functions.
  • Function parameter documentation.
  • In-line comments.
Levels of Achievement
10 Marks | 7 Marks | 5 Marks | 3 Marks | 1 Mark |
Code is generic. Minimal changes would be needed to run the same experiments on a different dataset. | Proper use of numpy array operations. Avoid unnecessary loops. Useful in-line comments. Code structured so that it is straightforward to repeat the experiments. | No magic numbers (that is, all numerical constants have been assigned to variables). Appropriate use of auxiliary functions. Each function parameter documented (including type and shape). | Header comments with instructions on how to run the code to repeat the experiments. | Code looks like a random spaghetti plate. |
To get “i Marks”, the report needs to satisfy all the positive items and none of the negative items of the columns “j Marks” for all j<i.
• Experiments: 20 marks
Levels of Achievement
20 Marks | 15 Marks | 10 Marks | 5 Marks | 0 Marks |
Successfully train a CNN-based Siamese network on the warped dataset in two phases: first small warps, then larger deformations. | The recommendations are supported by references to tables and/or figures. Successfully train a CNN-based Siamese network on the warped dataset. | Methodology, experiments and recommendations are clear. Successfully train a CNN-based Siamese network on the original (not warped) dataset. | Partial description of the experiments. Critical information needed to repeat the experiments is missing. | No experiments described in the report. |
To get “i Marks”, the report needs to satisfy all the positive items and none of the negative items of the columns “j Marks” for all j<i.
Assignment Code
'''
This module contains functions
- to load the original image dataset
- to generate random homographies
- to randomly warp images with random homographies
'''
import numpy as np
import random
from tensorflow.contrib import keras
from tensorflow.contrib.keras import backend as K
from skimage import transform
#------------------------------------------------------------------------------
def load_dataset():
    '''
    Load the dataset, shuffled and split between train and test sets
    and return the numpy arrays x_train, y_train, x_test, y_test
    The dtype of all returned arrays is uint8
    @return
        x_train, y_train, x_test, y_test
    '''
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    return x_train, y_train, x_test, y_test
#------------------------------------------------------------------------------
def random_homography(variation, image_side):
    '''
    Generate a random homography.
    The larger the value of variation, the more deformation is applied.
    The homography is defined by 4 random points.
    @param
        variation: percentage (in decimal notation from 0 to 1)
            relative size of a circle region where centre is projected
        image_side:
            length of the side of an input square image in pixels
    @return
        tform: object from skimage.transform
    '''
    d = image_side * variation
    top_left = (random.uniform(-0.5*d, d), random.uniform(-0.5*d, d))  # Top left corner
    bottom_left = (random.uniform(-0.5*d, d), random.uniform(-0.5*d, d))  # Bottom left corner
    top_right = (random.uniform(-0.5*d, d), random.uniform(-0.5*d, d))  # Top right corner
    bottom_right = (random.uniform(-0.5*d, d), random.uniform(-0.5*d, d))  # Bottom right corner
    tform = transform.ProjectiveTransform()
    tform.estimate(np.array((
        top_left,
        (bottom_left[0], image_side - bottom_left[1]),
        (image_side - bottom_right[0], image_side - bottom_right[1]),
        (image_side - top_right[0], top_right[1])
    )), np.array((
        (0, 0),
        (0, image_side),
        (image_side, image_side),
        (image_side, 0)
    )))
    return tform
#------------------------------------------------------------------------------
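def random_deform(image, rotation, variation):
    '''
    Apply a random warping deformation to the input image: a random rotation
    followed by a random projective transformation (homography).
    @param
        image: square 2D numpy array
        rotation: maximum absolute rotation angle in degrees; the actual
            angle is drawn uniformly from [-rotation, rotation]
        variation: strength of the projective transformation
            (see random_homography)
    @return
        image_warped: warped image with the same shape as image
            (floating point, as returned by skimage.transform)
    '''
    image_side = image.shape[0]
    assert image.shape[0] == image.shape[1]
    cval = 0
    rhom = random_homography(variation, image_side)
    image_warped = transform.rotate(
        image,
        random.uniform(-rotation, rotation),
        resize=False,
        mode='constant',
        cval=cval)
    image_warped = transform.warp(image_warped, rhom, mode='constant', cval=cval)
    return image_warped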
#------------------------------------------------------------------------------
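`random_homography` passes four corner correspondences (src: the perturbed corners, dst: the image corners) to `ProjectiveTransform.estimate`. With exactly four points, no three of them collinear, the 3×3 homography is fully determined once its bottom-right entry is fixed to 1. The NumPy sketch below solves that 8×8 linear system directly to show why four points suffice. It illustrates the idea only. scikit-image's estimator is a separate implementation that uses a least-squares formulation and also accepts more than four points. `homography_from_4_points` and the corner offsets are hypothetical.

```python
import numpy as np

def homography_from_4_points(src, dst):
    """Return the 3x3 homography H, normalised so that H[2, 2] = 1,
    mapping each of the 4 src points onto the corresponding dst point.

    src, dst : arrays of shape (4, 2) holding (x, y) coordinates,
               no three points collinear
    """
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h6*x + h7*y + 1) = h0*x + h1*y + h2, rearranged to be linear in h0..h7
        rows.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        rhs.append(u)
        # same equation for the v coordinate, using h3, h4, h5
        rows.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        rhs.append(v)
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

side = 28
image_corners = np.array([[0, 0], [0, side], [side, side], [side, 0]], dtype=float)
# Hypothetical corner offsets, in the same order as in random_homography
# (top left, bottom left, bottom right, top right)
perturbed = image_corners + np.array([[2, 1], [-1, -3], [-2, -2], [3, 2]], dtype=float)

H = homography_from_4_points(perturbed, image_corners)

# Check: map the perturbed corners through H in homogeneous coordinates
mapped = np.hstack([perturbed, np.ones((4, 1))]) @ H.T
mapped = mapped[:, :2] / mapped[:, 2:3]
print(np.allclose(mapped, image_corners))   # True
```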
Assignment Code
'''
2017 IFN680 Assignment Two
Scaffolding code to get you started on the second assignment.
'''
import random
import numpy as np
#import matplotlib.pyplot as plt
from tensorflow.contrib import keras
from tensorflow.contrib.keras import backend as K
import assign2_utils
#------------------------------------------------------------------------------
def euclidean_distance(vects):
    '''
    Auxiliary function to compute the Euclidean distance between two vectors
    in a Keras layer.
    '''
    x, y = vects
    return K.sqrt(K.maximum(K.sum(K.square(x - y), axis=1, keepdims=True), K.epsilon()))
#------------------------------------------------------------------------------
def contrastive_loss(y_true, y_pred):
    '''
    Contrastive loss from Hadsell-et-al.'06
    http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
    @param
        y_true : true label 1 for positive pair, 0 for negative pair
        y_pred : distance output of the Siamese network
    '''
    margin = 1
    # if positive pair, y_true is 1, penalize for large distance returned by Siamese network
    # if negative pair, y_true is 0, penalize for distance smaller than the margin
    return K.mean(y_true * K.square(y_pred) +
                  (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))
#------------------------------------------------------------------------------
def compute_accuracy(predictions, labels):
    '''
    Compute classification accuracy with a fixed threshold on distances.
    @param
        predictions : values computed by the Siamese network
        labels : 1 for positive pair, 0 otherwise
    '''
    # the formula below computes only the fraction of predicted positives
    # that are true positives (the precision), not the accuracy:
    # return labels[predictions.ravel() < 0.5].mean()
    n = labels.shape[0]
    acc = (labels[predictions.ravel() < 0.5].sum() +  # count True Positives
           (1 - labels[predictions.ravel() >= 0.5]).sum()) / n  # count True Negatives
    return acc
#------------------------------------------------------------------------------
def create_pairs(x, digit_indices):
    '''
    Positive and negative pair creation.
    Alternates between positive and negative pairs.
    @param
        x : array of images
        digit_indices : list of lists
            digit_indices[k] is the list of indices of occurrences of digit k in
            the dataset
    @return
        P, L
        where P is an array of pairs and L an array of labels
        L[i] == 1 if P[i] is a positive pair
        L[i] == 0 if P[i] is a negative pair
    '''
    pairs = []
    labels = []
    n = min([len(digit_indices[d]) for d in range(10)]) - 1
    for d in range(10):
        for i in range(n):
            z1, z2 = digit_indices[d][i], digit_indices[d][i + 1]
            pairs += [[x[z1], x[z2]]]
            # z1 and z2 form a positive pair
            inc = random.randrange(1, 10)
            dn = (d + inc) % 10
            z1, z2 = digit_indices[d][i], digit_indices[dn][i]
            # z1 and z2 form a negative pair
            pairs += [[x[z1], x[z2]]]
            labels += [1, 0]
    return np.array(pairs), np.array(labels)
#------------------------------------------------------------------------------
def simplistic_solution():
    '''
    Train a Siamese network to predict whether two input images correspond to the
    same digit.
    WARNING:
    in your submission, you should use auxiliary functions to create the
    Siamese network, to train it, and to compute its performance.
    '''
    def create_simplistic_base_network(input_dim):
        '''
        Base network to be shared (eq. to feature extraction).
        '''
        seq = keras.models.Sequential()
        seq.add(keras.layers.Dense(128, input_shape=(input_dim,), activation='relu'))
        seq.add(keras.layers.Dropout(0.1))
        seq.add(keras.layers.Dense(128, activation='relu'))
        seq.add(keras.layers.Dropout(0.1))
        seq.add(keras.layers.Dense(128, activation='relu'))
        return seq
    # . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
    # load the dataset
    x_train, y_train, x_test, y_test = assign2_utils.load_dataset()
    # Example of magic numbers (60000, 784)
    # This should be avoided. Here we could/should have retrieved the
    # dimensions of the arrays using the numpy ndarray attribute shape
    x_train = x_train.reshape(60000, 784)
    x_test = x_test.reshape(10000, 784)
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    x_train /= 255  # normalize the entries between 0 and 1
    x_test /= 255
    input_dim = 784  # 28x28
    #
    epochs = 20
    # create training+test positive and negative pairs
    digit_indices = [np.where(y_train == i)[0] for i in range(10)]
    tr_pairs, tr_y = create_pairs(x_train, digit_indices)
    digit_indices = [np.where(y_test == i)[0] for i in range(10)]
    te_pairs, te_y = create_pairs(x_test, digit_indices)
    # network definition
    base_network = create_simplistic_base_network(input_dim)
    input_a = keras.layers.Input(shape=(input_dim,))
    input_b = keras.layers.Input(shape=(input_dim,))
    # because we re-use the same instance `base_network`,
    # the weights of the network
    # will be shared across the two branches
    processed_a = base_network(input_a)
    processed_b = base_network(input_b)
    # node to compute the distance between the two vectors
    # processed_a and processed_b
    distance = keras.layers.Lambda(euclidean_distance)([processed_a, processed_b])
    # Our model takes as input a pair of images input_a and input_b
    # and outputs the Euclidean distance of the mapped inputs
    model = keras.models.Model([input_a, input_b], distance)
    # train
    rms = keras.optimizers.RMSprop()
    model.compile(loss=contrastive_loss, optimizer=rms)
    model.fit([tr_pairs[:, 0], tr_pairs[:, 1]], tr_y,
              batch_size=128,
              epochs=epochs,
              validation_data=([te_pairs[:, 0], te_pairs[:, 1]], te_y))
    # compute final accuracy on training and test sets
    pred = model.predict([tr_pairs[:, 0], tr_pairs[:, 1]])
    tr_acc = compute_accuracy(pred, tr_y)
    pred = model.predict([te_pairs[:, 0], te_pairs[:, 1]])
    te_acc = compute_accuracy(pred, te_y)
    print('* Accuracy on training set: %0.2f%%' % (100 * tr_acc))
    print('* Accuracy on test set: %0.2f%%' % (100 * te_acc))
#------------------------------------------------------------------------------
#------------------------------------------------------------------------------
#------------------------------------------------------------------------------
if __name__ == '__main__':
    simplistic_solution()
# ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
# CODE CEMETERY
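`compute_accuracy` counts the true positives (distance below 0.5 on a positive pair) and the true negatives (distance at or above 0.5 on a negative pair), then divides by the number of pairs. An equivalent, more compact form compares the thresholded prediction with the label directly. Exposing the threshold as a parameter also removes the magic number 0.5, in line with the code-quality criteria. `accuracy_at_threshold` is a hypothetical name for this sketch.

```python
import numpy as np

def accuracy_at_threshold(distances, labels, threshold=0.5):
    """Fraction of pairs classified correctly when 'distance < threshold' means 'same digit'.

    distances : numpy array of shape (N,) or (N, 1), output of the Siamese network
    labels    : numpy array of shape (N,), 1 for positive pairs, 0 for negative pairs
    """
    predicted_same = distances.ravel() < threshold
    return np.mean(predicted_same == (labels == 1))

distances = np.array([[0.1], [0.7], [0.4], [0.9]])
labels = np.array([1, 0, 0, 1])
print(accuracy_at_threshold(distances, labels))  # 0.5: pairs 0 and 1 correct, pairs 2 and 3 wrong
```

On this input, `compute_accuracy` gives the same result: one true positive (pair 0) plus one true negative (pair 1), divided by 4 pairs.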
Assignment Code
'''
A short script to illustrate the warping functions of 'assign2_utils'
'''
#import numpy as np
import matplotlib.pyplot as plt
from tensorflow.contrib import keras
#from tensorflow.contrib.keras import backend as K
import assign2_utils
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
im1 = x_train[20]
plt.imshow(im1,cmap='gray')
im2 = assign2_utils.random_deform(im1,45,0.3)
plt.figure()
plt.imshow(im2,cmap='gray')
plt.show()