A MATLAB algorithm to determine whether an image of a character was generated by a font, or written by a human hand.
For my Quantitative Engineering Analysis 1 (ENGX2000) final project, I wanted to do something that solidified my understanding of the course material. I used Principal Component Analysis (PCA) in MATLAB to discern whether an image of a number or letter was generated by a font or drawn by hand.
To achieve this, I utilized the TMNIST and MNIST datasets, which are commonly used to train machine learning algorithms that interpret longer strings of text. I then took both sets of image data, binarized the images, and created a combined training matrix and a test matrix, both composed of ‘image vectors’.
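A minimal sketch of the binarize-and-stack step, not the project's actual code. The random 28x28 arrays stand in for images loaded from TMNIST and MNIST, and the 0.5 threshold, the variable names, and the every-fifth-image test split are all assumptions made for illustration.

```matlab
% Hypothetical stand-in data: 28x28 grayscale images stacked along dim 3.
% In the real project these would be loaded from the TMNIST and MNIST files.
fontImgs = rand(28, 28, 50);              % placeholder font-rendered characters
handImgs = rand(28, 28, 50);              % placeholder handwritten characters

threshold = 0.5;                          % assumed cutoff for binarization

% Binarize: pixels above the threshold become 1, all others 0.
fontBin = double(fontImgs > threshold);
handBin = double(handImgs > threshold);

% Flatten each 28x28 image into a 784x1 column (an "image vector").
fontVecs = reshape(fontBin, [], size(fontBin, 3));   % 784 x 50
handVecs = reshape(handBin, [], size(handBin, 3));   % 784 x 50

% Combine both sources; labels: 1 = font, 0 = handwritten.
allVecs   = [fontVecs, handVecs];
allLabels = [ones(1, size(fontVecs, 2)), zeros(1, size(handVecs, 2))];

% Hold out every fifth image as test data (an arbitrary split for illustration).
isTest      = mod(1:size(allVecs, 2), 5) == 0;
trainMat    = allVecs(:, ~isTest);        % 784 x 80 training matrix
testMat     = allVecs(:, isTest);         % 784 x 20 test matrix
trainLabels = allLabels(~isTest);
testLabels  = allLabels(isTest);
```

Storing one image per column keeps the later projection a single matrix multiplication.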
Using some MATLAB tricks, I projected the training matrix and the test matrix into the ‘eigenspace’ spanned by the first 20 eigenvectors of the covariance matrix of the training matrix. Inside this ‘eigenspace’, I compared the distances between each test image and each training image.
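The projection can be sketched roughly as follows. This assumes the eigenvectors come from the covariance matrix of the mean-centered training images, which is the standard PCA construction but may differ from the exact ‘tricks’ used in the project. The placeholder matrices and variable names are hypothetical.

```matlab
% Placeholder binarized image vectors, one image per column (hypothetical data).
trainMat = double(rand(784, 80) > 0.5);
testMat  = double(rand(784, 20) > 0.5);
k = 20;                                    % number of eigenvectors kept

% Center both sets with the training mean only, so that no information
% from the test images leaks into the basis.
mu       = mean(trainMat, 2);
trainCtr = trainMat - mu;                  % implicit expansion (R2016b or later)
testCtr  = testMat  - mu;

% Covariance matrix of the training pixels (784 x 784).
C = (trainCtr * trainCtr') / (size(trainCtr, 2) - 1);
C = (C + C') / 2;                          % remove round-off asymmetry

% Eigen-decomposition, sorted by decreasing eigenvalue.
[V, D]   = eig(C);
[~, idx] = sort(diag(D), 'descend');
basis    = V(:, idx(1:k));                 % 784 x 20 orthonormal basis

% Project every image into the 20-dimensional eigenspace.
trainProj = basis' * trainCtr;             % 20 x 80
testProj  = basis' * testCtr;              % 20 x 20
```

Each image is now described by 20 coordinates instead of 784 pixels, which makes the pairwise distance comparison far cheaper.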
With these distances, I used the following logic to classify test images: if a test image is closer to more font training images than handwritten ones, then the test image is likely a font, and vice versa.
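One way to read this rule is as a nearest-neighbor majority vote, sketched below on a tiny made-up 2-D example. The number of neighbors (3), the data, and the variable names are assumptions, and the original project may have counted ‘closer’ images differently.

```matlab
% Tiny hypothetical example in a 2-D eigenspace; each column is one image.
trainProj   = [0 0.2 0.1 5 5.1 4.9;
               0 0.1 0.3 5 4.8 5.2];
trainLabels = [1 1 1 0 0 0];               % 1 = font, 0 = handwritten
testProj    = [0.1 5.0;
               0.1 5.1];
numNeighbors = 3;                          % assumed vote size, not from the project

% Squared Euclidean distances: dist2(i, j) = ||test_i - train_j||^2.
dist2 = sum(testProj.^2, 1)' + sum(trainProj.^2, 1) - 2 * (testProj' * trainProj);

% For each test image (row), find the indices of its closest training images.
[~, order] = sort(dist2, 2);
nearest    = order(:, 1:numNeighbors);

% Look up the labels of those neighbors and take a majority vote.
votes     = reshape(trainLabels(nearest), size(nearest));
predicted = mean(votes, 2) > 0.5;          % true = font, false = handwritten

disp(predicted');                          % displays 1 0 for this toy data
```

Here the first test point sits among the three font examples and the second among the three handwritten ones, so the vote is unanimous in both cases.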
The final algorithm classified characters with ~90% accuracy.
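An accuracy figure like this is typically the fraction of test images whose predicted label matches the true one. The sketch below shows that calculation on made-up labels. The numbers are purely illustrative and are not the project's results.

```matlab
% Hypothetical predictions and ground truth (1 = font, 0 = handwritten).
predicted  = [1 1 0 0 1 0 1 0];
testLabels = [1 1 0 0 1 0 1 1];

% Fraction of test images classified correctly.
accuracy = mean(predicted == testLabels);

% Break the errors down by direction to see which class is harder.
fontAsHand = sum(predicted == 0 & testLabels == 1);   % fonts called handwritten
handAsFont = sum(predicted == 1 & testLabels == 0);   % handwriting called font

fprintf('Accuracy: %.1f%%\n', 100 * accuracy);        % prints Accuracy: 87.5%
```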
After the technical work was done, I used Adobe Illustrator to make a poster to present my process and findings to my peers:
Through this project, I learned a lot about how to relate pieces of data to one another through mathematical processes. I now understand how to perform larger-scale engineering analysis with much more complex data. Perhaps most importantly, this project energized me to learn more about math and how it relates to engineering.