Table of Contents
- Abstract
- INTRODUCTION
- LITERATURE REVIEW
- PROPOSED SOLUTION
- IMAGE SEGMENTER
- NEURAL NETWORK
- Artificial Neural Network
- Example of MNIST Image
- Training Model
- Processed Output
- TOOLS REQUIRED
- A. MNIST DATASET
- Examples of Image set
- B. JUPYTER NOTEBOOK
- Jupyter Notebook Setup for Web application
- C. ANACONDA PYTHON
- Numpy is basically used for Python Programming languages
- E. ANN
- Examples of Input, Hidden and Output Layers with Nodes
- F. ARTIFICIAL INTELLIGENCE
- FLOWCHART DIAGRAM
- CONCLUSIONS
- Examples of ANN Forward Pass
- Examples of ANN Backward Propagation
- Examples of MNIST Simulation result
Abstract
Modern computers find it difficult to read handwritten numbers because each person’s handwriting is distinctive. In the digital age, however, everybody is moving to digital records, and handwritten books and papers need to be re-typed into computer systems. Computer vision methods from Artificial Intelligence make this task simpler. We have created an Artificial Neural Network system that can read handwritten numbers with an accuracy of more than 93 percent.
Our project includes both research and practical implementation. After completing the code and training and testing the neural network with MNIST data, we obtain a system that reads and processes the images with about 93 percent accuracy, as supported by the accuracy graph. We have stored the trained model in a file and can now use it together with other libraries such as OpenCV. In just a few clicks we can read images, save the processed results in a CSV file, and get the data in real time. The idea is still in its early stages. When complete, it will help save a great deal of human effort.
INTRODUCTION
Handwritten digit recognition is an important problem in optical character recognition and can be used as a test case for pattern recognition theories and machine learning algorithms. Handwritten digits are pre-processed, including segmentation and normalisation, so that researchers can compare their techniques on a common recognition benchmark and reduce their workload [1].
Modern machines have trouble reading handwritten numbers, since every person’s handwriting is different. At the same time, everyone is moving towards digital technology, and handwritten ledgers and notes must be re-entered into computer systems. Computer vision technologies from Artificial Intelligence make this process smoother.
We therefore created an Artificial Neural Network system that reads handwritten numbers with more than 93 percent accuracy. This program would allow teachers to quickly upload student marks to an online database. In addition, we intend to introduce this program in government and private sector financial and accounting departments for uploading numerical data. It also helps to reduce human error when entering number-based data. Our goal is to reduce human error, effort and time.
The need is not limited to one field. Paper is still the main medium in schools and universities, and large volumes of handwritten papers must be digitized each year. And that is only schools and colleges; consider other sectors, such as the generation of Aadhaar, where billions of handwritten records had to be entered into computers.
It takes a lot of human effort and time to convert hard copy into soft copy by conventional means. Our aim is to accurately predict the digits in handwritten images using artificial neural networks.
LITERATURE REVIEW
Several methods are used for the classification of handwritten numbers, such as low-level image representation methods, which view handwritten digit images as a collection of small features such as texture, shape, size and color, and methods based on intermediate-level visual structures. The use of neural networks and deep neural networks to obtain an image representation is now the trend. Such architectures allow us to extract features from a specific layer of a trained neural network, so that the extracted feature maps serve as numerical image representations.
There is a large number of publications on image processing with neural networks. Our work belongs to this line of study, in which an ANN is used to identify images. Image classification in the fashion domain has several advantages and applications, and various research works have been published about it [2].
Handwriting recognition is one of researchers’ favorite topics, because every person has their own writing style. A machine can recognize and understand printed numbers, but handwritten work is very hard for it to understand because every person’s handwriting is different. The system’s key purpose is to reduce human effort.
Data entry workers spend hundreds of hours typing handwritten information into computers. It is a time-consuming task that requires great precision and high-speed typing, because millions of records must be entered. Organizations spend a large amount of money to convert their documents from one format to another [4].
Deep learning and machine learning play an important role in computer technology and artificial intelligence. With deep learning and machine learning, human effort can be reduced in identification, learning, prediction and several other fields. We therefore developed an Artificial Neural Network system that can automatically convert handwritten images to digital format with an accuracy of more than 93 percent.
PROPOSED SOLUTION
IMAGE SEGMENTER
The most important feature of our program is the image segmenter. We added image segmentation to make the software more practical than other MNIST programs. Since the MNIST dataset contains only single digits from 0 to 9, multi-digit numbers cannot be processed directly. We therefore introduced an image segmenter for processing multi-digit numbers, since it is easier to recognize a single segmented digit than to recognize and segment a multi-digit string in one step. MNIST digit recognition is also known as the "Hello World" of machine learning.
We give a brief overview of how our segmenter works.
The handwritten number identification process passes through three stages: preprocessing, segmentation of the image into individual digits, and identification of each digit.
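The three stages can be illustrated with a minimal sketch. The text does not specify the segmentation algorithm, so the column-projection approach below, the function name `segment_digits` and the threshold value are assumptions chosen for illustration, not the project's actual segmenter.

```python
import numpy as np

def segment_digits(image, threshold=0.5):
    """Split a grayscale image (2-D array, ink = high values) into per-digit crops."""
    ink = image > threshold          # stage 1: preprocessing (binarize the image)
    has_ink = ink.any(axis=0)        # True for every column that contains ink
    segments, start = [], None
    for col, filled in enumerate(has_ink):   # stage 2: cut at blank columns
        if filled and start is None:
            start = col                      # a new digit begins here
        elif not filled and start is not None:
            segments.append(image[:, start:col])
            start = None
    if start is not None:                    # digit touching the right edge
        segments.append(image[:, start:])
    return segments

# Toy 5x9 "image" with two blobs of ink separated by blank columns.
toy = np.zeros((5, 9))
toy[1:4, 1:3] = 1.0
toy[1:4, 5:8] = 1.0
digits = segment_digits(toy)
# Stage 3 would resize each crop to 28x28 and pass it to the classifier.
print(len(digits), [d.shape for d in digits])
```

This simple approach only works when digits are separated by at least one blank column; touching or overlapping digits would need a more robust method.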
NEURAL NETWORK
The ANN for the digit dataset is built with 3 layers. The first layer is the input layer, followed by a hidden layer and an output layer. It is a fully connected network. Figure 9 depicts the ANN.
Artificial Neural Network
The MNIST dataset images are 28×28 pixels in size and are flattened into an array of 784 values.
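A minimal sketch of the flattening step, using a random array as a stand-in for a real MNIST image. Scaling the pixel values to the range 0 to 1 is a common preprocessing choice and is an assumption here; the text does not state it.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for one MNIST digit: 28x28 grayscale pixels with values 0..255.
image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# Flatten row by row into 784 values; the division by 255 is an assumed scaling step.
flat = image.reshape(-1).astype(np.float32) / 255.0
print(flat.shape)
```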
Example of MNIST Image
Each of these values is assigned to a single input layer neuron. The input layer is connected to the hidden layer. The initial weights are drawn from a random normal distribution and lie between 0 and 1. The neural network also applies a bias weight to both layers. The bias weight for the input layer is shown in Fig. 10.
The input layer has 784 neurons connected to a hidden layer of 75 neurons. The input to the network is represented as a 20×784 matrix, where 20 is the batch size. The weights are not updated after each training example, but after every 20 training examples. These 20 examples are called a batch. In this way, 3000 batches are generated from our 60,000 training examples.
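The batch arithmetic can be checked with a short sketch. The helper name `iterate_batches` and the zero-filled stand-in arrays are assumptions for illustration; only the sizes (60,000 examples, 784 inputs, batches of 20) come from the text.

```python
import numpy as np

def iterate_batches(x, y, batch_size=20):
    """Yield consecutive (inputs, labels) slices of at most batch_size examples."""
    for start in range(0, len(x), batch_size):
        yield x[start:start + batch_size], y[start:start + batch_size]

# Zero-filled stand-ins with the shape of the MNIST training set.
x_train = np.zeros((60000, 784), dtype=np.uint8)
y_train = np.zeros(60000, dtype=np.int64)

num_batches = sum(1 for _ in iterate_batches(x_train, y_train))
first_x, first_y = next(iterate_batches(x_train, y_train))
print(num_batches, first_x.shape)
```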
The input is then propagated to a hidden layer of 75 neurons. There is no rule of thumb for the number of hidden neurons; it varies from one problem to the next. After trying a few options, 75 neurons proved to be the best choice for the hidden layer. The hidden layer has a weight matrix of size 784×75 and a bias weight, represented by h0 in Fig. 9. Multiplying the 20×784 input matrix by the 784×75 weight matrix produces a 20×75 output matrix. An activation function is applied to every element of this matrix. The sigmoid function used here maps input values to the range 0 to 1 [20].
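The shapes in this step can be traced with a small NumPy sketch. The random inputs, the weight scale of 0.1 and the variable names are assumptions for illustration. The activation shown is the standard logistic sigmoid, whose outputs lie between 0 and 1.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x_batch = rng.random((20, 784))                  # one batch of flattened images
w_hidden = rng.normal(0.0, 0.1, size=(784, 75))  # hidden-layer weights (assumed scale)
b_hidden = np.zeros(75)                          # hidden-layer bias

# (20 x 784) @ (784 x 75) -> (20 x 75), then the activation on every element.
hidden = sigmoid(x_batch @ w_hidden + b_hidden)
print(hidden.shape)
```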
The hidden layer is then connected to the output layer containing 10 neurons, shown in Fig. 10 as 0 to 9. These are the class labels, i.e. the actual digits. The output layer has a weight matrix of size 75×10. Multiplying the 20×75 matrix by the 75×10 matrix produces a 20×10 output, which holds a score for each of the 10 digits for each of the 20 training examples. The softmax activation function then converts these scores to probabilities in the range 0 to 1 that sum to 1. For each training example, the highest probability is set to 1 and the rest are set to 0. Because each training example has an actual digit label, the output labels of the training examples are also converted to 0 and 1 values, which is called one-hot encoding [6].
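A minimal sketch of the output layer, softmax and one-hot encoding described above. The random hidden activations, weight scale and helper names are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtracting the row maximum avoids overflow."""
    exp = np.exp(z - z.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

def one_hot(labels, num_classes=10):
    """Turn integer digit labels into rows of 0s with a single 1."""
    return np.eye(num_classes)[labels]

rng = np.random.default_rng(0)
hidden = rng.random((20, 75))                 # stand-in hidden-layer output
w_out = rng.normal(0.0, 0.1, size=(75, 10))   # output-layer weights (assumed scale)
b_out = np.zeros(10)

probs = softmax(hidden @ w_out + b_out)       # (20 x 75) @ (75 x 10) -> (20 x 10)
predicted_digits = probs.argmax(axis=1)       # most probable digit per example
predicted_one_hot = one_hot(predicted_digits) # highest probability -> 1, rest -> 0
targets = one_hot(np.array([3, 0, 9]))        # one-hot encoded example labels
```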
The forward pass produces the predicted output for the first batch of 20 training examples. The error, i.e. the difference between the actual output and the predicted output, is calculated using the mean squared error loss function. The error is then propagated back through the network to update the weights. Different learning rates are tried to see which model fits best, so that the ANN neither gets stuck in local minima nor overshoots the global minimum.
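One weight update for a single batch might look like the sketch below. It assumes a sigmoid hidden layer, a softmax output layer and plain gradient descent with a learning rate of 0.002; the function name, the random data and the weight scale are illustrative assumptions, and the project's actual optimizer is not specified in the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    exp = np.exp(z - z.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

def mse_loss_and_grads(x, targets, w1, b1, w2, b2):
    """Forward pass, mean squared error, and gradients via backpropagation."""
    h = sigmoid(x @ w1 + b1)                  # hidden activations
    p = softmax(h @ w2 + b2)                  # predicted probabilities
    loss = np.mean((p - targets) ** 2)
    grad_p = 2.0 * (p - targets) / p.size     # dLoss/dp
    # Softmax Jacobian applied row by row: dLoss/dz2
    grad_z2 = p * (grad_p - (grad_p * p).sum(axis=1, keepdims=True))
    grad_w2 = h.T @ grad_z2
    grad_b2 = grad_z2.sum(axis=0)
    grad_h = grad_z2 @ w2.T                   # error propagated to the hidden layer
    grad_z1 = grad_h * h * (1.0 - h)          # sigmoid derivative
    grad_w1 = x.T @ grad_z1
    grad_b1 = grad_z1.sum(axis=0)
    return loss, (grad_w1, grad_b1, grad_w2, grad_b2)

rng = np.random.default_rng(0)
x = rng.random((20, 784))                             # stand-in batch
targets = np.eye(10)[rng.integers(0, 10, size=20)]    # one-hot stand-in labels
w1, b1 = rng.normal(0, 0.1, (784, 75)), np.zeros(75)
w2, b2 = rng.normal(0, 0.1, (75, 10)), np.zeros(10)

learning_rate = 0.002
loss_before, grads = mse_loss_and_grads(x, targets, w1, b1, w2, b2)
# Gradient descent: move each parameter a small step against its gradient.
w1, b1, w2, b2 = (param - learning_rate * grad
                  for param, grad in zip((w1, b1, w2, b2), grads))
loss_after, _ = mse_loss_and_grads(x, targets, w1, b1, w2, b2)
```

A learning rate that is too large can overshoot a minimum, while one that is too small makes training slow, which is why several values are usually compared.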
After the neural network has been trained on all our training examples, it is checked on the test set. The entire ANN is trained repeatedly by passing the entire dataset through it again and again, a total of 350 times; each pass is called an epoch. With each epoch, the accuracy of the neural network should increase and the loss should decrease until the ANN hits a plateau. The predicted outputs are then translated back to numerical digits for human readability.
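Translating predicted outputs back to digits and measuring accuracy can be sketched as follows. The three-class probability table is a made-up toy example (the real network has 10 outputs), and the helper name `accuracy` is an assumption.

```python
import numpy as np

def accuracy(probs, labels):
    """Fraction of examples whose most probable class equals the true label."""
    return float(np.mean(probs.argmax(axis=1) == labels))

# Toy predictions for three examples over three classes.
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1],
                  [0.2, 0.3, 0.5]])
labels = np.array([1, 0, 1])

predicted_digits = probs.argmax(axis=1)   # back from probabilities to digits
acc = accuracy(probs, labels)             # two of the three predictions match
```

Tracking this value after every epoch is one way to see when training reaches a plateau.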
Training Model
We also added a special feature to our software that can significantly reduce processing time. Training a system takes a lot of time and money, including computing power. For example, training on MNIST takes around 10-15 minutes on a normal system before a result can be delivered. To overcome this problem, we save the trained model into a pickle file.
A pickle file is used to serialize and de-serialize objects in Python. Most Python objects can be converted to a pickle. The basic idea is that the pickle file contains all the information needed to reconstruct the same object in another Python program, which in our case is the trained digit recognition model.
The program contains a line of code in which the user specifies whether or not to train the model.
If this line is set to True, the program runs from the beginning and retrains the model. It processes all the MNIST images again, relearns the weights, and then saves the trained object in the pickle file so that it can be used in future runs.
If the line is set to False, the program does not run from the beginning. Instead, it loads the last saved pickle and recognizes digits based on that file.
We train the model once before beginning each loop, which helps to get the right results.
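The train-or-load switch can be sketched with Python's standard `pickle` module. The flag name `TRAIN_MODEL`, the placeholder `train_model` function, the file name and the fallback of retraining when no pickle exists yet are all assumptions for illustration; the original line of code is not shown in the text.

```python
import os
import pickle
import tempfile

import numpy as np

TRAIN_MODEL = False  # hypothetical flag: True retrains, False reuses the saved pickle

def train_model():
    """Placeholder for the real training routine; returns random weights."""
    rng = np.random.default_rng(0)
    return {"w_hidden": rng.normal(size=(784, 75)),
            "w_out": rng.normal(size=(75, 10))}

def get_model(train, path):
    """Retrain and save the model, or load the last saved pickle."""
    if train or not os.path.exists(path):     # assumed fallback if no pickle exists
        model = train_model()
        with open(path, "wb") as f:
            pickle.dump(model, f)             # serialize the trained object
        return model
    with open(path, "rb") as f:
        return pickle.load(f)                 # reconstruct the saved object

model_path = os.path.join(tempfile.gettempdir(), "digit_model_sketch.pkl")
saved = get_model(True, model_path)           # first run: train and save
loaded = get_model(TRAIN_MODEL, model_path)   # later runs: load without retraining
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.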
Processed Output
This part describes what happens after the result is produced, i.e. once the system has processed the final image. The software asks the user whether to save the data in CSV format. If the user agrees, the result is saved with its name and value in a CSV file, which can be opened in Excel. Because the result is stored in a file, we can open it at any time and adapt it to our needs. This is helpful for future reference, since we can go to the saved file and look up whatever we need, which saves time.
The saved results stay organized because each entry is stored by default with its name and value: one column holds the file path and the other holds the recognized value. This layout is easy to read and understand. Saving is an optional feature rather than a requirement, so there is no hard and fast rule about using it; it is up to the user.
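The two-column output can be sketched with Python's standard `csv` module. The column names, the example file paths and values, and the output location are hypothetical; the text only states that one column holds the file path and the other the recognized value.

```python
import csv
import os
import tempfile

def save_results(rows, csv_path):
    """Write (file path, recognized value) pairs to a two-column CSV file."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["file_path", "predicted_value"])  # assumed header names
        writer.writerows(rows)

# Hypothetical recognition results.
results = [("images/sample_1.png", 42), ("images/sample_2.png", 7)]
csv_path = os.path.join(tempfile.gettempdir(), "digit_results_sketch.csv")
save_results(results, csv_path)

# Read the file back to confirm what was stored.
with open(csv_path, newline="") as f:
    saved_rows = list(csv.reader(f))
```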
TOOLS REQUIRED
A. MNIST DATASET
The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning.
It was created by "re-mixing" samples from NIST’s original datasets. The creators felt that since NIST’s training dataset was taken from American Census Bureau employees, while the testing dataset was taken from American high school students, it was not well suited for machine learning experiments. In addition, the black and white images from NIST were normalized to fit into a 28×28 pixel bounding box and anti-aliased, which introduced grayscale levels.
The MNIST database has 70,000 images, of which 60,000 are training images and 10,000 are test images.
Examples of Image set
B. JUPYTER NOTEBOOK
Jupyter Notebook is an open-source web application for creating and sharing documents that contain live code, equations, visualizations, and narrative text. Its applications include data cleaning and transformation, numerical simulation, mathematical modeling, and machine learning.
Jupyter Notebook Setup for Web application
C. ANACONDA PYTHON
Anaconda is a free and open-source distribution of the Python and R programming languages for scientific computing (data science, machine learning applications, large-scale data processing, predictive analytics, etc.) that aims to simplify package management and deployment. Package versions are managed by the package management system conda. The Anaconda distribution is used by over 15 million users and contains more than 1500 popular data science packages suitable for Windows, Linux, and macOS.
Figure 13. Example of Anaconda Python used as the framework
D. NUMPY LIBRARY
NumPy is a library for the Python programming language that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. NumPy’s ancestor, Numeric, was originally created by Jim Hugunin with contributions from several other developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric, with extensive modifications. NumPy is open-source software with many contributors.
Numpy is basically used for Python Programming languages
E. ANN
Artificial neural networks are statistical learning algorithms inspired by the properties of biological neural networks. They are used for a wide variety of tasks, from fairly simple classification to speech recognition and computer vision. ANNs are loosely based on biological neural networks in the sense that they are implemented as a system of interconnected processing elements, sometimes called nodes, which are functionally comparable to biological neurons. The connections between nodes have numerical values, called weights, and by changing these values systematically, the network is eventually able to approximate the desired function.
Every node in the network takes several inputs from other nodes and calculates a single output based on the inputs and the connection weights. This output is generally fed into another neuron, repeating the process. With this in mind, one can easily picture the layered internal structure of an artificial neural network, where neurons are organized into different layers as depicted below. The input layer receives the inputs and the output layer produces the output. The layers that lie in between are called hidden layers.
Examples of Input, Hidden and Output Layers with Nodes
F. ARTIFICIAL INTELLIGENCE
Artificial Intelligence, often referred to as machine intelligence, is best described as human-like intelligence exhibited by computers or, more generally, by machines. The machines produce decisions as outputs. These decisions are based on a set of specified rules, and the machines make approximately correct or precise judgments. The same rules are also used to correct decisions.
FLOWCHART DIAGRAM
As the flowchart shows, we first start the program and check whether the picture is a manual image or an MNIST image. If the image is manual, we run the image segmenter and then scan the file; if the image is from MNIST, we scan it directly. After scanning the picture, if we want to train the model, we go through the train-model stage and obtain a retrained model; if not, we jump directly to the ANN stage, which produces the output. Finally, if we want to save the report, it is written to the CSV file before the program stops; if we do not want to save the report, the program stops immediately.
CONCLUSIONS
Optical character recognition is an interesting area for machine learning researchers, where the goal is to be as accurate as possible. We used an ANN to solve the related problem of digit recognition.
The designed ANN has 3 layers: an input layer with 784 neurons fully connected to a hidden layer of 75 neurons, which is in turn fully connected to an output layer with 10 neurons that decides which digit from 0 to 9 the input image shows.
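The size of this network follows directly from the layer sizes. The short calculation below assumes one bias per hidden and output neuron, which is a reading of the bias description in the text rather than a confirmed detail.

```python
layer_sizes = [784, 75, 10]  # input, hidden and output neurons

# One weight per connection between consecutive fully connected layers.
weights = sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
# Assumed: one bias per neuron in the hidden and output layers.
biases = sum(layer_sizes[1:])
total_parameters = weights + biases

print(weights, biases, total_parameters)  # 59550 85 59635
```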
Examples of ANN Forward Pass
This is called the ANN’s forward pass. The ANN generates an output for the training set, which is compared to the actual output, and the difference is calculated using mean squared error.
Examples of ANN Backward Propagation
The error is propagated back through the ANN as shown in Fig. 18, so that the weights can be optimized.
To summarize, the ANN is trained for 350 epochs with a batch size of 20, using 60,000 training examples and 10,000 test examples from the MNIST dataset. We experimented with various hyperparameters; the best learning rate is 0.002 and the best number of hidden neurons is 75. More epochs were not used because the accuracy had reached a plateau, i.e. model performance no longer increases even with further training.
The model’s accuracy after the first epoch is 33.14%. It continued to increase with the number of epochs until it reached 88%, after which it began to increase very slowly; after 350 epochs the model reached an accuracy of 92%. When the model is checked on the training set, it shows 95% accuracy.
Examples of MNIST Simulation result
The MNIST website also lists reference results for the dataset. For example, a 2-layer neural network with 800 hidden neurons and cross-entropy loss is reported there with a test error rate of about 1.6%, i.e. about 98.4% accuracy.