GraphoMatch: Forensic handwriting analysis using machine learning

Our GraphoMatch project aims to revolutionise forensic handwriting analysis with Convolutional Neural Networks (CNNs) and machine learning, making it simpler and more trustworthy to determine who authored a piece of writing or whether a signature is authentic. CNNs let us examine handwriting samples closely, which helps move beyond some of the arbitrary guesswork often used in this field. Within forensic machine learning, pattern recognition is part of a larger field of study, one that has been growing with the help of newer frameworks and machine learning technologies. Average human handwriting is highly predictable: more than 90% of the differences between samples can be predicted using machine learning. Our project aims to improve on that figure with more data and image training, bringing our model closer to near-perfect classification.


Introduction
Professionals in forensic handwriting analysis carefully examine every part of a piece of writing to determine who wrote it and whether it is genuine. The way each person writes is unique, just as each person's fingerprint is unique [1]. When looking for forgeries, analysts focus on finding differences rather than similarities between known and unknown samples [2]. They pay close attention to small details such as spacing, pressure, curves, and slant as they look at how letters are formed, how lines look, and how they are arranged [3]. Problems arise, however, when people try to disguise their own writing or simulate someone else's. Simulation is indicated by lines that do not follow a straight baseline, darker strokes at the beginning and end of words, and unnecessary pen lifts. For the results of handwriting analysis to be reliable, all of these factors must be taken into account.
The reliability of standard forensic handwriting analysis can be assessed in several ways [4]. One major concern is that graphology is subjective by nature, which can bias the results and weaken the process because there are few widely accepted standards. This can make it harder to determine who wrote something. Analysts depend on how similar or different groups of samples are to each other [3]. Because there is no clear set of "points" or traits that must match to prove a theory, the analysis process can become less clear [5]. On top of that, results that depend on subjective interpretation and the analyst's own knowledge may not be consistent. This shows how important it is to have clear standard operating procedures and rules. These steps are needed to ensure that evaluations are fair and consistent, which builds the trustworthiness and dependability of handwriting analysis in forensic investigations [6].
These problems show how important it is to keep researching and making progress in this area, so that existing limitations can be overcome and the method made more reliable as an investigative tool [6]. More advanced methods are needed to address the subjective nature of current graphology methods and the lack of clear matching points [7]. Handwriting studies can be made much more accurate and objective by setting stricter standards, using quantitative measures, and incorporating new technologies [3]. As the field develops, these improvements are very important for making forensic handwriting analysis more reliable [4]. They also ensure that it remains an important part of forensic science, helping with the careful reading and understanding of handwritten evidence.
Because traditional methods of forensic handwriting analysis are subjective, lack standardised criteria, and rely on the knowledge of the individual analyst, better and more accurate methods are needed [8]. These problems can make forensic studies less reliable and accurate, which shows how important it is to use new technologies, such as machine learning, to improve the investigation process [9]. Researchers want to improve the speed, accuracy, and dependability of authorship verification and forgery identification by adding machine learning methods to forensic handwriting analysis. Such improvements are likely to lead to more accurate and useful results in forensic investigations, and to move the field of handwriting analysis towards a data-driven and objective method.
The combination of machine learning technologies with forensic investigation methods has created a new and exciting area of forensics called "machine learning forensics." This approach uses computers to sort carefully through digital data, finding trends and anomalies that regular analysis might miss. By handling complicated tasks such as outlier detection and pattern recognition, machine learning makes it possible for forensic experts to search through and understand huge amounts of data with great speed and accuracy [10]. This technology marks the start of a new era in forensic investigations, with faster methods and deeper insights that allow investigators to better understand the complicated patterns and dynamics behind illegal behaviour.
The project's goal is to create GraphoMatch, a tool that uses machine learning to change the way forensic handwriting analysis is done. The tool is designed to improve the objectivity, speed, and accuracy of forensic handwriting research by using machine learning methods to compare and analyse handwriting samples [7]. Convolutional Neural Networks (CNNs) are at the heart of GraphoMatch. A CNN is a powerful machine learning tool known for its ability to extract features from images. CNNs make it much easier to find and analyse handwritten material by turning samples of handwriting into text characters that can be analysed [7]. This is made possible by using a large collection of handwritten texts along with advanced image-processing techniques. Graphology and forensic investigation are entering a new age of technological intelligence and analytical precision. The main goal of this project is to build a strong and reliable tool that can carefully recognise and evaluate handwriting patterns.

Literature Review
Machine learning is currently advancing rapidly, which benefits many fields. Handwriting can be recognised in various ways, so a proper study of the available tools and how to use them is necessary.
Hao Zeng and colleagues suggest a technique that emphasises the use of a straightforward neural network, instead of complex models that demand powerful computer setups, to identify handwritten digits with reasonably good accuracy [11]. Sampath and Gomathi [12] developed an optical character recognition system utilising a hybrid neural network algorithm that merges the Firefly and Levenberg-Marquardt (FLM) methods. This approach leverages the advantages of both algorithms, improving the speed and precision of the system. The FLM approach with feed-forward neural networks is more efficient than an SVM-based technique, specifically in the context of gradient feature descriptors. Nevertheless, a significant disadvantage of their system is its sophisticated architecture, which might be excessively complex for simple tasks.
Technology is providing a significant boost to forensic handwriting analysis [4]. High-resolution digital cameras can capture minute nuances in handwriting that may go unnoticed by examiners. Specialised software enables examiners to analyse this information using measurements, comparisons, and visualisations. Despite these advances in technology, the work still requires a certified examiner's specific training and experience. Their profound knowledge of the field allows them to understand small subtleties and make reasoned judgements.
To move beyond these boundaries, researchers are now turning to modern technologies such as machine learning. Current systems are able to examine patterns across a variety of handwriting samples. Such systems can handle tasks ranging from the identification of important features and signature verification to authorship attribution and the detection of counterfeit documents.
Handwriting matching relies on understanding shape distortion from appropriate examples of handwriting. In this method, a sample of handwriting is represented by a set of points extracted at regular intervals from its skeleton, and the handwriting matching method is then applied to these points [13].
Machine learning is set to revolutionise forensic investigations by making it easier to evaluate evidence and reach decisions. Machine learning techniques have contributed greatly to modernising a range of forensic disciplines through data-driven methodology, which has increased the precision, effectiveness, and dependability of investigations [4]. Handwriting analysis, from identifying different types of writing and verifying signatures to detecting forgeries, derives enormous benefit from machine learning [14]. At the same time, it is necessary to understand the limitations of this powerful tool. To this day, no technology can substitute for the experience of a qualified professional examiner. The best results come when the professional's expertise is combined with the most capable technology.
Research in this area has examined many machine learning techniques, including:
Support vector machines (SVMs): SVMs are among the algorithms used in forensic handwriting analysis for data classification. They are important because they are reliable in differentiating between handwritten materials and in distinguishing authentic from forged signatures.
Neural networks: Neural networks, specifically recurrent neural networks (RNNs) and convolutional neural networks (CNNs), learn hierarchical representations of handwriting data. CNNs in particular have become instrumental in handling difficult pattern recognition tasks, allowing a more accurate and robust analysis of handwriting samples [15].
Hidden Markov models (HMMs): HMMs are used to analyse how the pen moves and the order of strokes in handwriting, which naturally follows a sequence. This type of modelling is key for understanding handwriting's inherent patterns, leading to advanced uses such as identifying the writer or recognising the handwriting itself.
In forensic handwriting analysis, machine learning methods offer several advantages over traditional techniques [4]:
Automation: Machine learning algorithms simplify the analytical process, automate repetitive tasks, and decrease the need for human inspection, each of which makes handwriting analysis more efficient.
Objectivity: By utilising statistical analysis and learnt patterns, machine learning models provide objective assessments of handwriting characteristics while reducing the impact of the subjective biases that are frequently inherent in human judgement.
Scalability: Automated systems built on machine learning can process large numbers of handwriting samples effectively. With this capability, forensic examiners can handle complicated cases more skilfully, increasing their overall productivity.
Consistency: Machine learning uses uniform methods and criteria, which encourages consistency in analysis across different cases and examiners. This improves the repeatability and dependability of forensic handwriting analysis results.

Limitation of current work
Current forensic handwriting analysis must overcome a number of problems to ensure that its results are accurate. One major problem is that handwriting analysis is subjective. Different experts may interpret the same features of handwriting differently, which can lead to different results [16]. Accuracy may also suffer because forensic document examiners (FDEs) can be biased, have different levels of experience, and use different methods.
When someone deliberately disguises their handwriting or copies someone else's, the writing is harder to analyse and can lead to wrong conclusions about its features [16]. Stress, non-standard writing tools, or different writing conditions can also change the quality of handwriting samples. This could affect how accurate FDEs are when they compare samples and draw conclusions.
In traditional forensic handwriting analysis, the writing in question is compared to known standards or exemplars. Forensic document examiners (FDEs) carefully examine features such as letter shape, line quality, spacing, slant, and pen pressure to determine who probably wrote something. There are, however, some problems with this method:
Subjectivity and Lack of Standardisation: Evaluations are frequently subjective and depend on the knowledge and judgement of the examiner. This could cause differences between examiners and regions.
Time Limits and Bias: Manual examination can take a lot of time and work, especially for complicated cases. Examiners may also be biased because of their personal views or the circumstances they are in.
Inconclusive Results: Sometimes the results may not be clear or conclusive, especially if the questioned writing is well disguised or there is not enough handwriting to compare it to.
Standard operating procedures and rules are needed to ensure that handwriting analysis in legal settings is based on science and can be trusted [17]. Handwriting analysis might also not be admitted as evidence in some courts, depending on the jurisdiction and the judge's decision. To get around these problems, forensic handwriting analysis must be made more useful and legitimate in criminal cases. This could be done with the help of machine learning methods, more consistent techniques, and continued research.

Problem Definition
Forensic handwriting analysis is an important area of study that seeks to establish the legitimacy and originality of handwritten documents [18]. Although this field is important, traditional methods have drawbacks such as steep learning curves, high levels of subjectivity, and the possibility of human error. This research presents a novel approach by creating an automated system that analyses handwritten documents using Convolutional Neural Networks (CNNs) and other machine learning technologies.
Pen strokes, word and character spacing, word and character size, page margins, and many other handwriting traits can all be extracted and analysed by this technology. It examines these characteristics in depth, comparing handwriting samples from both known and unknown authors. The goal is to find out how likely it is that the samples were written by the same individual.
The goal of the innovation is to make forensic handwriting analysis more efficient and accurate when determining authorship. By automating feature extraction and comparison, the technology equips handwriting specialists with a powerful tool that lessens their burden and increases the accuracy of finding possible matches. This research highlights the multidisciplinary nature of modern investigative approaches and enhances the field of forensic handwriting analysis. It also opens new possibilities for using CNNs and machine learning in forensic science.
More objective and effective methods for forensic handwriting analysis are required because of the shortcomings of existing work in this area, including subjectivity, a lack of standard criteria, and a dependence on individual skill [19]. This research attempts to address these constraints and improve the accuracy and reliability of authorship identification in forensic investigations by building a system that uses machine learning to extract and evaluate handwriting traits. In addition to streamlining the analysis procedure, the proposed methodology would give forensic specialists a more methodical and data-driven way to compare and assess handwriting samples, raising the overall efficacy of forensic handwriting analysis.
Technological developments in machine learning and its use in forensic investigations have shown encouraging results in improving and automating a range of analytical procedures [20]. In addition to adding to the existing body of knowledge in forensic handwriting analysis, this project attempts to improve authorship identification by incorporating machine learning techniques, in particular CNNs, into the examination of handwriting traits. A system that can use extracted features to predict the likelihood that two writings are by the same person will be helpful to forensic specialists, and it will also have wider ramifications for forensic science by providing a more objective and data-driven method of handwriting analysis in criminal investigations.

Data Collection and Pre-processing:
In this study, we construct, validate, and analyse a Convolutional Neural Network (CNN) model using a Kaggle dataset that provides handwritten words together with their classifications [21]. The dataset contains 330,000 training images of handwritten text in JPEG format. The validation set comprises 41,000 images and the test set 40,000 images, which gives a good foundation for analysing the model. The dataset includes various types of handwriting, both readable and unreadable. The pixel values are normalised to a scale from 0 to 1. This normalisation keeps the input data scales constant, which helps the model's training converge. These preprocessing steps make it possible to train and assess the CNN model, which in turn enables meaningful evaluation and has the potential to improve handwriting recognition. However, a few types of writing are unreadable for the model, as shown below.

Figure 2 Unreadable handwriting from the Dataset
Using data augmentation, we strengthen the model so that it generalises better and can handle new, unseen data. These methods apply modifications such as rotations, shifts, and zooms to the training data, allowing the model to learn from a wider variety of handwriting styles and shapes. This is a crucial step in making the model more accurate and reliable for forensic handwriting analysis in the real world.
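As an illustration of one such modification, the following minimal sketch applies a shift to an image stored as a list of pixel rows. The function names `shift_image` and `random_shift`, the `max_shift` parameter, and the choice of 1.0 as the background fill value are assumptions made for this example and do not describe the project's actual augmentation pipeline; rotations and zooms would normally come from an image-processing library.

```python
import random


def shift_image(image, dy, dx, fill=1.0):
    """Shift a 2D image (list of rows) by dy rows and dx columns.

    Pixels that move in from outside the frame take the `fill` value
    (1.0 is assumed here to be the white background after normalisation).
    """
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                shifted[ny][nx] = image[y][x]
    return shifted


def random_shift(image, max_shift=2, rng=random):
    """Apply a random shift of at most `max_shift` pixels along each axis."""
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    return shift_image(image, dy, dx)


# Hypothetical 3x4 image: 0.0 is ink, 1.0 is background.
sample = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
]
moved = shift_image(sample, dy=1, dx=1)  # stroke moves one row down, one column right
```

Each augmented copy keeps the label of the original image, so the model sees the same word at slightly different positions.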
The second most important step is label preprocessing. Using one-hot encoding, we convert textual labels (word names) to numerical representations. Each word class is assigned a distinct numerical identifier, which produces a multi-class vector representation with N dimensions, where N is the number of classes.
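The encoding described above can be sketched in a few lines of plain Python. The helper names `build_label_index` and `one_hot` and the example labels are hypothetical and chosen only for illustration.

```python
def build_label_index(labels):
    """Assign a distinct integer identifier to each label class.

    Classes are sorted so that the mapping is deterministic.
    """
    return {label: i for i, label in enumerate(sorted(set(labels)))}


def one_hot(label, label_index):
    """Return an N-dimensional vector with a single 1 at the label's identifier."""
    vector = [0] * len(label_index)
    vector[label_index[label]] = 1
    return vector


# Hypothetical word labels; real labels would come from the dataset.
labels = ["MARTIN", "LEA", "MARTIN", "ZOE"]
index = build_label_index(labels)      # {'LEA': 0, 'MARTIN': 1, 'ZOE': 2}
encoded = [one_hot(label, index) for label in labels]
```

Here N is 3, so every label becomes a vector of length 3 with exactly one non-zero entry.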

Model Architecture and Training:
The development of our model involves a structured sequence of layers designed to effectively process and analyse the input data, which, in this context, are grayscale images of handwritten words.

Input Layer (`input_data`)
This layer serves as the entry point for our images. It accepts images with dimensions of `(256, 64, 1)`, where `256x64` represents the height and width of the image, and `1` signifies that the images are in grayscale.

Conv1
 This first convolutional layer utilises 32 filters, each with a `(3, 3)` kernel size, capturing various aspects of the image. Batch normalisation follows to enhance the stability of the training process. The ReLU activation function adds the necessary non-linearity to the model, aiding in learning complex patterns. Max-pooling with a `(2, 2)` pool size reduces the dimensions, focusing on the most prominent features.

Conv2
 The second convolutional layer increases the filters to 64, maintaining a `(3, 3)` kernel size for detailed feature extraction. It also includes batch normalisation and ReLU activation. Max-pooling is applied again, along with a dropout layer set at 30% to mitigate overfitting by randomly ignoring some neurons during training.

Conv3
 With 128 filters, this layer continues the pattern of feature extraction, batch normalisation, and non-linearity with ReLU. An asymmetric max-pooling with a pool size of `(1, 2)` is applied here, which halves only the second spatial dimension. Another dropout layer at 30% is included to further combat overfitting.

Transition to RNN
 The convolution layer's output is transformed into a 2D tensor to be compatible with the RNN layers. A fully connected layer with 64 units and ReLU activation transitions the data, preparing it for sequential processing.

Recurrent Neural Network (RNN)
 The model incorporates bidirectional LSTM layers, enhancing the model's ability to understand context by processing the data in both forward and reverse directions. Two LSTM layers are stacked, each returning sequences to ensure continuous flow and integration of information.

Output Layer
 The concluding layer is a dense layer with units equal to the number of characters in the dataset, facilitating character-level predictions. A softmax activation is applied to this layer to output probabilities for each character class, forming the basis for prediction.
This architectural design is crafted to meticulously process the input data through various stages, extracting and refining features, and ultimately generating predictions with a focus on enhancing the accuracy and reliability of the handwriting analysis.
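To make the flow of data through these layers concrete, the following sketch traces the tensor shape after each stage. It is only a shape calculation, not the model itself, and it rests on several assumptions that the description above does not state: convolutions use "same" padding, the bidirectional layers concatenate forward and backward outputs, and the values `lstm_units=256` and `num_chars=30` are placeholders. Batch normalisation, ReLU, and dropout are omitted because they do not change shapes.

```python
def conv_same(shape, filters):
    """3x3 convolution with 'same' padding (assumed): size kept, channels = filters."""
    height, width, _ = shape
    return (height, width, filters)


def max_pool(shape, pool):
    """Max-pooling divides each spatial axis by the corresponding pool size."""
    height, width, channels = shape
    return (height // pool[0], width // pool[1], channels)


def trace_shapes(input_shape=(256, 64, 1), lstm_units=256, num_chars=30):
    """Return (stage name, output shape) pairs for the described architecture."""
    shapes = [("input_data", input_shape)]
    x = max_pool(conv_same(input_shape, 32), (2, 2))
    shapes.append(("conv1", x))
    x = max_pool(conv_same(x, 64), (2, 2))
    shapes.append(("conv2", x))
    x = max_pool(conv_same(x, 128), (1, 2))  # halves only the second axis
    shapes.append(("conv3", x))
    # Merge the last two axes so each remaining step is one feature vector.
    x = (x[0], x[1] * x[2])
    shapes.append(("reshape", x))
    x = (x[0], 64)                           # fully connected layer, 64 units
    shapes.append(("dense", x))
    x = (x[0], 2 * lstm_units)               # forward + backward outputs (assumed concat)
    shapes.append(("bilstm", x))
    x = (x[0], num_chars)                    # one probability per character class
    shapes.append(("output", x))
    return shapes


for name, shape in trace_shapes():
    print(f"{name:<10} {shape}")
```

Under these assumptions the recurrent layers receive a sequence of 64 steps, and the output layer produces one character distribution per step.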

Feature Extraction
A key component of the operational system for evaluating and contrasting handwriting samples is feature extraction.
Using the trained Convolutional Neural Network (CNN) model, the system analyses two input images and extracts a set of features from each. These features cover a wide range of handwriting traits, including spatial relationships, character forms, and stroke patterns.
To improve feature identification, the first step is to convert the input images to a binary format, which increases the ink-to-background contrast. A resizing step then ensures that the images are 256 pixels by 64 pixels, the input dimensions expected by the CNN model. Finally, the pixel values are normalised to a range of 0 to 1, which prepares the input for processing by the model.
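The three preprocessing steps can be sketched as follows for a grayscale image stored as a list of rows with values from 0 to 255. The binarisation threshold of 128 and the nearest-neighbour resizing are assumptions made for this illustration; the project may use a different threshold or interpolation method, and all function names are hypothetical.

```python
def binarize(gray, threshold=128):
    """Map 0-255 grayscale pixels to 0 (ink) or 255 (background)."""
    return [[255 if pixel >= threshold else 0 for pixel in row] for row in gray]


def resize_nearest(image, out_h, out_w):
    """Resize by nearest-neighbour sampling (assumed interpolation method)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def normalize(image):
    """Scale pixel values from the 0-255 range to the 0-1 range."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def preprocess(gray, out_h=256, out_w=64):
    """Binarise, resize to the model's input size, then normalise."""
    return normalize(resize_nearest(binarize(gray), out_h, out_w))


# Hypothetical 2x2 grayscale image.
tiny = [[0, 200], [130, 90]]
ready = preprocess(tiny)  # 256 rows of 64 values, each 0.0 or 1.0
```

After this step every input has the same size and value range, regardless of how the original sample was scanned.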

Figure 4 Feature Extraction from input images using Trained model
We feed the pre-processed images into the trained convolutional neural network model. By applying the weights and filters it learned from a large collection of labelled handwriting samples, the model extracts useful characteristics from the input images. This step produces a high-dimensional feature vector that captures the handwriting style in each image.

Similarity Analysis
The primary goal of the similarity analysis phase is a quantitative comparison of the feature vectors obtained from the two handwriting samples. This comparison generates a similarity score that reflects how closely the two sets of features resemble each other. Here, we measure how similar two feature vectors are by finding their geometric distance. The Euclidean distance is a simple and efficient way to measure the closeness of feature vectors in feature space; a lower distance implies higher similarity.
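The Euclidean distance between two feature vectors is the square root of the sum of squared differences between corresponding entries. A minimal sketch follows; the three-element vectors are hypothetical stand-ins for the high-dimensional vectors produced by the model.

```python
import math


def euclidean_distance(features_a, features_b):
    """Return the Euclidean distance between two equal-length feature vectors."""
    if len(features_a) != len(features_b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(features_a, features_b)))


# Hypothetical feature vectors for two handwriting samples.
sample_1 = [0.0, 3.0, 1.0]
sample_2 = [4.0, 0.0, 1.0]
print(euclidean_distance(sample_1, sample_2))  # 5.0
```

Identical vectors give a distance of 0, and the distance grows as the two samples' features diverge.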

Evaluation and Interpretation
The similarity score generated by the analysis provides a quantitative foundation for estimating the chance that the two handwriting samples come from the same person. This assessment helps professionals in forensic handwriting analysis reach informed conclusions about authorship verification and forgery identification.
Interpreting the similarity score requires thresholds specific to the situation; these thresholds may be defined by combining empirical evidence with forensic experts' opinions. A low score, meaning the handwriting samples are very similar to one another, may be evidence of shared authorship. Conversely, a high score might indicate notable differences in the handwriting traits, which could point to different writers.
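One possible way to derive such a threshold from empirical evidence is to pick the distance that best separates labelled same-writer and different-writer pairs. The sketch below is an assumption about how this could be done, not the project's documented procedure; the function names and the example distances are hypothetical.

```python
def same_writer(distance, threshold):
    """Treat a pair as same-writer when its distance does not exceed the threshold."""
    return distance <= threshold


def calibrate_threshold(same_distances, different_distances):
    """Pick the candidate threshold that classifies the most labelled pairs correctly."""
    candidates = sorted(set(same_distances) | set(different_distances))

    def correct(threshold):
        hits_same = sum(d <= threshold for d in same_distances)
        hits_different = sum(d > threshold for d in different_distances)
        return hits_same + hits_different

    # max() returns the first (smallest) candidate among ties.
    return max(candidates, key=correct)


# Hypothetical distances measured on pairs with known authorship.
same = [0.8, 1.1, 1.4]
different = [1.9, 2.5, 3.0]
threshold = calibrate_threshold(same, different)  # 1.4 for this data

print(same_writer(1.0, threshold))  # True
print(same_writer(2.0, threshold))  # False
```

In practice an expert would review the calibrated value, since the cost of a false match in a forensic setting may justify a stricter threshold than the one that maximises raw accuracy.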
Before the system can be used successfully in forensic contexts, its accuracy and dependability must be confirmed by extensive testing with a wide variety of handwriting samples. Comparing the results with human experts' opinions and with the established standards for forensic handwriting analysis helps verify that the system is functioning as intended.

Results
The "GraphoMatch" project has shown encouraging outcomes in both the validation phase and practical use for writer identification. It integrates Convolutional Neural Networks (CNNs) with modern frameworks for forensic handwriting analysis. Here we explore the validation performance indicators and the model's ability to differentiate between author-specific and non-author-specific handwriting samples.
Validation: During validation, the model demonstrated its predictive power for both individual characters and whole words in handwriting samples.

Figure 7 Code Snippet for model's output
In terms of character predictions, the model was 82.16% accurate, and for word predictions, it was 69.10% accurate.
These results highlight the model's ability to extract and interpret subtle information from handwriting samples, which is the core of forensic handwriting analysis.
As part of the validation procedure, a dataset was used to train the model to recognise and understand handwritten text. The model was then asked to convert these visual inputs into digital outputs that could be evaluated quantitatively. The model's character-level accuracy, which shows how well it can distinguish between different types of handwriting, is a strong basis for future authorship investigations.
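For reference, character-level and word-level accuracy can be computed as in the following sketch. It assumes that characters are compared position by position and that a word counts as correct only when it matches the label exactly; the actual evaluation code may differ, and the function name and example strings are hypothetical.

```python
def char_and_word_accuracy(predictions, targets):
    """Return (character accuracy, word accuracy) for paired predictions and labels."""
    if not targets or len(predictions) != len(targets):
        raise ValueError("predictions and targets must be non-empty and equal in length")
    correct_chars = total_chars = correct_words = 0
    for predicted, true in zip(predictions, targets):
        total_chars += len(true)
        # Position-wise comparison; extra or missing characters count as errors.
        correct_chars += sum(p == t for p, t in zip(predicted, true))
        correct_words += predicted == true
    return correct_chars / total_chars, correct_words / len(targets)


# Hypothetical model outputs and ground-truth labels.
predicted_words = ["MARTIN", "LEO", "ZOE"]
true_words = ["MARTIN", "LEA", "ZOE"]
char_acc, word_acc = char_and_word_accuracy(predicted_words, true_words)
print(f"characters: {char_acc:.2%}, words: {word_acc:.2%}")
```

A single wrong character makes the whole word incorrect, which is why word accuracy is normally lower than character accuracy, as in the results reported above.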
Writer Identification Analysis: We ran a series of tests to see how well GraphoMatch performed its primary job, which is to determine whether two handwriting samples are from the same writer.
After examining samples from the same writer, the program estimated an 80% chance that they were indeed from the same person. Conversely, for samples from different authors, the algorithm had a 70% chance of correctly predicting that they were not from the same person.
In forensic settings, the capacity to differentiate between authors is crucial, since it might provide key evidence. This result indicates that the model has a good grasp of the handwriting feature space, which enables it to make authorship predictions using the extracted features and their geometric relationships in the feature vector space.
Similarity Analysis Interpretation: An integral part of GraphoMatch, the similarity analysis uses a distance measure based on geometric proximity to quantify how similar two feature vectors obtained from handwriting samples are to one another. The Euclidean distance is a simple and efficient way to measure the closeness of feature vectors; a smaller distance indicates a higher degree of similarity and, thus, a higher probability of shared authorship.
The findings of this research give a mathematical basis for the model's predictions about authorship, offering a more objective and data-driven approach to an area usually dominated by subjective expert analysis. The difference in probability values between samples from the same writer and those from different authors reflects the model's grasp of the handwriting feature space and yields probabilistic evaluations that forensic professionals may use.

Discussion
The findings of our project can help forensic handwriting analysis make significant progress. The method provides a fresh perspective by utilising convolutional neural networks (CNNs) and machine learning. This approach combines writer-identification capabilities with accurate character recognition.
Forensic analysis could benefit from the model's ability to differentiate between same-writer and different-writer samples, which gives a quantitative foundation to conclusions that were previously based on subjective expert opinion. However, the probabilities indicate room for improvement, and future work will focus on refining the ability to distinguish between samples written by different writers.

Conclusion
Our project aims to fuse machine learning technologies with traditional forensic investigation methods, a combination that has given rise to an exciting field known as "machine learning forensics." This approach harnesses the computational power of computers to sift meticulously through digital data, uncovering trends and anomalies that might elude conventional analysis. By efficiently handling complex tasks such as outlier detection and pattern recognition through CNNs and RNNs, we aim to empower forensic experts to navigate vast amounts of information swiftly and with great precision. As a result, our project points towards a new era in forensic investigations, providing faster methodologies and deeper insights that enhance our understanding of the intricate patterns and dynamics underlying differences in writing patterns.

Figure 1 Readable handwriting from the Dataset
Careful planning and the inclusion of certain critical phases in the data preparation process help the CNN model perform well. First, we convert the image to grayscale. Then, we convert it to binary format. By contrasting the text with the background, this change enhances the model's feature-separation capabilities. Convolutional neural networks (CNNs) rely on consistent data, so we standardise all images to 256 x 64 pixels.

Figure 5 Sequence diagram for calculating the similarity score

Figure 6 Activity diagram for evaluation of Model