Hey there! In this blog post I am going to describe how face recognition works and give a possible application of this technology. I’ll try my best to express what I know and where it can be applied. If you have ideas or questions, feel free to leave a comment. Alright, let’s get cooking 👨‍🍳
First of all, let us consider what face recognition is. Basically, a face recognition system is capable of identifying a face when a picture or video is provided. It is one of the applications of artificial intelligence (AI) that has slowly crept into our daily activities. For example, consider the picture below:
So if you use an iPhone, the picture above should be very familiar. It’s interesting to note that such sophisticated technology is applied to the camera feed. Once a face is correctly detected, the iPhone can focus on it and thereby improve the picture in general. Newer versions of iOS also classify the pictures you take according to the faces recognised and then group them by this classification.
Facebook also offers a way of tagging your pictures (those taken with your friends) automatically. Ring any bells? Yup, face recognition 😎.
Also, the face recognition feature by Apple (popularly referred to as ‘Face ID’) has been met with much fanfare.
So these are very good examples of how an application of AI (face recognition) is already forming part of our daily routine.
The Python API basically makes calls to dlib and then gives an output. So I would recommend downloading and installing the Python library. However, if you can’t, that’s alright. I will still explain how the system works in this post anyway.
Great! Now that most of the technical stuff is out of the way, let’s get back to regular English. Let’s use Face ID as a case study. Before we can say that the face recognition system works, it must detect a face and recognize this face as the owner’s. This establishes that the first step is detecting a face (how it is done will be addressed shortly). Concerning the issue of recognizing and authenticating the face, it’s worth noting that the system is not expected to pass 100% of the time. I mean, it’s a system after all, and we give allowance for some errors (the fewer the better) that may arise. This library has a 99.38% accuracy according to the Labeled Faces in the Wild benchmark. Yeah, it’s pretty powerful, and yeah, let’s cut it some slack, even we humans don’t always recognise people 😅.
Back to the subject at hand, the process of face recognition can be broken down into four stages.
Remember we said that detecting the face is very important? Well, that’s very true. We humans can look at pictures, spot features we associate with human faces (face shape, nose shape, eye shape, etc.) and then recognise that it’s a face. This process takes place in a split second and we don’t even notice the beauty of our nervous system 😍. Now, the computer doesn’t recognise information the same way we do, so we have to find an efficient and effective way to help it recognise a face. The Histogram of Oriented Gradients (HOG) is one of the most effective methods of doing this.
The first step when using this method is to turn the picture to black and white (grayscale), since the color of the picture is not required when finding the face. We then consider every pixel in the converted image and compare it with the pixels surrounding it, and we draw an arrow in the direction in which the image is getting darker. Repeating this for every pixel in the image gives us the gradients. However, working with 16×16 squares of pixels instead of single pixels helps us reduce the number of arrows in the final picture. Then we compare the resulting pattern against the patterns already used in training the HOG model. Tada, we have found our face.
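To make the arrows idea a bit more concrete, here is a minimal pure-Python sketch of the steps above. It is only an illustration under my own assumptions: the function names `to_grayscale` and `dominant_directions`, the brightness weights, and the choice of 8 direction bins are all mine, and a real HOG detector does considerably more than this (for a start, it compares the pattern against a trained model).

```python
import math

def to_grayscale(rgb_image):
    """Turn rows of (r, g, b) pixels into rows of brightness values.
    The weights are a common luminance formula (an assumption here)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]

def dominant_directions(gray, cell=16, bins=8):
    """For every cell x cell square, return the index of the strongest
    'arrow' direction. Bin 0 points right, bin bins/2 points left.
    Angles use image coordinates, so y grows downward."""
    height, width = len(gray), len(gray[0])
    result = []
    for top in range(0, height - cell + 1, cell):
        row_of_cells = []
        for left in range(0, width - cell + 1, cell):
            histogram = [0.0] * bins
            # Skip the outermost image border, where a neighbour is missing
            for y in range(max(top, 1), min(top + cell, height - 1)):
                for x in range(max(left, 1), min(left + cell, width - 1)):
                    # Compare the pixel's neighbours left/right and up/down
                    gx = gray[y][x + 1] - gray[y][x - 1]
                    gy = gray[y + 1][x] - gray[y - 1][x]
                    strength = math.hypot(gx, gy)
                    # The arrow points where the image gets DARKER,
                    # i.e. opposite to the brightness gradient
                    angle = math.atan2(-gy, -gx) % (2 * math.pi)
                    index = int(angle / (2 * math.pi) * bins) % bins
                    histogram[index] += strength
            row_of_cells.append(histogram.index(max(histogram)))
        result.append(row_of_cells)
    return result

# A made-up 16x16 image that gets brighter from left to right
image = [[(x * 10, x * 10, x * 10) for x in range(16)] for _ in range(16)]
print(dominant_directions(to_grayscale(image)))
```

In this toy image the single square’s strongest arrow lands in bin 4, which points left, toward the darker side of the picture.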
This is an example of a HOG model:
Alright, that’s out of the way.
So now we’ve detected a face. Next up is deriving its pose and then projecting it. As humans, this follows logically too: in our minds, after we have detected a face, we go on to determine whether we have seen that person before, and our comparisons are based on the ‘pose’ we have in our minds.
Fortunately, we have a method of detecting “face landmarks” for our computers. This method is the 68-point method invented by Vahid Kazemi and Josephine Sullivan. It basically detects different points around the eyes, the lips and the other facial features. The picture below depicts this:
Once we have these landmarks, we can confidently say we have represented the face in the image. If this representation is obtained from a tilted image, we can ‘project’ it straight. This ensures that we get a similar representation whether the face is turned sideways or not.
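As a rough illustration of ‘projecting’ a tilted face straight, the sketch below rotates a set of landmark points so that the line between the two eyes becomes horizontal. This is a rotation-only simplification that I am assuming for the example, not necessarily what the library does, and the function name `level_eyes` and the made-up eye coordinates are hypothetical.

```python
import math

def level_eyes(points, left_eye, right_eye):
    """Rotate (x, y) landmark points about the midpoint between the eyes
    so that the eye line ends up horizontal."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    tilt = math.atan2(dy, dx)  # current angle of the eye line
    # Centre of rotation: halfway between the two eyes
    cx = (left_eye[0] + right_eye[0]) / 2
    cy = (left_eye[1] + right_eye[1]) / 2
    # Rotate by the opposite angle to undo the tilt
    cos_a, sin_a = math.cos(-tilt), math.sin(-tilt)
    rotated = []
    for x, y in points:
        tx, ty = x - cx, y - cy
        rotated.append((cx + tx * cos_a - ty * sin_a,
                        cy + tx * sin_a + ty * cos_a))
    return rotated

# Made-up landmarks for a face tilted by 45 degrees
left_eye, right_eye = (0.0, 0.0), (10.0, 10.0)
straight = level_eyes([left_eye, right_eye], left_eye, right_eye)
print(straight)  # both eyes now share (almost exactly) the same y value
```

Because a rotation keeps distances intact, the eyes stay the same distance apart; only the tilt is removed.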
So consider the following picture:
The following are the face landmarks detected:
Alrighty, step 2 ✅
Now we have successfully detected the face landmarks in the image. The next thing is to compare this image with other images. This means that we must find an effective way of storing these images so that we can iterate through them when comparing for similarities. We might not care about efficiency with a handful of pictures, but consider big systems like Facebook and Instagram that have millions of pictures: how would they navigate them?
This means that the issue of encoding is important. Fortunately, we have a model that represents each image as 128 numbers. This encoding method makes both storing and training easy. Consider the following as an example of this encoding:
This method of encoding provides a unique representation for each picture. The uniqueness helps us differentiate between pictures. Pictures that have very similar number representations most likely belong to the same person.
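One simple way to put a number on how ‘similar’ two encodings are is the straight-line (Euclidean) distance between them. The sketch below assumes that measure and uses made-up 128-number encodings; the function name `encoding_distance` is mine, so treat it as an illustration rather than the library’s own code.

```python
import math

def encoding_distance(a, b):
    """Straight-line (Euclidean) distance between two encodings.
    Smaller means more alike; 0.0 means identical."""
    if len(a) != len(b):
        raise ValueError("encodings must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Made-up encodings of 128 numbers each (not real face data)
photo_one = [0.10] * 128
photo_two = [0.11] * 128    # nearly the same numbers as photo_one
photo_three = [0.90] * 128  # very different numbers

print(encoding_distance(photo_one, photo_two))    # small distance
print(encoding_distance(photo_one, photo_three))  # much larger distance
```

The first pair would be judged as ‘probably the same person’ and the second pair as ‘probably different people’.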
The final step. Now that we have encodings of various pictures, we can classify them and use the result of this classification to predict whether the person in the picture has been enrolled or not.
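A very simple stand-in for this last step is to look for the closest enrolled encoding and accept it only if it is close enough. To be clear, this nearest-match approach is my own simplification and not necessarily the classifier the library uses; the function name `identify`, the enrolled names, and the threshold of 0.6 are all assumptions for illustration.

```python
import math

def identify(unknown, enrolled, threshold=0.6):
    """Return the name of the closest enrolled encoding, or None when
    nobody is within the (assumed) distance threshold."""
    best_name, best_distance = None, float("inf")
    for name, encoding in enrolled.items():
        # Euclidean distance between the two lists of numbers
        distance = math.sqrt(sum((u - e) ** 2
                                 for u, e in zip(unknown, encoding)))
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name if best_distance <= threshold else None

# Made-up enrolled encodings of 128 numbers each (not real face data)
enrolled = {
    "ada": [0.0] * 128,
    "grace": [1.0] * 128,
}

print(identify([0.01] * 128, enrolled))  # very close to "ada"
print(identify([0.5] * 128, enrolled))   # far from everyone, so not enrolled
```

Raising the threshold makes the system more forgiving (more matches, but more mistakes), while lowering it makes the system stricter.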
These are roughly the steps involved in face recognition (using the Python library as a case study). Hurray 😁
There are amazing applications of face recognition. Yes, it’s true that it’s not perfect, but with the amount of research going on, it should get better.
The applications range from solving seemingly ‘trivial’ problems, like taking attendance in classrooms, to being used as part of much bigger systems (like those involved in self-driving cars).
I hope you had fun reading this article. I can’t wait to hear your feedback.
If you want to know what Adam Geitgey has to say about his Python API, this could help.
Acknowledgment for images:
Fig 1* – This image was created by Brandon Amos
Fig 2* – This image was taken from https://medium.com/@ageitgey/machine-learning-is-fun-part-4-modern-face-recognition-with-deep-learning-c3cffc121d78
Fig 1 – https://static.makeuseof.com/wp-content/uploads/2015/05/Face-detection1.png