With the growing abundance of resource material on the internet, information retrieval increasingly depends on automatic image captioning to extract the information relevant to a user's query. Manually summarizing such images would be a herculean task, so automation is needed to reduce the time and effort involved. Automatic captions can, for instance, power search-engine results and support image classification and comparison.
Valiance has created a platform using Python and the Microsoft Cognitive Toolkit (CNTK), a deep learning framework from Microsoft Research, to summarize images. The approach learns image captioning from a dataset of images paired with sentence descriptions. The model combines Convolutional Neural Networks over image regions with Recurrent Neural Networks over sentences.
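To make the CNN-plus-RNN idea concrete, here is a minimal, framework-agnostic sketch in NumPy: an image feature vector (standing in for CNN output) initializes a recurrent decoder that greedily emits caption words. The vocabulary, layer sizes, and randomly initialized weights are all illustrative assumptions, not Valiance's actual model or the CNTK implementation.

```python
import numpy as np

# Toy sketch: a CNN (stubbed here as a fixed feature vector) conditions an
# RNN that emits caption words one at a time. Sizes and vocabulary are
# illustrative assumptions only.

rng = np.random.default_rng(0)

VOCAB = ["<start>", "<end>", "a", "dog", "runs", "on", "grass"]
V, H, F = len(VOCAB), 16, 32   # vocab size, hidden size, image-feature size

# Randomly initialized weights stand in for trained parameters.
W_img = rng.normal(0, 0.1, (H, F))   # projects image features to hidden state
W_emb = rng.normal(0, 0.1, (V, H))   # word embeddings
W_hh  = rng.normal(0, 0.1, (H, H))   # recurrent weights
W_out = rng.normal(0, 0.1, (V, H))   # hidden state -> vocabulary logits

def caption(image_feat, max_len=10):
    """Greedy decoding: the image feature sets the initial hidden state."""
    h = np.tanh(W_img @ image_feat)
    word = VOCAB.index("<start>")
    out = []
    for _ in range(max_len):
        h = np.tanh(W_hh @ h + W_emb[word])   # one recurrent step
        word = int(np.argmax(W_out @ h))      # pick the most likely next word
        if VOCAB[word] == "<end>":
            break
        out.append(VOCAB[word])
    return out

# A fake "CNN feature" for one image.
feat = rng.normal(size=F)
print(caption(feat))
```

In a trained system the weights come from jointly optimizing the caption likelihood over the image-sentence pairs, and decoding typically uses beam search rather than pure greedy selection.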
The steps involved are:
- Preparation of training data (images paired with summaries) through unsupervised learning methods.
- Image processing of the data with OpenCV.
- Model training and testing using Python and CNTK.
Benefits include:
- Substantial savings in time and cost compared with manually captioning each image.
- Search over images based on associated metadata such as keywords and text.
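The image-processing step in the pipeline above typically amounts to resizing each image to a fixed shape and normalizing pixel values before feature extraction. In practice this would use OpenCV (e.g. `cv2.imread` and `cv2.resize`); the NumPy-only block-average resize below is a self-contained stand-in, and the sizes shown are illustrative assumptions.

```python
import numpy as np

def preprocess(img, out_size=32):
    """Downsample a square image by block-averaging, then scale to [0, 1].

    A toy substitute for cv2.resize: it only handles dimensions that divide
    evenly into out_size x out_size blocks.
    """
    h, w = img.shape[:2]
    assert h % out_size == 0 and w % out_size == 0, "toy resize needs exact blocks"
    bh, bw = h // out_size, w // out_size
    # Reshape into (out_size, bh, out_size, bw, channels) blocks and average.
    blocks = img.reshape(out_size, bh, out_size, bw, -1)
    small = blocks.mean(axis=(1, 3))
    return small.astype(np.float32) / 255.0

# A random 128x128 RGB image stands in for a real input file.
img = np.random.default_rng(1).integers(0, 256, (128, 128, 3), dtype=np.uint8)
x = preprocess(img)
print(x.shape)   # (32, 32, 3)
```

Fixing the input shape and value range this way is what lets a single CNN process every image in the training set uniformly.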