Build your own Search Engine using OpenAI’s CLIP and FastAPI

Photo by Roman Synkevych on Unsplash

Okay, so this search engine will not be a full-fledged one, but you’ll get hands-on experience with the capabilities of OpenAI’s CLIP model and see how to apply it to your own use cases.

What is CLIP?

CLIP architecture

Suppose you want to tag your images with a particular set of words. What would you do?

You’d probably build a classifier, which can do this pretty easily, right? But what if you want to tag them with a sequence of words, or even whole sentences? A classifier with a fixed set of labels is hard to use for that.

This is where CLIP comes into play. It is trained on pairs of images and their text descriptions, so it learns to place an image and the text that describes it close together in a shared embedding space. It won’t generate those words; rather, it can tell you how closely a particular image is related to a particular set of words.

So you give the model your images and a set of sentences, and it tells you how similar each image is to each sentence.

If you want a full-blown read on how CLIP works, head over to this official article.

The official code implementation of CLIP can be found here.

How to use CLIP to build your Search Engine?

The code for this article can be found here.

We’ll build two things here. The first is

Image Search

  1. We’ll go through the images in a particular directory, generate embeddings for them, and store them in a DB.
  2. Then, for a text query, we’ll generate a new embedding and calculate the similarity between it and the embeddings stored in the DB.
  3. This similarity tells us which images the query resembles most.
  4. Now we can sort the results and display them.

Here we create embeddings for all the images and store them in the DB.
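The indexing step can be sketched roughly as follows. This is a minimal sketch, not the code from the repo: the table name `images`, the helper names `make_clip_image_embedder` and `index_images`, and the choice to store each embedding as a float32 BLOB are all assumptions made for illustration. The CLIP calls follow the openai/CLIP package, and the heavy imports are kept inside the factory function so the storage logic can be tried without the model installed.

```python
import sqlite3
from array import array
from pathlib import Path
from typing import Callable, List, Union

# Any function that maps an image path to a list of floats.
Embedder = Callable[[Path], List[float]]

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def make_clip_image_embedder(model_name: str = "ViT-B/32") -> Embedder:
    """Build an embedder backed by CLIP's image encoder (hypothetical helper)."""
    # Imported lazily; install with:
    #   pip install torch pillow git+https://github.com/openai/CLIP.git
    import clip
    import torch
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load(model_name, device=device)

    def embed(path: Path) -> List[float]:
        image = preprocess(Image.open(path)).unsqueeze(0).to(device)
        with torch.no_grad():
            features = model.encode_image(image)
        return features[0].float().cpu().tolist()

    return embed


def index_images(
    conn: sqlite3.Connection, image_dir: Union[str, Path], embed: Embedder
) -> int:
    """Embed every image in image_dir and store it; returns the number indexed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        "path TEXT PRIMARY KEY, embedding BLOB NOT NULL)"
    )
    count = 0
    for path in sorted(Path(image_dir).iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip anything that does not look like an image
        blob = array("f", embed(path)).tobytes()  # pack as float32 bytes
        conn.execute(
            "INSERT OR REPLACE INTO images (path, embedding) VALUES (?, ?)",
            (str(path), blob),
        )
        count += 1
    conn.commit()
    return count
```

With the model installed, something like `index_images(sqlite3.connect("images.db"), "photos/", make_clip_image_embedder())` would fill the table, where `images.db` and `photos/` are placeholder names.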

Once that is done, you can calculate the similarity between a text query and the stored images.

Now you can use this similarity metric to sort the images and display them.
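A minimal sketch of the query side might look like this. It assumes a SQLite table called `images` with a `path` column and an `embedding` column holding float32 bytes; that schema and the function names are illustrative assumptions, and cosine similarity is one common choice of metric rather than necessarily the one used in the repo. The text embedding comes from CLIP's text encoder via `clip.tokenize` and `model.encode_text`.

```python
import math
import sqlite3
from array import array
from typing import List, Sequence, Tuple


def embed_text_with_clip(text: str, model_name: str = "ViT-B/32") -> List[float]:
    """Embed a text query with CLIP's text encoder (hypothetical helper)."""
    # Imported lazily so the ranking code below runs without the model.
    import clip
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, _ = clip.load(model_name, device=device)
    tokens = clip.tokenize([text]).to(device)
    with torch.no_grad():
        features = model.encode_text(tokens)
    return features[0].float().cpu().tolist()


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(
    conn: sqlite3.Connection, query_embedding: Sequence[float], top_k: int = 5
) -> List[Tuple[str, float]]:
    """Score every stored image against the query and return the best top_k."""
    scored = []
    for path, blob in conn.execute("SELECT path, embedding FROM images"):
        vec = array("f")
        vec.frombytes(blob)  # unpack the float32 bytes back into numbers
        scored.append((path, cosine_similarity(query_embedding, vec)))
    scored.sort(key=lambda item: item[1], reverse=True)  # most similar first
    return scored[:top_k]
```

In a real service you would load the model once at startup rather than on every query; it is loaded inside the helper here only to keep the sketch short.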

The second thing we’ll build is

Reverse Image Search

The idea here is the same.

  1. We already have the image embeddings stored, so we only need to create a new one for the query image.
  2. Then we calculate the similarities, sort the results, and display them.
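The two steps above can be sketched on a small in-memory index. Only the source of the query vector changes: it would come from CLIP's image encoder (`model.encode_image` in the openai/CLIP package) instead of the text encoder. The function names and the `exclude` parameter are assumptions for illustration. The sketch normalizes each vector to unit length so that a plain dot product equals cosine similarity.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (an all-zero vector is returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def reverse_image_search(
    index: Dict[str, Sequence[float]],
    query_embedding: Sequence[float],
    top_k: int = 5,
    exclude: Optional[str] = None,
) -> List[Tuple[str, float]]:
    """Rank indexed images by similarity to the query image's embedding.

    `exclude` lets you drop the query image itself when it is already indexed.
    """
    query = normalize(query_embedding)
    scored = [
        (path, sum(q * v for q, v in zip(query, normalize(vec))))
        for path, vec in index.items()
        if path != exclude
    ]
    scored.sort(key=lambda item: item[1], reverse=True)  # most similar first
    return scored[:top_k]
```

Excluding the query path is handy when the uploaded image is already in the index, since it would otherwise always come back as its own best match.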

FastAPI as backend

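I wanted to use Flask, but FastAPI is getting widely popular, so why not use it instead?

FastAPI is a high-performance, asynchronous web framework built on Starlette and Pydantic. Independent benchmarks generally place it well ahead of Flask in throughput, although the exact margin depends on the workload and server setup. So it makes sense to use it for super-fast APIs.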

You can refer to the official docs here for more details.

Libraries

We have used the following libraries in the code:

  1. FastAPI
  2. SQLite3 (I know it’s not ideal, but it’s fine for experimentation)
  3. The CLIP model, built on the PyTorch framework

Expanding the Engine

To expand the engine, you’ll have to index more and more images into the DB and generate embeddings for them. As the DB grows, the similarity calculation will get slower, so a full-fledged database could be useful. I’ll cover more of these things in my next post.

I’ll be adding more features to this project and will release a new article soon. Please feel free to open a PR on the GitHub repo and contribute to the project.
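Before reaching for a bigger database, one intermediate step is to score all embeddings in a single matrix operation instead of looping over rows. The sketch below uses NumPy, which is my own suggestion and not part of the original project; the function name is hypothetical, and it assumes all embeddings fit in memory and none is all zeros.

```python
from typing import List, Sequence, Tuple

import numpy as np


def top_k_similar(
    paths: Sequence[str],
    embeddings: Sequence[Sequence[float]],
    query: Sequence[float],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Cosine-rank all embeddings against the query with one matrix product."""
    if len(paths) == 0:
        return []
    matrix = np.asarray(embeddings, dtype=np.float32)
    # Normalize rows and the query so the dot product is cosine similarity.
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    q = np.asarray(query, dtype=np.float32)
    q = q / np.linalg.norm(q)
    scores = matrix @ q  # one similarity score per stored image
    k = min(top_k, len(paths))
    # argpartition finds the k best without sorting the whole array,
    # then only those k are sorted by descending score.
    best = np.argpartition(-scores, k - 1)[:k]
    best = best[np.argsort(-scores[best])]
    return [(paths[i], float(scores[i])) for i in best]
```

In practice the normalized matrix would be built once when the index is loaded and reused for every query, so each search costs a single matrix-vector product.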

The code repo is available here.

Please click on the 👏 button if you liked the post, and hold it to give more love.

Connect with me here:

GitHub Twitter Instagram LinkedIn

Adesh Gautam

Machine/Deep Learning, Amateur Photographer
