Build your own Search Engine using OpenAI’s CLIP and FastAPI

Okay, so this search engine will not be a full-fledged one, but you’ll get hands-on with the capabilities of OpenAI’s CLIP model and see how to apply it to your own use cases.

What is CLIP?

CLIP architecture

Suppose you want to tag your images with a particular set of words. What would you do?

You’d probably build a classifier, which can do this pretty easily, right? But what if you want to tag them with a sequence of words, or even a sequence of sentences? It is hard to use a classifier for that.

This is where CLIP comes into play. It is trained on pairs of an image and a sequence of words, so it learns that whatever it is seeing can be described by a sequence of words. It won’t generate those words; rather, it can tell you that a particular image is closely related to a particular set of words.

So you give the model your images and a set of sentences, and it tells you how closely each image is related to each sentence, expressed as a similarity score.

If you want a full-blown read on how CLIP works, head over to this official article.

The official code implementation of CLIP can be found here.

How to use CLIP to build your Search Engine?

The code for this article can be found here.

We’ll build two things here. The first is

Image Search

  1. We’ll go through the images in a particular directory, generate embeddings for them, and store them in a DB.
  2. Then, for a text query, we’ll generate a new embedding and calculate the similarity between it and each embedding stored in the DB.
  3. Using this similarity, we can tell which images the query resembles most.
  4. Now we can sort the results and display them.

Here you can see we are creating embeddings for all the images and storing them in the DB.
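
As a rough sketch of this indexing step (not necessarily the code from the repo), the snippet below walks a directory, embeds each image, and writes the vectors to SQLite as JSON. The table name `images`, the helper names, and the JSON storage format are assumptions made for illustration; `clip.load`, the returned `preprocess` transform, and `model.encode_image` are the calls documented in OpenAI’s CLIP repository. The CLIP-backed embedder imports its heavy dependencies lazily, so the indexing logic can be tried with any stand-in function.

```python
import json
import sqlite3
from pathlib import Path
from typing import Callable, List

# An embedder maps an image path to a list of floats.
Embedder = Callable[[Path], List[float]]

# Assumption: only these extensions are treated as images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def make_clip_image_embedder(model_name: str = "ViT-B/32") -> Embedder:
    """Build an embedder backed by OpenAI's `clip` package (imported lazily)."""
    import clip  # pip install git+https://github.com/openai/CLIP.git
    import torch
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load(model_name, device=device)

    def embed(path: Path) -> List[float]:
        image = preprocess(Image.open(path)).unsqueeze(0).to(device)
        with torch.no_grad():
            features = model.encode_image(image)
        # L2-normalise so that a later dot product equals cosine similarity.
        features = features / features.norm(dim=-1, keepdim=True)
        return features[0].float().cpu().tolist()

    return embed


def index_directory(conn: sqlite3.Connection, directory: Path, embed: Embedder) -> int:
    """Embed every image under `directory` and upsert it into a hypothetical `images` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        "path TEXT PRIMARY KEY, embedding TEXT NOT NULL)"
    )
    count = 0
    for path in sorted(directory.rglob("*")):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files
        conn.execute(
            "INSERT OR REPLACE INTO images (path, embedding) VALUES (?, ?)",
            (str(path), json.dumps(embed(path))),
        )
        count += 1
    conn.commit()
    return count
```

With the CLIP dependencies installed, usage could look like `index_directory(sqlite3.connect("images.db"), Path("photos"), make_clip_image_embedder())`, where `images.db` and `photos` are placeholder names.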

Once it is done, you can calculate text and image similarity using this

Now you can use this similarity metric to sort the images and display them.
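
To make the similarity-and-sort step concrete, here is a minimal sketch that uses cosine similarity, a common choice for comparing CLIP embeddings. The tiny 3-dimensional vectors and file names are made up for illustration; with the real model, the query vector would come from `model.encode_text(clip.tokenize([query]))` and each image vector from `model.encode_image(...)`.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(
    query: Sequence[float], image_embeddings: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (path, similarity) pairs, most similar first."""
    scored = [
        (path, cosine_similarity(query, emb))
        for path, emb in image_embeddings.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy vectors standing in for real CLIP embeddings (hypothetical data).
images = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.9, 0.0],
    "city.jpg": [0.0, 0.2, 0.9],
}
text_query = [1.0, 0.0, 0.0]  # pretend embedding of "a sunny beach"

for path, score in rank_images(text_query, images):
    print(f"{score:.3f}  {path}")
```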

The second thing we’ll build is

Reverse Image Search

The idea here is the same.

  1. We already have the image embeddings stored, so we only need to create a new one for the query image.
  2. Then we can calculate the similarity, sort the results, and display them.
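
The two steps above can be sketched as follows, reading the stored vectors back from SQLite and keeping only the top `k` matches. This assumes a hypothetical `images(path, embedding)` table holding JSON-encoded vectors that were L2-normalised at indexing time, so a plain dot product acts as the cosine similarity. The query vector here is a toy value; in practice it would come from running the uploaded image through CLIP’s image encoder.

```python
import heapq
import json
import sqlite3
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))


def reverse_image_search(
    conn: sqlite3.Connection, query_embedding: Sequence[float], k: int = 5
) -> List[Tuple[str, float]]:
    """Return the k stored images most similar to the query image embedding."""
    rows = conn.execute("SELECT path, embedding FROM images")
    scored = ((dot(query_embedding, json.loads(raw)), path) for path, raw in rows)
    # nlargest avoids sorting the whole table when only a few results are shown.
    top = heapq.nlargest(k, scored)
    return [(path, score) for score, path in top]


# Demo with an in-memory database and made-up unit vectors.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (path TEXT PRIMARY KEY, embedding TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO images VALUES (?, ?)",
    [
        ("cat_1.jpg", json.dumps([1.0, 0.0])),
        ("cat_2.jpg", json.dumps([0.8, 0.6])),
        ("car.jpg", json.dumps([0.0, 1.0])),
    ],
)

query = [1.0, 0.0]  # pretend embedding of the uploaded query image
for path, score in reverse_image_search(conn, query, k=2):
    print(path, score)
```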

FastAPI as backend

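I wanted to use Flask, but FastAPI is quickly gaining popularity, so why not use it instead?

FastAPI is a high-performance web framework with built-in support for asynchronous request handlers. It can be considerably faster than Flask; figures of around 10X are sometimes quoted, though the gap depends heavily on the workload and the server setup. So it makes sense to use it for fast APIs.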

You can refer to the official docs here for more details.


We have used the following libraries in the code:

  1. FastAPI
  2. SQLite3 (I know it’s not ideal, but it’s fine for experimentation)
  3. The CLIP model, which is built on the PyTorch framework

Expanding the Engine

To expand the engine, you’ll have to index more and more images into the DB and generate embeddings for them. As the DB grows, the similarity calculation will get slower, so a full-fledged database could be useful. I’ll cover more of this in my next post.

I’ll be adding more features to this project and releasing a new article soon. Please feel free to open a PR on the GitHub repo and contribute to the project.

The code repo is available here.
