Can we start getting a similar flood of tools to *generate* the embeddings now? ...

ukuina · on Aug 21, 2023

There are dozens of different models that can generate embeddings: https://docs.marqo.ai/1.0.0/Models-Reference/dense_retrieval...

Most frameworks, like Haystack, can wrap embedding generation for you.

kordlessagain · on Aug 21, 2023

I work at FeatureBase and I'm storing vectors from the Instructor Large library/model in our solution. I'm getting good results, which I should probably quantify at some point. One thing that FeatureBase does well is allow filtering of the vector space via SQL.

I would say that most people seem to prefer an engine that embeds and stores things as a service, but using Instructor is only a few lines of code and runs locally.
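
To illustrate the "filtering of the vector space via SQL" idea from the comment above, here is a rough sketch. It is not FeatureBase's API: it uses Python's built-in `sqlite3` as a stand-in, and the `docs` table, its `category` column, the `filtered_search` helper, and the 3-dimensional toy vectors are all hypothetical. The point is only the pattern: a SQL `WHERE` clause narrows the candidates, then cosine similarity ranks what is left.

```python
import json
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(conn, query_vec, category, k=3):
    """Narrow candidates with a SQL WHERE clause, then rank them by cosine similarity."""
    rows = conn.execute(
        "SELECT id, embedding FROM docs WHERE category = ?", (category,)
    ).fetchall()
    scored = [(doc_id, cosine(query_vec, json.loads(emb))) for doc_id, emb in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical toy data: 3-dimensional vectors stand in for real embeddings,
# stored as JSON text because sqlite3 has no native vector type.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, category TEXT, embedding TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [
        ("a", "news", json.dumps([1.0, 0.0, 0.0])),
        ("b", "news", json.dumps([0.0, 1.0, 0.0])),
        ("c", "blog", json.dumps([1.0, 0.0, 0.0])),
    ],
)

# Only rows with category = 'news' are scored; "c" is never considered.
results = filtered_search(conn, [0.9, 0.1, 0.0], category="news")
print(results)
```

A dedicated engine would do the ranking inside the database rather than in Python, but the filter-then-rank shape is the same.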

tomhamer · on Aug 21, 2023

Just to quickly add to ukuina's comment, marqo.ai does embedding generation and vector search end to end, so you can put in documents and the embeddings are automatically generated.

dmezzetti · on Aug 21, 2023

Lots of tools can do this, and they've been around for a long time. For example, txtai has been able to generate embeddings with sentence-transformers since 2020.

minimaxir · on Aug 21, 2023

SentenceTransformers all-MiniLM-L6-v2 is still your best bet since you can generate embeddings in batches with GPU acceleration.
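
For reference, batched generation with this model might look roughly like the sketch below. It assumes the third-party `sentence-transformers` package is installed; `embed_texts` and `chunked` are hypothetical helper names, and the batch size of 64 is an arbitrary choice, not a recommendation from the thread. The import sits inside the function so the rest of the module loads even without the package.

```python
def chunked(items, size):
    """Yield successive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_texts(texts, batch_size=64, device=None):
    """Encode texts with all-MiniLM-L6-v2.

    device=None lets the library choose (it uses CUDA when available);
    pass "cuda" or "cpu" to force one.
    """
    # Lazy import: requires `pip install sentence-transformers`.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device=device)
    # encode() already batches internally according to batch_size.
    return model.encode(texts, batch_size=batch_size, show_progress_bar=False)


# For a corpus too large to embed in one call, feed it in slices:
#     for part in chunked(all_texts, 10_000):
#         vectors = embed_texts(part)
```

Calling `embed_texts(["first document", "second document"])` should return one 384-dimensional vector per input text. In real use you would load the model once and reuse it across slices rather than reloading it on every call as this sketch does.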