
Can we start getting a similar flood of tools to generate the embeddings now? That’s my bottleneck. Searching them works well on numerous databases that support arrays/vectors.


There are dozens of different models that can generate embeddings: https://docs.marqo.ai/1.0.0/Models-Reference/dense_retrieval...

Most frameworks, like Haystack, can wrap embeddings generation for you.


I work at FeatureBase, and I'm storing vectors from the Instructor Large library/model in our solution. I'm getting good results, which I should probably quantify at some point. One thing FeatureBase does well is filtering the vector space via SQL.

I would say that most people seem to prefer an engine that embeds and stores things as a service, but using Instructor is only a few lines of code and runs locally.
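
To make the "filter the vector space via SQL" idea concrete, here is a minimal sketch. It does not use FeatureBase: SQLite stands in as the SQL engine, the embeddings are stored as JSON text, and the similarity is computed in Python. The table layout, the `lang` column, the toy 3-dimensional vectors, and the function names are all assumptions made up for illustration.

```python
import json
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_demo_db():
    """Create an in-memory table with hypothetical documents and toy embeddings."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs (id INTEGER PRIMARY KEY, lang TEXT, body TEXT, embedding TEXT)"
    )
    rows = [
        (1, "en", "intro to vectors", [1.0, 0.0, 0.0]),
        (2, "en", "cooking pasta", [0.0, 1.0, 0.0]),
        (3, "de", "vektoren einfuehrung", [0.9, 0.1, 0.0]),
    ]
    conn.executemany(
        "INSERT INTO docs VALUES (?, ?, ?, ?)",
        [(doc_id, lang, body, json.dumps(vec)) for doc_id, lang, body, vec in rows],
    )
    return conn


def filtered_search(conn, query_vec, lang, k=2):
    """Narrow the candidates with a SQL WHERE clause, then rank them by cosine similarity."""
    cur = conn.execute(
        "SELECT id, body, embedding FROM docs WHERE lang = ?", (lang,)
    )
    scored = [
        (cosine(query_vec, json.loads(emb)), doc_id, body)
        for doc_id, body, emb in cur
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(doc_id, body) for _, doc_id, body in scored[:k]]
```

Calling `filtered_search(build_demo_db(), [1.0, 0.0, 0.0], "en")` only ever ranks the English rows, because the `WHERE` clause drops the German one before any similarity is computed. A database with native vector support would do the ranking step inside the engine instead of in Python.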


Just to quickly add to ukuina's comment, marqo.ai does embedding generation and vector search end to end, so you can put in documents and the embeddings are automatically generated.


Lots of tools can do this, and they've been around for a long time. For example, txtai has been able to generate embeddings with sentence-transformers since 2020.


SentenceTransformers' all-MiniLM-L6-v2 is still your best bet, since you can generate embeddings in batches with GPU acceleration.
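
A rough sketch of what batched generation could look like. The batching helpers are plain Python and take any `encode` callable, so they are not tied to one library. `load_minilm_encoder` assumes the `sentence-transformers` package is installed and that the model can be downloaded; the helper names and the default batch size of 64 are assumptions for illustration, not recommendations from the thread.

```python
from typing import Callable, Iterator, List, Optional, Sequence


def batched(texts: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `texts`, each holding at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


def embed_in_batches(
    texts: Sequence[str],
    encode: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int = 64,
) -> List[List[float]]:
    """Run `encode` over `texts` one batch at a time and collect the vectors in order."""
    vectors: List[List[float]] = []
    for chunk in batched(texts, batch_size):
        vectors.extend(encode(chunk))
    return vectors


def load_minilm_encoder(device: Optional[str] = None):
    """Return an `encode` callable backed by all-MiniLM-L6-v2.

    Assumes sentence-transformers is installed. Pass device="cuda" to request
    a GPU explicitly, or leave it as None to let the library choose.
    """
    from sentence_transformers import SentenceTransformer  # imported lazily

    model = SentenceTransformer("all-MiniLM-L6-v2", device=device)
    return lambda chunk: model.encode(list(chunk)).tolist()
```

With the package installed, usage would be along the lines of `embed_in_batches(docs, load_minilm_encoder("cuda"))`. Note that `model.encode` also accepts a `batch_size` argument of its own, so the outer loop mostly matters when the inputs are streamed or too large to hold in memory at once.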



