I hope you all got a chance to check out the Open Source Advent we kicked off in December to celebrate different open source projects. I shared the first 15 projects a few weeks ago, and I was pumped to hear back from some of you, including a blog post from Michael Romagne about his experience with some of the projects!
Now that we are back in action at work, I thought it would be a good idea to share the nine projects that closed out the celebration.
But before we do, I want to say that 🤩 all these lovely open-source projects would love a little 🎉💕 love in the form of a GitHub star ⭐ for their efforts, including open-source Milvus 🥰
Alright, let’s take a look at the NINE projects from the second half! Each project has a link to its repo and a tutorial to help you get started.
LLMWare is a unified, open, extensible framework for LLM-based application patterns, including Retrieval Augmented Generation (RAG).
Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. It works with any ML framework.
Apache Paimon is a streaming storage layer that extends Apache Flink’s stream processing capabilities to the data lake.
VectorFlow is a high-volume vector embedding pipeline that ingests raw data, transforms it into vectors, and writes it to a vector DB of your choice.
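To make the ingest, transform, and write flow concrete, here is a minimal Python sketch of what such a pipeline does. It is not VectorFlow's API: `chunk_text`, `toy_embed`, `InMemoryVectorStore`, and `run_pipeline` are hypothetical names, the hashed bag-of-words "embedding" is a stand-in for a real embedding model, and the in-memory store is a stand-in for a vector DB such as Milvus.

```python
import math
import zlib
from dataclasses import dataclass, field


def chunk_text(text: str, chunk_size: int = 5) -> list[str]:
    """Split raw text into chunks of at most `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def toy_embed(text: str, dim: int = 32) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for a vector DB: stores (doc_id, chunk, vector) rows."""
    rows: list[tuple[str, str, list[float]]] = field(default_factory=list)

    def upsert_batch(self, batch: list[tuple[str, str, list[float]]]) -> None:
        self.rows.extend(batch)


def run_pipeline(docs: dict[str, str], store: InMemoryVectorStore,
                 chunk_size: int = 5, batch_size: int = 2) -> int:
    """Ingest raw docs, chunk them, embed each chunk, and write to the store in batches.
    Returns the number of vectors written."""
    written = 0
    batch: list[tuple[str, str, list[float]]] = []
    for doc_id, text in docs.items():
        for chunk in chunk_text(text, chunk_size):
            batch.append((doc_id, chunk, toy_embed(chunk)))
            if len(batch) >= batch_size:
                store.upsert_batch(batch)  # write a full batch
                written += len(batch)
                batch = []
    if batch:  # flush the final partial batch
        store.upsert_batch(batch)
        written += len(batch)
    return written


store = InMemoryVectorStore()
count = run_pipeline({"a": "one two three four five six seven", "b": "eight nine"}, store)
print(count)  # 3
```

Writing in batches rather than one row at a time is what keeps a sketch like this viable at higher volumes; the batch size here is an arbitrary illustrative value.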
Pachyderm is kind of like Git for your data: it’s a language-agnostic data versioning and pipelining tool.
GPTCache is an open-source library designed to improve the efficiency and speed of GPT-based applications by caching the responses generated by language models, so that repeated or semantically similar queries can be answered without calling the model again. GPTCache lets users customize the cache to their needs, including the embedding function, the similarity evaluation function, the storage location, and the eviction policy.
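To illustrate the idea behind a semantic cache like GPTCache, here is a minimal, self-contained Python sketch. It deliberately does not use GPTCache's own API; `SemanticCache`, `toy_embed`, `cached_llm_call`, and `fake_llm` are hypothetical names. It maps onto the knobs described above: a pluggable embedding function (a toy hashed bag of words), cosine similarity as the similarity evaluation with an assumed threshold of 0.9, an in-memory `OrderedDict` as the storage location, and least-recently-used (LRU) eviction.

```python
import math
import zlib
from collections import OrderedDict
from typing import Callable, Optional


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hypothetical stand-in for a real embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already normalized, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


class SemanticCache:
    """Toy semantic cache: reuse a stored answer if a previous prompt is similar enough."""

    def __init__(self, embed: Callable[[str], list[float]] = toy_embed,
                 threshold: float = 0.9, max_size: int = 100):
        self.embed = embed
        self.threshold = threshold
        self.max_size = max_size
        self.entries: "OrderedDict[str, tuple[list[float], str]]" = OrderedDict()

    def get(self, prompt: str) -> Optional[str]:
        query = self.embed(prompt)
        best_key, best_score = None, -1.0
        for key, (vec, _) in self.entries.items():  # linear scan over cached prompts
            score = cosine(query, vec)
            if score > best_score:
                best_key, best_score = key, score
        if best_key is not None and best_score >= self.threshold:
            self.entries.move_to_end(best_key)  # mark as most recently used
            return self.entries[best_key][1]
        return None

    def put(self, prompt: str, answer: str) -> None:
        self.entries[prompt] = (self.embed(prompt), answer)
        self.entries.move_to_end(prompt)
        if len(self.entries) > self.max_size:
            self.entries.popitem(last=False)  # evict the least recently used entry


def cached_llm_call(cache: SemanticCache, prompt: str, llm: Callable[[str], str]) -> str:
    hit = cache.get(prompt)
    if hit is not None:
        return hit
    answer = llm(prompt)  # cache miss: call the model and store the result
    cache.put(prompt, answer)
    return answer


calls: list[str] = []


def fake_llm(prompt: str) -> str:
    # Hypothetical model call; records every prompt that actually reaches the "LLM".
    calls.append(prompt)
    return "answer to: " + prompt


cache = SemanticCache(threshold=0.9, max_size=2)
first = cached_llm_call(cache, "What is Milvus?", fake_llm)
second = cached_llm_call(cache, "what is milvus?", fake_llm)  # same tokens after lowercasing: cache hit
print(len(calls))  # 1
```

Raising the threshold trades fewer wrong cache hits for more model calls. The linear scan is only for readability; once a cache grows, you would likely back the lookup with a vector index instead.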
Ray is an open-source, general-purpose framework for scaling AI workloads, with a set of native Python ML libraries and integrations.
SuperGradients is a PyTorch-based training library that lets you train and fine-tune state-of-the-art (SOTA) computer vision models with ease.
Temporian is a library for safe, simple and efficient preprocessing and feature engineering of temporal data in Python. Temporian supports multivariate time-series, multivariate time-sequences, event logs, and cross-source event streams. Temporian is to temporal data what Pandas is to tabular data.
Hope you enjoyed that as much as we did! Let me know if there are other projects we should include the next time we run one of these events, which will be soon! Here’s to building cool stuff in 2024!