
First 15 Open Source Advent projects

December 16, 2023

Just 10 days to go!

We launched Open Source Advent at the beginning of this month to celebrate 25 different open source projects. It has been fun sharing them, so here is a recap of the first 15. Take a look at each repo, try the tutorial, and let us know what you build!

1. Milvus by Zilliz | GitHub

Milvus is an open-source vector database that powers embedding similarity search and AI applications and strives to make vector search accessible to every organization. Milvus can store, index, and manage more than a billion embedding vectors generated by deep neural networks and other machine learning (ML) models. It is the project we all work on here at Zilliz, so of course it is on the list. 😇
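To make "embedding similarity search" concrete, here is a minimal sketch in plain Python. It is not the Milvus API: the function names and the tiny three-dimensional vectors are made up for illustration, and it scans every vector by brute force, whereas a vector database would typically build an index to avoid that.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Return the k (name, score) pairs most similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in vectors.items()]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings"; real ones come from an ML model
# and usually have hundreds of dimensions.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

results = top_k([1.0, 0.0, 0.0], embeddings, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

With these toy vectors the query lands closest to "cat", then "dog". The brute-force scan costs one comparison per stored vector, which is the part that stops scaling as collections grow toward the billion-vector range mentioned above.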

2. FiftyOne by Voxel51 | GitHub | tutorial

FiftyOne is the open source toolkit for building high-quality datasets and computer vision models. With FiftyOne you can visualize, curate, manage, and QA data, and automate the workflows that make enterprise machine learning possible. The Voxel51 team spoke at the last Unstructured Meetup, and you can check out the recording here (29:10 – Speaker Jacob Marks, Vector search with computer vision data using Voxel51).

3. Quivr | GitHub | tutorial

Quivr is a personal productivity assistant that lets you chat with your files (PDF, CSV) and apps using GPT-3.5, GPT-4 Turbo, Anthropic, VertexAI, or private LLMs, and share the result with other users. It is an alternative to OpenAI GPTs.

4. Haystack by Deepset | GitHub | tutorial

Haystack is an end-to-end NLP framework that enables you to build applications powered by LLMs, Transformer models, vector search, and more. Whether you want to perform question answering, answer generation, semantic document search, or build tools capable of complex decision-making and query resolution, you can use state-of-the-art NLP models with Haystack to build end-to-end NLP applications to solve your use case. We have a video on some examples of retrieval augmentation in Haystack.

5. Proton by Timeplus | GitHub | tutorial

Proton is a streaming analytics database, based on ClickHouse and written in C++. Fast, powerful, and easy to use.

6. Ydata-synthetic and Ydata-profiling by YData | GitHub | tutorial

Ydata-profiling is a Python package for automated Data Quality profiling reports in a single line of code. Ydata-synthetic is a package to generate synthetic tabular and time-series data with state-of-the-art generative models.

7. Apache Flink | GitHub | tutorial

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.

8. LangChain RB | GitHub | tutorial

LangChain RB is a Ruby framework inspired by LangChain. The goal is to abstract away complexity and difficult concepts so that building AI/ML-supercharged applications is approachable for traditional Ruby software engineers. If you are a Ruby fan, we have a video that shows you how to build a GenAI App End-to-End with Ruby.

9. Flyte by Union AI | GitHub | tutorial

Flyte is an open-source orchestrator that facilitates building production-grade data and ML pipelines. It is built for scalability and reproducibility, leveraging Kubernetes as its underlying platform. With Flyte, user teams can construct pipelines using the Python SDK and seamlessly deploy them in both cloud and on-premises environments, enabling distributed processing and efficient resource utilization.

10. DVC by Iterative | GitHub | tutorial

DVC is a command line tool to help you develop reproducible machine learning projects.

11. Phoenix by Arize AI | GitHub | tutorial

Phoenix is Arize AI’s open-source observability library designed for experimentation, fine-tuning, and troubleshooting your LLM, CV, and NLP models in a notebook.

12. TruLens by TruEra | GitHub | tutorial

TruLens provides observability for LLM and multimodal apps, with deep instrumentation and comprehensive evals.

13. OpenLLM by BentoML | GitHub | tutorial

OpenLLM is an open-source platform designed to facilitate the deployment and operation of large language models (LLMs) in real-world applications. With OpenLLM, you can run inference on any open-source LLM, deploy it in the cloud or on-premises, and build powerful AI applications.

14. Label Studio by HumanSignal | GitHub | tutorial

Label Studio is a flexible data labeling tool for all data types. Use it to prepare training data for computer vision, natural language processing, speech, voice, and video models.

15. LlamaIndex | GitHub | tutorial

LlamaIndex is a data framework for LLM-based applications to ingest, structure, and access private or domain-specific data.
