Luke Kim

Founder and CEO of Spice AI

View all authors

Announcing Spice.ai Open Source 1.0-stable

January 22, 2025 · 10 min read

Luke Kim

Founder and CEO of Spice AI

🎉 Today marks the 1.0-stable release of Spice.ai Open Source—designed to help enterprises ground AI in data.

Enterprise AI systems are only as good as the context they’re provided. When data is inaccessible, incomplete, or outdated, even the most advanced models can generate outputs that are inaccurate, misleading, or worse, potentially harmful. In one example, a chatbot was tricked into selling a 2024 Chevy Tahoe for $1 due to a lack of contextual safeguards. For enterprises, errors like these are unacceptable—it’s the difference between success and failure.

Retrieval-Augmented Generation (RAG) is part of the answer — but traditional RAG is only as good as the data it has access to. If data is locked away in disparate, often legacy data systems, or cannot be stitched together for accurate retrieval, you get, as Benioff puts it, "Clippy 2.0".

When you look at how Copilot has been delivered to customers, it’s disappointing. It just doesn’t work, and it doesn’t deliver any level of accuracy. Gartner says it’s spilling data everywhere, and customers are left cleaning up the mess. To add insult to injury, customers are…
— Marc Benioff (@Benioff) October 17, 2024

And often, after initial Python-scripted pilots, you’re left with a new set of problems: How do you deploy AI that meets enterprise requirements for performance, security, and compliance while being cost efficient? Directly querying large datasets for retrieval is slow and expensive. Building and maintaining complex ETL pipelines requires expensive data teams that most organizations don’t have. And because enterprise data is highly sensitive, you need secure access and auditable observability—something many RAG setups don’t even consider.

Developers need a platform at the intersection of data and AI—one specifically designed to ground AI in data. A solution that unifies data query, search, retrieval, and model inference—ensuring performance, security, and accuracy so you can build AI that you and your customers can trust.

Spice.ai OSS: A Portable Data, AI, and Retrieval Engine

In March of 2024, we introduced Spice.ai Open Source, a SQL query engine to materialize and accelerate data from any database, data warehouse, or data lake so that data can be accessed wherever it lives across the enterprise — consistently fast. But that was only the start.

Building on this foundation, Spice.ai OSS unifies data, retrieval, and AI, to mitigate AI “hallucinations” head-on by providing current, relevant context that significantly reduces incorrect outputs.

Spice is a portable, single-node, compute engine built in Rust. It embeds the fastest single-node SQL query engine, DataFusion, to serve secure, virtualized data views to data-intensive apps, AI, and agents. Sub-second data query is accelerated locally using Apache Arrow, DuckDB, or SQLite.

Now at version 1.0-stable, Spice is ready for production. It’s already deployed in enterprise use at Twilio, Barracuda Networks, and NRC Health, and can be deployed anywhere—cloud-hosted, BYOC, edge, on-prem.

A diagram of the Spice.ai OSS compute-engine

Data-grounded AI

Data-grounded AI anchors models in accurate, current, and domain-specific data, rather than relying solely on pre-trained knowledge. By unifying enterprise data—across databases, data lakes, and APIs—and applying advanced ingestion and retrieval techniques, these systems dynamically incorporate real-world context at inference time without leaking sensitive information. This approach helps developers minimize hallucinations, reduce operational risk, and build trust in AI by delivering reliable, relevant outputs.

Comparison of AI without contextual data and with data-grounded AI

How does Spice.ai OSS solve data-grounding?

With Spice, models always have access to projections of low-latency, real-time data for near-instant retrieval, minimizing data movement while enabling AI feedback so apps and agents can learn and adapt over time. For example, you can join customer records from PostgreSQL with sales data in Snowflake and logs stored in S3—all with a single SQL query or LLM function call.

Secure Compute Engine for Data-grounded AI

Spice includes an advanced suite of LLM tools including vector and hybrid search, text-to-SQL, SQL query and retrieval, data sampling, and context formatting—all purpose-built for accurate outputs.

The latest research is continually incorporated so that teams can focus on business objectives rather than trying to keep up with the incredibly fast-moving and often overwhelming space of AI.

Spice.ai OSS: The engine that makes AI work

Spice.ai OSS is a lightweight, portable runtime (single ~140 MB binary) with the capabilities of a high-speed cloud data warehouse built into a self-hostable AI inference engine, all in a single, run-anywhere package.

It's designed to be distributed and integrated at the application level, rather than being a bulky, centralized system to manage, and is often deployed as a sidecar. Whether running one Spice instance per service or one for each customer, Spice is flexible enough to fit your application architecture.

Apps and agents integrate with Spice.ai OSS via three industry-standard APIs, so that it can be adopted incrementally with minimal changes to applications.

SQL Query APIs: HTTP, Arrow Flight, Arrow Flight SQL, ODBC, JDBC, and ADBC.
OpenAI-Compatible APIs: HTTP APIs compatible with the OpenAI SDK, AI SDK with local model serving (CUDA/Metal accelerated), and gateway to hosted models.
Iceberg Catalog REST APIs: A unified Iceberg Catalog REST API.

Spice.ai OSS architecture

Key features of Spice.ai OSS include:

Federated SQL Query Across Data Sources: Perform SQL queries across disparate data sources with over 25 open-source data connectors, including catalogs (Unity Catalog, Iceberg Catalog, etc), databases (PostgreSQL, MySQL, etc.), data warehouses (Snowflake, Databricks, etc.), and data lakes (e.g., S3, ABFS, MinIO, etc.).
Data Materialization and Acceleration: Locally materialize and accelerate data using Arrow, DuckDB, SQLite, and PostgreSQL, enabling low-latency and high-speed transactional and analytical queries. Data can be ingested via Change-Data-Capture (CDC) using Debezium, Catalog integrations, on an interval, or by trigger.
AI Inference, Gateway, and LLM toolset: Load and serve models like Llama3 locally, or use Spice as a gateway to hosted AI platforms including OpenAI, Anthropic, xAI, and NVidia NIM. Automatically use a purpose-built LLM toolset for data-grounded AI.
Enterprise Search and Retrieval: Advanced search capabilities for LLM applications, including vector-based similarity search and hybrid search across structured and unstructured data. Real-time retrieval grounds AI applications in dynamic, contextually relevant information, enabling state-of-the-art RAG.
LLM Memory: Enable long-term memory for LLMs by efficiently storing, retrieving, and updating context across interactions. Support real-time contextual continuity and grounding for applications that require persistent and evolving understanding.
LLM Evaluations: Test and boost model reliability and accuracy with integrated LLM-powered evaluation tools to assess and refine AI outputs against business objectives and user expectations.
Monitoring and Observability: Ensure operational excellence with telemetry, distributed tracing, query/task history, and metrics, that provide end-to-end visibility into data flows and model performance in production.
Deploy Anywhere; Edge-to-Cloud Flexibility: Deploy Spice as a standalone instance, Kubernetes sidecar, microservice, or scalable cluster, with the flexibility to run distributed across edge, on-premises, or any cloud environment. Spice AI offers managed, cloud-hosted deployments of Spice.ai OSS through the Spice Cloud Platform (SCP).

Real-world use-cases

Spice delivers data readiness for teams like Twilio and Barracuda, and accelerates time-to-market of data-grounded AI, such as with developers on GitHub and at NRC Health.

Here are some examples of how Spice.ai OSS solves real problems for these teams.

CDN for Databases — Twilio

A core requirement for many applications is consistently fast data access, with or without AI. Twilio uses Spice.ai OSS as a Database CDN, staging data in object-storage that's accelerated with Spice for sub-second query to improve the reliability of critical services in its messaging pipelines. Before Spice, a database outage could result in a service outage.

With Spice, Twilio has achieved:

Significantly Improved Query Performance: Using Spice to co-locate control-plane data in the messaging runtime, accelerated with DuckDB, to send messages with a P99 query time of < 5ms.
Mission-Critical Reliability: Reduced reliance on queries to databases by using Spice to query data directly from S3 for uninterrupted service even during database downtime.

By adopting Spice.ai OSS, Twilio strengthened its infrastructure, ensuring reliable services for customers and scalable data access across its growing platform.

Datalake Accelerator — Barracuda

Barracuda uses Spice.ai OSS to modernize data access for their email archiving and audit log systems, solving two big problems: slow query performance and costly queries. Before Spice, customers experienced frustrating delays of up to two minutes when searching email archives, due to the data volume being queried.

With Spice, Barracuda has achieved:

100x Query Performance Improvement: Accelerated email archive queries from a P99 time of 2 minutes to 100-200 milliseconds.
Efficient Audit Logs: Offloaded audit logs to Parquet files in S3, queried directly by Spice.
Mission-Critical Reliability: Reduced load on Cassandra, improving overall infrastructure stability.
Significant Cost Reduction: Replaced expensive Databricks Spark queries, significantly cutting expenses while improving performance.

Data-Grounded AI apps and agents — NRC Health

NRC Health uses Spice.ai OSS to simplify and accelerate the development of data-grounded AI features, unifying data from multiple platforms including MySQL, SharePoint, and Salesforce, into secure, AI-ready data. Before Spice, scaling AI expertise across the organization to build complex RAG-based scenarios was a challenge.

With Spice OSS, NRC Health has achieved:

Developer Productivity: Partnered with Spice in three company-wide AI hackathons to build complete end-to-end data-grounded AI features in hours instead of weeks or months.
Accelerated Time-to-Market: Centralized data integration and AI model serving an enterprise-ready service, accelerating time to market.

Data-Grounded AI Software Development — Spice.ai GitHub Copilot Extension

When using tools like GitHub Copilot, developers often face the hassle of switching between multiple environments to get the data they need.

The Spice.ai for GitHub Copilot Extension built on Spice.ai OSS, gives developers the ability to connect data from external sources to Copilot, grounding Copilot in relevant data not generally available in GitHub, like test data stored in a development database.

Developers can simply type @spiceai to interact with connected data, with relevant answers now surfaced directly in Copilot Chat, significantly improving productivity.

Why choose Spice.ai OSS?

Adopting Spice.ai OSS addresses real challenges in modern AI development: it grounds models in accurate, domain-specific, real-time data. With Spice, engineering teams can focus on what matters—delivering innovative, accurate, AI-powered applications and agents that work. Additionally, Spice.ai OSS is open-source under Apache 2.0, ensuring transparency and extensibility so your organization remains free to innovate without vendor lock-in.

Get started in 30 seconds

You can install Spice.ai OSS in less than a minute, on macOS, Linux, and Windows:

macOS, Linux, and WSL
Windows

curl https://install.spiceai.org | /bin/bash

Or using brew:

brew install spiceai/spiceai/spice

curl -L "https://install.spiceai.org/Install.ps1" -o Install.ps1 && PowerShell -ExecutionPolicy Bypass -File ./Install.ps1

Once installed, follow the Getting Started with Spice.ai guide to ground OpenAI chat with data from S3 in less than 2 minutes.

Looking ahead

The 1.0-stable release of Spice.ai OSS marks a major step toward accurate AI for developers. By combining data, AI, and retrieval into a unified runtime, Spice anchors AI in relevant, real-time data—helping you build apps and agents that work.

A cloud-hosted, fully managed Spice.ai OSS service is available in the Spice Cloud Platform. It’s SOC 2 Type II compliant and makes it easy to operate Spice deployments.

Beyond apps and agents, the vision for Spice is to be the best digital labor platform for building autonomous AI employees and teams. These are exciting times! Stay tuned for some upcoming announcements later in 2025!

The Spice AI Team

Learn more

Cookbook: 47+ samples and examples using Spice.ai OSS
Documentation: Learn about features, use cases, and advanced configurations
@spice_ai: Follow X for news and updates
Discord: Connect with the team and the community
GitHub: Star the repo, contribute, and raise issues

Spice v1.0-rc.3 (Dec 30, 2024)

December 30, 2024 · 7 min read

Luke Kim

Founder and CEO of Spice AI

Announcing the release of Spice v1.0-rc.3 🧊

Spice v1.0.0-rc.3 is the third release candidate for the first major version of Spice.ai OSS. This release continues the focus on production readiness and includes new Iceberg Catalog APIs, DuckDB improvements, and a new Iceberg Catalog Connector.

Highlights in v1.0-rc.3

Iceberg Catalog APIs: Spice now functions as an Iceberg Catalog provider, implementing a core subset of the Iceberg Catalog APIs. This enables Iceberg Catalog clients native discovery of datasets and schemas through Spice APIs.
GET /v1/namespaces - List all catalogs registered in Spice.
GET /v1/namespaces?parent=catalog - List schemas registered under a given catalog.
GET /v1/namespaces/:catalog_schema/tables - List tables registered under a given schema.
GET /v1/namespaces/:catalog_schema/tables/:table - Get the schema of a given table.
Iceberg Catalog Connector: The Iceberg Catalog Connector is a new integration to discover and query datasets from a remote Iceberg Catalog.

Example connecting to a remote Iceberg Catalog with tables stored in S3:

catalogs:
  - from: iceberg:https://my-iceberg-catalog.com/v1/namespaces
    name: ice
    params:
      iceberg_s3_access_key_id: ${secrets:ICEBERG_S3_ACCESS_KEY_ID}
      iceberg_s3_secret_access_key: ${secrets:ICEBERG_S3_SECRET_ACCESS_KEY}
      iceberg_s3_region: us-east-1

View the Iceberg Catalog Connector documentation for more details.

DuckDB Improvements: Added cosine_distance support for DuckDB-backed vector search, improved unnest nested type handling for array_element and lists, and optimized query performance.
SQLite Data Accelerator: Graduated to Release Candidate (RC).
File Data Accelerator: Graduated to Release Candidate (RC).

Breaking changes

API:v1/datasets/sample has been removed as it is not particularly useful, can be replicated via SQL, and via the tools endpoint POST v1/tools/:name.

Cookbook

New Language Model Evals Recipe showing how to measure the performance of a language model using LLM-as-Judge, configured entirely in the spice runtime.
New Iceberg Catalog Recipe showing how to use Spice to query Iceberg tables from an Iceberg catalog.

Dependencies

OpenTelemetry: Upgraded from 0.26.0 to 0.27.1
Go: Upgraded from 1.22 to 1.23 (CLI)

Contributors

@sgrebnov
@phillipleblanc
@peasee
@Jeadie
@Sevenannn
@lukekim
@ewgenius

What's Changed

- Add CI configuration for search benchmark dataset access by @sgrebnov in https://github.com/spiceai/spiceai/pull/3888
- Update acknowledgements by @github-actions in https://github.com/spiceai/spiceai/pull/3895
- Upgrade dependencies by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3896
- chore: Update helm chart for RC.2 by @peasee in https://github.com/spiceai/spiceai/pull/3899
- Update spicepod.schema.json by @github-actions in https://github.com/spiceai/spiceai/pull/3903
- chore: Update MacOS test release install to macos-13 by @peasee in https://github.com/spiceai/spiceai/pull/3901
- Add usage to `spice chat` and fix `v1/models?status=true`. by @Jeadie in https://github.com/spiceai/spiceai/pull/3898
- chore: Bump versions for rc3 by @peasee in https://github.com/spiceai/spiceai/pull/3902
- docs: Update endgame with a step to verify dependencies in release notes by @peasee in https://github.com/spiceai/spiceai/pull/3897
- Ensure eval dataset input and ouput of correct length by @Jeadie in https://github.com/spiceai/spiceai/pull/3900
- `spice add/connect/dataset configure` should update spicepod, not overwrite it & upgrade to Go 1.23 by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3905
- Bump opentelemetry from 0.26.0 to 0.27.1 by @dependabot in https://github.com/spiceai/spiceai/pull/3879
- Ensure trace_id is overridden for prior written spans by @Jeadie in https://github.com/spiceai/spiceai/pull/3906
- add 'role': 'assistant' for local models by @Jeadie in https://github.com/spiceai/spiceai/pull/3910
- Run tpcds benchmark for file connector by @Sevenannn in https://github.com/spiceai/spiceai/pull/3924
- Update to reference cookbook instead of quickstarts/samples by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3928
- Fix/remove flaky integration tests by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3930
- Implement `/v1/iceberg/namespaces` & `/v1/iceberg/config` APIs by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3923
- Add script for creating tpcds parquet files and spicepod for file connector by @Sevenannn in https://github.com/spiceai/spiceai/pull/3931
- Use `utoipa` to generate openapi.json and swagger for dev by @Jeadie in https://github.com/spiceai/spiceai/pull/3927
- `fuzzy_match`, `json_match`, `includes` scorer by @Jeadie in https://github.com/spiceai/spiceai/pull/3926
- Implement `/v1/iceberg/namespaces/:namespace` by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3933
- Implement `GET /v1/iceberg/namespaces/:namespace/tables` API by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3934
- Add custom Spice DuckDB dialect with cosine_distance support by @sgrebnov in https://github.com/spiceai/spiceai/pull/3938
- Fix NSQL error: `all columns in a record batch must have the same length` by @sgrebnov in https://github.com/spiceai/spiceai/pull/3947
- Don't include tools use in hf test model by @Jeadie in https://github.com/spiceai/spiceai/pull/3955
- Implement `GET /v1/namespaces/{namespace}/tables/{table}` API by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3940
- Update dependencies by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3967
- DuckDB: add support for nested types in Lists by @sgrebnov in https://github.com/spiceai/spiceai/pull/3961
- Add script to set up clickbench for file connector by @Sevenannn in https://github.com/spiceai/spiceai/pull/3945
- docs: Add connector stable criteria by @peasee in https://github.com/spiceai/spiceai/pull/3908
- Update Roadmp Dec 23, 2024 by @lukekim in https://github.com/spiceai/spiceai/pull/3978
- Improve CI testing for OpenAPI, new tool `spiceschema`, fix broken OpenAPI stuff. by @Jeadie in https://github.com/spiceai/spiceai/pull/3948
- remove `v1/datasets/sample` by @Jeadie in https://github.com/spiceai/spiceai/pull/3981
- feat: add SQLite ClickBench benchmark by @peasee in https://github.com/spiceai/spiceai/pull/3975
- Remove feature 'llms/mistralrs' by @Jeadie in https://github.com/spiceai/spiceai/pull/3984
- Add support for 'params.spice_tools: nsql' by @Jeadie in https://github.com/spiceai/spiceai/pull/3985
- Fix integration tests - add missing `format` query parameter in /v1/status requests by @ewgenius in https://github.com/spiceai/spiceai/pull/3989
- Enhance AI tools sampling logic for robust handling of large fields by @sgrebnov in https://github.com/spiceai/spiceai/pull/3959
- Fix subquery federation by @Sevenannn in https://github.com/spiceai/spiceai/pull/3991
- Fix unnest and add DuckDB support for `array_element` by @sgrebnov in https://github.com/spiceai/spiceai/pull/3995
- Add score value snapshotting to vector similarity search tests by @sgrebnov in https://github.com/spiceai/spiceai/pull/3996
- Use Llama-3.2-3B-Instruct for Hugging Face integration testing by @sgrebnov in https://github.com/spiceai/spiceai/pull/3992
- Simplify `construct_chunk_query_sql` for DuckDB compatibility by @sgrebnov in https://github.com/spiceai/spiceai/pull/3988
- Update TPCH and TPCDS benchmarks for spice.ai connector by @ewgenius in https://github.com/spiceai/spiceai/pull/3982
- Correctly pass Hugging Face token in models integration tests by @sgrebnov in https://github.com/spiceai/spiceai/pull/3997
- Fix: `on_zero_results` causes `TransactionContext Error: Catalog write-write conflict on create with "attachment_0"` by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3998
- Add DuckDB acceleration to search benchmarks by @sgrebnov in https://github.com/spiceai/spiceai/pull/4000
- Enable Postgres write via non-default `postgres-write` feature flag by @sgrebnov in https://github.com/spiceai/spiceai/pull/4004
- Allow search benchmark to write test results by @sgrebnov in https://github.com/spiceai/spiceai/pull/4008
- Make Flight DoPut atomic and commit write only on successful stream completion by @sgrebnov in https://github.com/spiceai/spiceai/pull/4002
- Create a `CatalogConnector` abstraction by @phillipleblanc in https://github.com/spiceai/spiceai/pull/4003
- Fix `generate-openapi.yml` and add `.schema/openapi.json`. by @Jeadie in https://github.com/spiceai/spiceai/pull/3983
- Enable spice.ai tpcds bench workflow. Comment failing tpch queries. by @ewgenius in https://github.com/spiceai/spiceai/pull/4001
- feat: Add SQLite ClickBench overrides by @peasee in https://github.com/spiceai/spiceai/pull/4016
- Implement Iceberg Catalog Connector by @phillipleblanc in https://github.com/spiceai/spiceai/pull/4053
- feat: Datafusion updates for SQLite fixes and release by @peasee in https://github.com/spiceai/spiceai/pull/4054
- docs: Add accelerator stable release criteria by @peasee in https://github.com/spiceai/spiceai/pull/4017
- Add dremio tpch / tpcds benchmark test by @Sevenannn in https://github.com/spiceai/spiceai/pull/4063
- Update docs, and make PR to `spiceai/docs` for new `openapi.json`. by @Jeadie in https://github.com/spiceai/spiceai/pull/4019
- Update openapi.json by @github-actions in https://github.com/spiceai/spiceai/pull/4065
- Fix dremio subquery rewrite by @Sevenannn in https://github.com/spiceai/spiceai/pull/4064
- Update generate-openapi.yml by @Jeadie in https://github.com/spiceai/spiceai/pull/4073
- docs: Add catalog criteria by @peasee in https://github.com/spiceai/spiceai/pull/4052
- fix `distinct_columns` in auto/nsql tool groups by @Jeadie in https://github.com/spiceai/spiceai/pull/4074
- Update openapi.json by @github-actions in https://github.com/spiceai/spiceai/pull/4075
- Update openapi.json by @github-actions in https://github.com/spiceai/spiceai/pull/4076
- Implement window_func_support_window_frame from DremioDialect by @Sevenannn in https://github.com/spiceai/spiceai/pull/4012
- Update acknowledgements by @github-actions in https://github.com/spiceai/spiceai/pull/4079
- Promote file connector to rc by @Sevenannn in https://github.com/spiceai/spiceai/pull/4080
- Add Iceberg to README by @phillipleblanc in https://github.com/spiceai/spiceai/pull/4085
- Fix '/v1/status' default format by @Jeadie in https://github.com/spiceai/spiceai/pull/4081

**Full Changelog**: https://github.com/spiceai/spiceai/compare/v1.0.0-rc.2...v1.0.0-rc.3

Resources

Community

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.

Twitter: @spice_ai
Discord: https://discord.gg/kZnTfneP5u
Telegram: Spice AI Discussion
Reddit: https://www.reddit.com/r/spiceai
Email: hey@spice.ai

Spice v0.19.3-beta (Oct 28, 2024)

October 28, 2024 · 4 min read

Luke Kim

Founder and CEO of Spice AI

Announcing the release of Spice v0.19.3-beta 📈

Spice v0.19.3-beta improves the performance and stability of data connectors and accelerators, including faster queries across multiple federated sources by optimizing how filters are applied. Anthropic has also been added as a LLM model provider.

Highlights in v0.19.3

DataFusion Fixes: Resolved bugs in DataFusion and DataFusion Table Providers, expanding TPC-DS coverage and correctness.

GitHub Data Connector Beta Milestone: The GitHub Data Connector has graduated to Beta after extensive testing, stability, and performance improvements.

Anthropic Models Provider: Anthropic has been added as an LLM provider, including support for streaming.

Example spicepod.yml:

models:
  - from: anthropic:claude-3-5-sonnet-20240620
    name: claude_3_5_sonnet
    params:
      anthropic_api_key: ${ secrets:SPICE_ANTHROPIC_API_KEY }

Breaking changes

None.

Contributors

@Jeadie
@Sevenannn
@phillipleblanc
@peasee
@sgrebnov
@nlamirault
@barracudarin
@lukekim
@slyons

New Contributors

@nlamirault made their first contribution in https://github.com/spiceai/spiceai/pull/3207
@barracudarin made their first contribution in https://github.com/spiceai/spiceai/pull/3228

What's Changed

- Make Anthropic OpenAI compatible. by @Jeadie in https://github.com/spiceai/spiceai/pull/3087
- Update spicepod.schema.json by @github-actions in https://github.com/spiceai/spiceai/pull/3200
- Bump version to 1.0.0-rc.1 by @Sevenannn in https://github.com/spiceai/spiceai/pull/3202
- Fix clickhouse schema inference for non-default database by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3201
- Update endgame template by @Sevenannn in https://github.com/spiceai/spiceai/pull/3198
- Upgrade dependencies by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3197
- fix: dataset refresh defaults properties to None by @peasee in https://github.com/spiceai/spiceai/pull/3205
- Upgrade OTEL to v0.26 and make seconds based metrics reported precisely by @sgrebnov in https://github.com/spiceai/spiceai/pull/3203
- use `text_embedding_inference::Infer` for more complete embedding solution by @Jeadie in https://github.com/spiceai/spiceai/pull/3199
- Add S3 parquet file - arrow accelerator e2e test by @Sevenannn in https://github.com/spiceai/spiceai/pull/3154
- feat: Add script to setup clickbench on mysql by @peasee in https://github.com/spiceai/spiceai/pull/3176
- Update helm chart version to v0.19.2 by @Sevenannn in https://github.com/spiceai/spiceai/pull/3210
- Add sample dataset option in `v1/nsql`. by @Jeadie in https://github.com/spiceai/spiceai/pull/3105
- Split spiced_docker build across architectures by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3206
- feat(helm): do not install demo dataset by default by @nlamirault in https://github.com/spiceai/spiceai/pull/3207
- Split integration test across build/run steps by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3215
- feat(helm): Refactoring Kubernetes labels by @nlamirault in https://github.com/spiceai/spiceai/pull/3208
- Define 'tool_recursion_limit' for LLMs, and limit internal tool calling recursion. by @Jeadie in https://github.com/spiceai/spiceai/pull/3214
- Improve filters pushdown for federated queries by @sgrebnov in https://github.com/spiceai/spiceai/pull/3183
- Implement native schema inference for PostgreSQL by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3209
- docs: Update release criteria by @peasee in https://github.com/spiceai/spiceai/pull/3219
- Run SQLite acceleration TPC-DS tests using smaller scale by @sgrebnov in https://github.com/spiceai/spiceai/pull/3227
- bind the serviceAccount if a name is given or if we're creating one by @barracudarin in https://github.com/spiceai/spiceai/pull/3228
- Only emit channel send error log when its not a closed channel error by @Jeadie in https://github.com/spiceai/spiceai/pull/3230
- Enable Parquet Exec filter pushdown in Spice by @Sevenannn in https://github.com/spiceai/spiceai/pull/3216
- Add snapshots for SQLite TPC-DS benchmark (file mode) by @sgrebnov in https://github.com/spiceai/spiceai/pull/3234
- docs: Add SDK release checks to endgame by @peasee in https://github.com/spiceai/spiceai/pull/3256
- Implement `localpod` Data Connector by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3249
- Revert "Enable Parquet Exec filter pushdown in Spice (#3216)" by @Sevenannn in https://github.com/spiceai/spiceai/pull/3244
- refactor: Use existing action for detecting changes by @peasee in https://github.com/spiceai/spiceai/pull/3255
- feat: Add GitHub integration test by @peasee in https://github.com/spiceai/spiceai/pull/3226
- Add get_readiness tool to retrieve status of all registered components by @lukekim in https://github.com/spiceai/spiceai/pull/3035
- Improve CLI error output when REPL can't connect to the Flight endpoint by @slyons in https://github.com/spiceai/spiceai/pull/3188
- Fixing FTP link in Endgame by @slyons in https://github.com/spiceai/spiceai/pull/3267
- Update version to 0.19.3-beta by @sgrebnov in https://github.com/spiceai/spiceai/pull/3269
- add service type and annotation customizations in https://github.com/spiceai/spiceai/pull/3268

**Full Changelog**: https://github.com/spiceai/spiceai/compare/v0.19.2-beta...v0.19.3-beta

Resources

Community

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.

Twitter: @spice_ai
Discord: https://discord.gg/kZnTfneP5u
Telegram: Spice AI Discussion
Reddit: https://www.reddit.com/r/spiceai
Email: hey@spice.ai

Spice v0.19.2-beta (Oct 21, 2024)

October 21, 2024 · 4 min read

Luke Kim

Founder and CEO of Spice AI

Announcing the release of Spice v0.19.2-beta ⚡

Spice v0.19.2-beta continues to improve performance and stability of data connectors and data accelerators, further expands TPC-DS coverage, and includes several bug fixes.

Highlights in v0.19.2

DataFusion Fixes: Resolved bugs in DataFusion and DataFusion Table Providers, improving TPC-DS query support and correctness.

TPC-DS Snapshots: Extended support for TPC-DS benchmarks with added snapshot tests for validating query plans and result accuracy.

PostgreSQL Accelerator Beta: Postgres Data Accelerator has been promoted to Beta Quality

Breaking changes

The hive_infer_partitions parameter been changed to hive_partitioning_enabled, now defaults to false and must be explicitly enabled.

Contributors

@ewgenius
@sgrebnov
@slyons
@Jeadie
@Sevenannn
@phillipleblanc
@dependabot
@peasee

Dependencies

DataFusion Table Providers: Upgraded to rev 2bcf481b4abe9d0bd6bb2479ce49020df66ff97f.
duckdb-rs: Upgraded from 1.0.0 to 1.1.1.

What's Changed

- Update Helm chart for v0.19.1-beta by @ewgenius in https://github.com/spiceai/spiceai/pull/3106
- Add more TPC-DS snapshots for Postgres acceleration by @sgrebnov in https://github.com/spiceai/spiceai/pull/3107
- Bumping version to 1.0.0-rc.1 by @slyons in https://github.com/spiceai/spiceai/pull/3109
- New table sampling methods: sample_distinct_columns, random_sample, top_n_sample by @Jeadie in https://github.com/spiceai/spiceai/pull/3108
- Add TPCDS snapshot tests for file-based and in-mem duckdb by @Sevenannn in https://github.com/spiceai/spiceai/pull/3115
- Add Postgres acceleration E2E test for MySQL by @sgrebnov in https://github.com/spiceai/spiceai/pull/3110
- Update datafusion logical plan to avoid wrong group_by columns in aggregation by @Sevenannn in https://github.com/spiceai/spiceai/pull/3111
- Warn if user tries to embed column that does not exist by @Jeadie in https://github.com/spiceai/spiceai/pull/3120
- Changes for Rust version upgrade by @Sevenannn in https://github.com/spiceai/spiceai/pull/3134
- Add `unnest` support for federated plans by @sgrebnov in https://github.com/spiceai/spiceai/pull/3133
- Don't `.clone()` unnecessarily by @Jeadie in https://github.com/spiceai/spiceai/pull/3128
- Fix Flight `get_schema` to construct logical plan and return that schema. by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3131
- Bump clap from 4.5.19 to 4.5.20 by @dependabot in https://github.com/spiceai/spiceai/pull/3099
- Add GitHub Workflow to build `spice-postgres-tpcds-bench` image by @sgrebnov in https://github.com/spiceai/spiceai/pull/3140
- test: Add basic MySQL integration test by @peasee in https://github.com/spiceai/spiceai/pull/3143
- Bump datafusion-federation and datafusion-table-providers crates by @sgrebnov in https://github.com/spiceai/spiceai/pull/3148
- docs: Add MySQL limitation for division by zero by @peasee in https://github.com/spiceai/spiceai/pull/3144
- fix: Dataset refresh by @peasee in https://github.com/spiceai/spiceai/pull/3147
- Update arrow, duckdb, postgres accelerator tpcds snapshots by @Sevenannn in https://github.com/spiceai/spiceai/pull/3145
- Add TPC-DS benchmarks for Postgres data connector by @sgrebnov in https://github.com/spiceai/spiceai/pull/3149
- Update E2E test ci to include tests for accelerating Postgres into accelerators by @Sevenannn in https://github.com/spiceai/spiceai/pull/3137
- Add TPCDS Benchmark test and snapshots for S3 by @Sevenannn in https://github.com/spiceai/spiceai/pull/3152
- [cli] Include 200 in acceptable response codes for `doRuntimeApiRequest` by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3157
- Use `-build.{GIT_SHA}` for unreleased versions by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3159
- Upgrade to Rust 1.82 by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3158
- Disable `hive_infer_partitions` by default by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3160
- Upgrade to DuckDB 1.1.1 by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3161
- feat: Add MySQL TPCDS results snapshots and exclude workarounds by @peasee in https://github.com/spiceai/spiceai/pull/3165
- Fix task_history output for sql, add output to table_schema & list_datasets tool by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3166
- feat: Add ClickBench queries as separate files by @peasee in https://github.com/spiceai/spiceai/pull/3169
- Calculate embeddings in a separate blocking thread by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3170
- docs: Update ROADMAP.md and release criterias by @peasee in https://github.com/spiceai/spiceai/pull/3124
- Handle OpenTelemetry errors by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3173
- Update version to 0.19.2-beta by @Sevenannn in https://github.com/spiceai/spiceai/pull/3182

**Full Changelog**: https://github.com/spiceai/spiceai/compare/v0.19.1-beta...v0.19.2-beta

Resources

Community

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.

Twitter: @spice_ai
Discord: https://discord.gg/kZnTfneP5u
Telegram: Spice AI Discussion
Reddit: https://www.reddit.com/r/spiceai
Email: hey@spice.ai

Spice v0.19.1-beta (Oct 14, 2024)

October 14, 2024 · 4 min read

Luke Kim

Founder and CEO of Spice AI

Announcing the release of Spice v0.19.1-beta 🔥

Spice v0.19.1 brings further performance and stability improvements to data connectors, including improved query push-down for file-based connectors (s3, abfs, file, ftp, sftp) that use Hive-style partitioning.

Highlights in v0.19.1

TPC-H and TPC-DS Coverage: Expanded coverage for TPC-H and TPC-DS benchmarking suites across accelerators and connectors.

GitHub Connector Array Filter: The GitHub connector now supports filter push down for the array_contains function in SQL queries using search query mode.

NSQL CLI Command: A new spice nsql CLI command has been added to easily query datasets with natural language from the command line.

Breaking changes

None

Contributors

@peasee
@Sevenannn
@sgrebnov
@karifabri
@phillipleblanc
@lukekim
@Jeadie
@slyons

Dependencies

DataFusion Table Providers: Upgraded to rev f22b96601891856e02a73d482cca4f6100137df8.

What's Changed

- release: Update helm chart for v0.19.0-beta by @peasee in https://github.com/spiceai/spiceai/pull/3024
- Set fail-fast = true for benchmark test by @Sevenannn in https://github.com/spiceai/spiceai/pull/2997
- release: Update next version and ROADMAP by @peasee in https://github.com/spiceai/spiceai/pull/3033
- Verify TPCH benchmark query results for Spark connector by @sgrebnov in https://github.com/spiceai/spiceai/pull/2993
- feat: Add x-spice-user-agent header to Spice REPL by @peasee in https://github.com/spiceai/spiceai/pull/2979
- Update to object store file formats documentation link by @karifabri in https://github.com/spiceai/spiceai/pull/3036
- Use teraswitch-runners for Linux x64 workflows + builds by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3042
- feat: Support array contains in GitHub pushdown by @peasee in https://github.com/spiceai/spiceai/pull/2983
- Bump text-splitter from 0.16.1 to 0.17.0 by @dependabot in https://github.com/spiceai/spiceai/pull/2987
- Revert integration tests back to hosted runner by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3046
- Tune Github runner resources to allow in memory TPCDS benchmark to run by @Sevenannn in https://github.com/spiceai/spiceai/pull/3025
- fix: add winver by @peasee in https://github.com/spiceai/spiceai/pull/3054
- refactor: Use is modifier for checking GitHub state filter by @peasee in https://github.com/spiceai/spiceai/pull/3056
- Enable `merge_group` checks for PR workflows by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3058
- Fix issues with merge group by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3059
- Validate in-memory arrow accelertion TPCDS result correctness by @Sevenannn in https://github.com/spiceai/spiceai/pull/3044
- Fix rev parsing for PR checks by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3060
- Use 'Accept' header for `/v1/sql/` and `/v1/nsql` by @Jeadie in https://github.com/spiceai/spiceai/pull/3032
- Verify Postgres acceleration TPCDS result correctness by @Sevenannn in https://github.com/spiceai/spiceai/pull/3043
- Add NSQL CLI REPL command by @lukekim in https://github.com/spiceai/spiceai/pull/2856
- Preserve query results order and add TPCH benchmark results verification for duckdb:file mode by @sgrebnov in https://github.com/spiceai/spiceai/pull/3034
- Refactor benchmark to include MySQL tpcds bench, tweaks to makefile target for generating mysql tpcds data by @Sevenannn in https://github.com/spiceai/spiceai/pull/2967
- Support runtime parameter for `sql_query_keep_partition_by_columns` & enable by default by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3065
- Document TPC-DS limitations: `EXCEPT`, `INTERSECT`, duplicate names by @sgrebnov in https://github.com/spiceai/spiceai/pull/3069
- Adding ABFS benchmark by @slyons in https://github.com/spiceai/spiceai/pull/3062
- Add support for GitHub app installation auth for GitHub connector by @ewgenius in https://github.com/spiceai/spiceai/pull/3063
- docs: Document stack overflow workaround, add helper script by @peasee in https://github.com/spiceai/spiceai/pull/3070
- Tune MySQL TPCDS image to allow for successful benchmark test run by @Sevenannn in https://github.com/spiceai/spiceai/pull/3067
- Automatically infer partitions for hive-style partitioned files for object store based connectors by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3073
- Support `hf_token` from params/secrets by @Jeadie in https://github.com/spiceai/spiceai/pull/3071
- Inherit embedding columns from source, when available. by @Jeadie in https://github.com/spiceai/spiceai/pull/3045
- Validate identifiers for component names by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3079
- docs: Add workaround for TPC-DS Q97 in MySQL by @peasee in https://github.com/spiceai/spiceai/pull/3080
- Document TPC-DS Postgres column alias in a CASE statement limitation by @sgrebnov in https://github.com/spiceai/spiceai/pull/3083
- Update plan snapshots for TPC-H bench queries by @sgrebnov in https://github.com/spiceai/spiceai/pull/3088
- Update Datafusion crate to include recent unparsing fixes by @sgrebnov in https://github.com/spiceai/spiceai/pull/3089
- Sample SQL table data tool and API by @Jeadie in https://github.com/spiceai/spiceai/pull/3081
- chore: Update datafusion-table-providers by @peasee in https://github.com/spiceai/spiceai/pull/3090
- Add `hive_infer_partitions` to remaining object store connectors by @phillipleblanc in https://github.com/spiceai/spiceai/pull/3086
- deps: Update datafusion-table-providers by @peasee in https://github.com/spiceai/spiceai/pull/3093
- For local embedding models, return usage input tokens. by @Jeadie in https://github.com/spiceai/spiceai/pull/3095
- Update end_game.md with Accelerator/Connector criteria check by @slyons in https://github.com/spiceai/spiceai/pull/3092
- Update TPC-DS Q90 by @sgrebnov in https://github.com/spiceai/spiceai/pull/3094
- docs: Add RC connector criteria by @peasee in https://github.com/spiceai/spiceai/pull/3026
- Update version to 0.19.1-beta by @sgrebnov in https://github.com/spiceai/spiceai/pull/3101

**Full Changelog**: https://github.com/spiceai/spiceai/compare/v0.19.0-beta...v0.19.1-beta

Resources

Community

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Discord or by email to get involved.

Twitter: @spice_ai
Discord: https://discord.gg/kZnTfneP5u
Telegram: Spice AI Discussion
Reddit: https://www.reddit.com/r/spiceai
Email: hey@spice.ai

Spice.ai OSS: A Portable Data, AI, and Retrieval Engine​

Data-grounded AI​

Spice.ai OSS: The engine that makes AI work​

Key features of Spice.ai OSS include:​

Real-world use-cases​

CDN for Databases — Twilio​

Datalake Accelerator — Barracuda​

Data-Grounded AI apps and agents — NRC Health​

Data-Grounded AI Software Development — Spice.ai GitHub Copilot Extension​

Why choose Spice.ai OSS?​

Get started in 30 seconds​

Looking ahead​

Learn more​

Highlights in v1.0-rc.3​

Breaking changes​

Cookbook​

Dependencies​

Contributors​

What's Changed​

Resources​

Community​

Highlights in v0.19.3​

Breaking changes​

Contributors​

New Contributors​

What's Changed​

Resources​

Community​

Highlights in v0.19.2​

Breaking changes​

Contributors​

Dependencies​

What's Changed​

Resources​

Community​

Highlights in v0.19.1​

Breaking changes​

Contributors​

Dependencies​

What's Changed​

Resources​

Community​

Spice.ai OSS: A Portable Data, AI, and Retrieval Engine

Data-grounded AI

Spice.ai OSS: The engine that makes AI work

Key features of Spice.ai OSS include:

Real-world use-cases

CDN for Databases — Twilio

Datalake Accelerator — Barracuda

Data-Grounded AI apps and agents — NRC Health

Data-Grounded AI Software Development — Spice.ai GitHub Copilot Extension

Why choose Spice.ai OSS?

Get started in 30 seconds

Looking ahead

Learn more

Highlights in v1.0-rc.3

Breaking changes

Cookbook

Dependencies

Contributors

What's Changed

Resources

Community

Highlights in v0.19.3

Breaking changes

Contributors

New Contributors

What's Changed

Resources

Community

Highlights in v0.19.2

Breaking changes

Contributors

Dependencies

What's Changed

Resources

Community

Highlights in v0.19.1

Breaking changes

Contributors

Dependencies

What's Changed

Resources

Community