How It Works

From API key to useful memory in a few controlled steps.

The integration model is intentionally simple for the product team and intentionally strict under the hood: authenticated writes, queued processing, isolated collections, and runtime retrieval over stable routes.

Your first memory-powered flow in under 15 minutes.

Three steps to go from zero to a product that remembers users across sessions.

01

Create your workspace

Generate an API key from the console and set your project-level auth in minutes.

~2 min setup
02

Send memories from your app

Ingest user events, preferences, and conversation snippets through one consistent endpoint.

1 API call
03

Retrieve context on every response

Query by meaning and metadata to feed the right memory back into your prompt pipeline.

< 20ms P95
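Steps 2 and 3 can be sketched end to end. This is a minimal illustration, not SDK code: the endpoint path and the `{ items: [...] }` response shape are taken from the examples on this page, and `buildContext` is a hypothetical helper for formatting retrieved memories into a prompt.

```typescript
// Hypothetical shape of one retrieved memory, based on this page's examples.
type Memory = { content: string; score: number };

// Format confidently matched memories into a prompt-ready context block.
function buildContext(items: Memory[], minScore = 0.7): string {
  return items
    .filter((m) => m.score >= minScore)
    .map((m) => `- ${m.content}`)
    .join("\n");
}

// Sketch of step 3: query by meaning, then feed the result into your prompt.
async function retrieveContext(query: string, apiKey: string): Promise<string> {
  const res = await fetch(
    "https://api.neuralbase.cloud/v1/memories?query=" + encodeURIComponent(query),
    { headers: { Authorization: `Bearer ${apiKey}` } },
  );
  const { items } = (await res.json()) as { items: Memory[] };
  return buildContext(items);
}
```

The score threshold keeps low-relevance memories out of the prompt; tune it against your own retrieval quality.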

API Experience

Developer experience built for speed.

The API is straightforward for a first prototype and stable enough for production traffic.

Simple endpoints

Ingest and retrieve through clean REST routes that are easy to test and maintain.

Fast integration

Drop into existing Node, Python, and serverless backends without rewriting your app architecture.

Production auth

Use scoped API keys for runtime traffic and signed sessions for dashboard access.
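In practice, scoped keys should live only on your server. A minimal sketch, assuming an env var named `NEURALBASE_API_KEY` (the name is an illustration, not a documented convention); the header format follows the ingest example below.

```typescript
// Build request headers from a server-held API key.
// NEURALBASE_API_KEY is an assumed env var name for this sketch.
function authHeaders(apiKey = process.env.NEURALBASE_API_KEY): Record<string, string> {
  if (!apiKey) {
    // Fail fast on the server rather than sending unauthenticated traffic.
    throw new Error("Missing API key: set NEURALBASE_API_KEY on the server");
  }
  return {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
}
```

Browser code should never see this key; dashboard access uses signed sessions instead.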

memory-ingest.js

```javascript
// Ingest a memory for a user (server-side only: the key must stay private).
await fetch("https://api.neuralbase.cloud/v1/memories", {
  method: "POST",
  headers: {
    Authorization: "Bearer nb_live_...",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ userId, content, metadata }),
});
```
retrieval.py

```python
import requests

# Retrieve the most relevant memories for a semantic query.
result = requests.get(
    "https://api.neuralbase.cloud/v1/memories",
    headers={"Authorization": "Bearer nb_live_..."},
    params={"query": "refund policy preference"},
).json()
# result["items"][0]["score"] -> 0.94
```

Runtime flow

What happens after your app sends the request.

Neuralbase is not just a thin proxy to a vector database. The platform adds ingestion control, quota enforcement, document processing, and isolation before anything is written or returned.

Request lifecycle
| Stage | What your app does | What Neuralbase handles |
| --- | --- | --- |
| 1. Write | Your backend sends memories or documents to Neuralbase over authenticated API routes. | Requests are attached to a workspace, rate limited, and counted against plan quotas immediately after auth. |
| 2. Process | Content is normalized, chunked, parsed, or queued for asynchronous processing depending on the route. | BullMQ and Redis handle background work, while document extraction and status updates remain visible in the dashboard. |
| 3. Index | Embeddings are generated and stored in a per-user Qdrant collection rather than a shared tenant bucket. | This keeps the retrieval boundary aligned with the account the platform is billing and protecting. |
| 4. Retrieve | Your app calls list or search routes when it needs context for a response or workflow decision. | Neuralbase returns the most relevant memory and metadata while keeping vector internals hidden behind the API. |
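Because stage 2 can run asynchronously, your app may need to wait before retrieving from a freshly ingested document. A polling sketch under stated assumptions: the status values ("processing", "ready", "failed") are illustrative, not a documented contract, and the status fetcher is injected so the loop is testable without a network.

```typescript
// Assumed status values for this sketch; check the dashboard/API for the real ones.
type StatusFetcher = (docId: string) => Promise<"processing" | "ready" | "failed">;

// Poll until a document finishes background processing, with linear backoff.
async function waitUntilReady(
  docId: string,
  fetchStatus: StatusFetcher,
  maxAttempts = 10,
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus(docId);
    if (status === "ready") return true;
    if (status === "failed") return false;
    // Back off a little longer on each retry to avoid hammering the API.
    await new Promise((resolve) => setTimeout(resolve, 250 * (attempt + 1)));
  }
  return false;
}
```

Injecting the fetcher also makes it easy to swap in webhooks or dashboard status checks later without touching the wait logic.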
Architecture

Keep control of your memory infrastructure.

Neuralbase works the way serious teams prefer: a public API layer in front, a private vector store behind it, and nothing exposed that should not be.

neuralbase — neuralbase.config.ts

```typescript
import { defineConfig } from "@neuralbase/sdk";

export default defineConfig({
  api: { // public layer
    endpoint: "https://api.neuralbase.cloud",
    rateLimit: 1000, // req/min
  },
  vectorStore: { // private layer
    region: "us-east-1",
    namespace: process.env.TENANT_NS,
  },
  embeddings: { // managed layer
    model: "text-embedding-3-small",
    dimensions: 1536,
  },
});
```

Public API layer

Expose only your API domain to clients. Internal services stay protected with no direct vector access from the outside.

Private vector runtime

Run your vector layer on the same infrastructure as your backend for lower latency and tighter data control.

Managed embedding layer

Use Neuralbase's managed embedding layer now and keep room to tune retrieval as traffic scales.
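The public-layer pattern above can be sketched as a thin server-side proxy: clients call your domain, and your backend forwards to Neuralbase with a server-held key, so neither the key nor any vector internals ever reach the browser. The forwarder is injected for testability; the route and key value are illustrative assumptions.

```typescript
// Generic forwarder signature, injected so the proxy is testable offline.
type Forward = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<unknown>;

// Build a query handler for your own API domain that forwards to Neuralbase.
function makeMemoryProxy(apiKey: string, forward: Forward) {
  return (query: string) =>
    forward(
      "https://api.neuralbase.cloud/v1/memories?query=" + encodeURIComponent(query),
      // The key is attached here, server-side; clients never see it.
      { headers: { Authorization: `Bearer ${apiKey}` } },
    );
}
```

Mounting this behind a route on your own domain is what keeps the vector runtime private: the only thing exposed to the outside is your API surface.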