Documentation Index
Fetch the complete documentation index at: https://mintlify.com/jxnl/kura/llms.txt
Use this file to discover all available pages before exploring further.
What is Kura?
Kura is a conversation analysis system built on the same principles as Anthropic’s CLIO framework. It transforms raw conversations into organized, searchable clusters that reveal patterns in how users interact with AI systems.The Pipeline Flow
Kura processes conversations through a multi-stage pipeline:Stage 1: Conversation Loading
Conversations are loaded from various sources (HuggingFace datasets, Claude exports, or custom formats). Each conversation contains:- chat_id: Unique identifier
- messages: List of user/assistant exchanges
- created_at: Timestamp
- metadata: Optional custom properties
Stage 2: Summarization
Each conversation is analyzed by an LLM to extract:- Summary (1-2 sentences)
- User request and task description
- Languages used (human and programming)
- Concerning score (1-5 safety rating)
- User frustration level (1-5)
- Assistant errors
Stage 3: Embedding
Summaries are converted to high-dimensional vectors (embeddings) that capture semantic meaning. This enables mathematical comparison of conversation similarity. Supported embedding models:- OpenAI (
text-embedding-3-small) - Sentence Transformers (local models)
- Cohere (
embed-v4.0)
Stage 4: Base Clustering
Embeddings are grouped using clustering algorithms:- K-means: Partition conversations into N clusters
- HDBSCAN: Density-based clustering (finds natural groupings)
- Positive examples (conversations in the cluster)
- Contrastive examples (conversations from other clusters)
Stage 5: Meta-Clustering
Base clusters are recursively combined into a hierarchy:- Generate candidate parent cluster names
- Assign each cluster to a parent
- Generate descriptions for parent clusters
- Repeat until reaching the target number of root clusters
Stage 6: Dimensionality Reduction
Clusters are projected from high-dimensional embedding space to 2D coordinates using UMAP (Uniform Manifold Approximation and Projection). This enables visualization while preserving the relationships between clusters. See Dimensionality Reduction for details.Modular Architecture
Kura is designed with modularity in mind. Each stage uses abstract base classes:BaseSummaryModel- Implement custom summarization logicBaseEmbeddingModel- Integrate any embedding providerBaseClusteringMethod- Plug in different clustering algorithmsBaseMetaClusterModel- Customize hierarchical organizationBaseDimensionalityReduction- Use alternative projection methods
All components are swappable through dependency injection, allowing you to customize any part of the pipeline without modifying core code.
Checkpointing for Scale
Kura automatically saves intermediate results at each pipeline stage. This provides:- Resume capability: Re-run analysis without repeating expensive steps
- Iterative refinement: Adjust later stages without re-processing earlier ones
- Multiple formats: JSONL, Parquet, HuggingFace Datasets, SQL
Two API Approaches
Kura offers two ways to use the pipeline:Functional API (Recommended)
Compose pipeline stages as pure functions:Class-Based API
Orchestrate through a singleKura class (legacy approach).
Next Steps
Conversations
Learn about the conversation data model and loading methods
Summarization
Understand how conversations are analyzed by LLMs
Embedding
Convert text to semantic vectors
Clustering
Group similar conversations together