About the author: I'm Charles Sieg, a cloud architect and platform engineer who builds apps, services, and infrastructure for Fortune 1000 clients through Vantalect. If your organization is rethinking its software strategy in the age of AI-assisted engineering, let's talk.
AWS OpenSearch Service runs behind more production workloads than most engineers realize: log analytics, full-text search, security event monitoring, vector similarity search. Lots of teams deploy it. Few really understand what's happening underneath. I've designed and operated OpenSearch clusters for years now, everything from small dev setups to multi-petabyte production deployments ingesting billions of documents a day. My first cluster went red within 48 hours. I've learned a lot since then.
This is an architecture reference. If you need to understand what OpenSearch actually does under the hood, scale it without torching your budget, or dodge the failure modes that've dragged me out of bed at 3 AM, keep reading.
The Elasticsearch Fork: Context That Matters
OpenSearch is Amazon's fork of Elasticsearch 7.10.2 and Kibana 7.10.2. It exists because Elastic switched from Apache 2.0 to the dual SSPL/Elastic License in 2021. AWS and a growing open-source community maintain the fork under Apache 2.0.
Why care? Because OpenSearch has diverged from Elasticsearch in ways that matter. Segment replication, searchable snapshots, the Security Analytics plugin, neural search: all developed independently. If you're evaluating OpenSearch based on Elasticsearch knowledge from 2020, your mental model is stale. I tried applying Elasticsearch 6.x tuning patterns to a fresh OpenSearch 2.x cluster once. Got worse performance than the defaults. Lesson learned.
The managed service adds operational layers on top of the open-source project: automated blue/green deployments, IAM and Cognito auth, UltraWarm and cold storage tiers, serverless collections. Knowing where the open-source project ends and the managed service begins affects every architecture decision you'll make.
Cluster Architecture
An OpenSearch cluster is a collection of nodes. Each one runs on an EC2 instance (or, in serverless mode, the infrastructure is entirely abstracted). Node roles are where you start when designing a cluster. Get them wrong and you'll pay for it in cost, performance, or resilience. Usually all three.
Node Types
| Node Type | Role | When to Use |
|---|---|---|
| Data nodes | Store index shards, execute search and indexing operations | Always: every cluster needs data nodes |
| Dedicated master nodes | Manage cluster state, shard allocation, index creation | Any cluster with more than a few nodes or production workloads |
| Coordinator nodes | Route requests, merge partial results, reduce load on data nodes | High query throughput clusters where data nodes are CPU-bound |
| UltraWarm nodes | Store read-only warm data on S3-backed storage | Log analytics and time-series workloads with data retention requirements |
| Cold storage | Archive data to S3, queryable on demand | Compliance and audit data that is rarely accessed |
```mermaid
flowchart TD
    Client[Client Request] --> COORD["Coordinator Node<br/>Routes requests<br/>Merges results"]
    COORD --> D1["Data Node 1<br/>Hot Tier"]
    COORD --> D2["Data Node 2<br/>Hot Tier"]
    COORD --> UW["UltraWarm Node<br/>S3-backed + SSD cache"]
    MASTER["Dedicated Master Nodes x3<br/>Cluster state<br/>Shard allocation"] -.->|Manages| D1
    MASTER -.->|Manages| D2
    MASTER -.->|Manages| UW
    COLD["Cold Storage<br/>S3 only"] -.->|Attach on demand| UW
    style MASTER fill:#e94,stroke:#333
    style COORD fill:#38b,stroke:#333
    style D1 fill:#4a9,stroke:#333
    style D2 fill:#4a9,stroke:#333
    style UW fill:#ea3,stroke:#333
    style COLD fill:#88b,stroke:#333
```
Dedicated Master Nodes
Cluster state (the data structure tracking every index, shard, mapping, and node) lives exclusively on master-eligible nodes. Three dedicated master nodes. Three Availability Zones. Every production cluster. No exceptions. You get quorum-based leader election that tolerates a single AZ failure.
I've watched teams skip dedicated masters to save a few bucks on dev clusters, then carry that pattern straight into production. Works fine until it doesn't. One node restart during a shard rebalance and suddenly your data nodes are fighting over master election while also trying to serve traffic. Ugly.
Under-sizing master nodes is another common trap. They don't handle search or indexing traffic, but they hold the entire cluster state in memory. Thousands of shards? You'll want r6g.large.search or r6g.xlarge.search. Smaller clusters do fine with m6g.large.search. The key: master node sizing scales with metadata volume (shard count, field count, index count), not data volume.
How Nodes Interact
Any node that receives a search request becomes the coordinator for that request. It figures out which shards hold relevant data, fans the query out to those shards (query phase), collects partial results, merges them, and returns the final answer (fetch phase). This scatter-gather pattern is the heartbeat of OpenSearch performance. A query touching 500 shards costs 500x the coordination overhead of a single-shard query. I once watched a single dashboard panel bring a cluster to its knees. The culprit? An index with 2,000 shards.
Indexing works differently. The coordinator routes each document to the appropriate primary shard using a hash of the document ID. The primary indexes it and replicates to replica shards. Segment replication (OpenSearch-specific) copies entire Lucene segments to replicas instead of replaying individual operations. On write-heavy workloads, this cuts replication overhead significantly. I measured a 40% reduction in replica CPU utilization after enabling it on a high-ingest logging cluster.
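The routing step can be sketched in a few lines. This is a hedged illustration, not OpenSearch's actual code: the real implementation hashes the `_routing` value (the document ID by default) with Murmur3, so the MD5 stand-in below shows the principle of deterministic placement, not the exact shard assignment.

```python
# Hedged sketch: how a coordinator maps a document ID to a primary shard.
# OpenSearch uses Murmur3 on the _routing value; MD5 here is a stand-in
# that preserves the key property: the mapping is deterministic.
import hashlib

def route_to_shard(doc_id: str, num_primary_shards: int) -> int:
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest[:4], "big")
    return hash_value % num_primary_shards

# The same document ID always lands on the same shard.
assert route_to_shard("order-12345", 5) == route_to_shard("order-12345", 5)
```

The modulo over the primary count is also why you cannot change the number of primary shards on an existing index without reindexing: every document's placement would change.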
```mermaid
sequenceDiagram
    participant C as Client
    participant CO as Coordinator Node
    participant S1 as Shard 1
    participant S2 as Shard 2
    participant S3 as Shard 3
    C->>CO: Search request
    rect rgb(200,220,255)
        Note over CO,S3: Query Phase
        par Fan out query
            CO->>S1: Query
            CO->>S2: Query
            CO->>S3: Query
        end
        S1-->>CO: Top-N doc IDs + scores
        S2-->>CO: Top-N doc IDs + scores
        S3-->>CO: Top-N doc IDs + scores
        CO->>CO: Merge & rank globally
    end
    rect rgb(200,255,220)
        Note over CO,S3: Fetch Phase
        CO->>S1: Fetch doc bodies
        CO->>S3: Fetch doc bodies
        S1-->>CO: Documents
        S3-->>CO: Documents
    end
    CO-->>C: Final results
```

Storage Tiers
Data tiering is probably the single highest-leverage decision you'll make in an OpenSearch deployment. Nail it and you save 80% on storage costs. Mess it up and you're either overpaying for months or losing query performance right when it matters. Three storage tiers, each with fundamentally different cost and performance profiles.
Hot Tier
Hot storage lives on instance-attached EBS volumes (gp3 or io2 for high-IOPS workloads). Actively written, frequently queried data goes here. Most expensive per-GB, but lowest latency and highest throughput.
EBS volume guidance:
- gp3 is the correct choice for most workloads. It provides a baseline of 3,000 IOPS and 125 MB/s throughput, with the ability to provision up to 16,000 IOPS and 1,000 MB/s independently of volume size.
- io2 is warranted only for latency-sensitive search workloads that need consistent sub-millisecond response times at scale.
UltraWarm Tier
UltraWarm nodes use S3-backed storage with a local SSD cache. Roughly 80% cheaper than hot storage. The catch: data in UltraWarm is read-only. You can't index new documents into warm indices. Warm queries typically run 2-5x slower than equivalent hot queries, depending on cache hit rates.
The OR1 instance type (OpenSearch-optimized, writable warm storage) changed this equation. OR1 lets you index directly into warm storage, bypassing the hot tier entirely for workloads that can tolerate moderately higher indexing latency. I switched a 2 TB/day log analytics pipeline to OR1 and cut the monthly bill by 60%. Indexing latency went up about 3x, but for logs that nobody touches until something breaks? Irrelevant.
Cold Tier
Cold storage moves everything to S3. No local cache at all. Querying cold data means explicitly attaching the cold index to the cluster, and that takes minutes. But it's 95% cheaper than hot storage. If you've got compliance data sitting in hot storage because someone forgot to set up lifecycle policies, you're just burning money.
Index State Management (ISM) Policies
ISM policies automate the lifecycle of indices across tiers. A typical pattern for log analytics:
| Phase | Age | Action | Storage Cost |
|---|---|---|---|
| Hot | 0-7 days | Active indexing and search | $$$ |
| Force merge | 7 days | Merge to 1 segment per shard, reduce overhead | $$$ |
| Warm | 7-30 days | Migrate to UltraWarm, read-only | $ |
| Cold | 30-365 days | Move to cold storage | ¢ |
| Delete | 365+ days | Delete index | Free |
```mermaid
flowchart LR
    HOT["Hot Tier<br/>0-7 days<br/>Active indexing<br/>$$$"] --> FM["Force Merge<br/>Day 7<br/>1 segment/shard"]
    FM --> WARM["UltraWarm<br/>7-30 days<br/>Read-only<br/>$"]
    WARM --> COLD["Cold Storage<br/>30-365 days<br/>S3 archived<br/>¢"]
    COLD --> DEL["Delete<br/>365+ days"]
    style HOT fill:#e74,stroke:#333
    style FM fill:#e94,stroke:#333
    style WARM fill:#ea3,stroke:#333
    style COLD fill:#38b,stroke:#333
    style DEL fill:#888,stroke:#333
```
Always run force-merge before warm migration. UltraWarm performance degrades with high segment counts; merging down to one segment per shard before migration keeps warm query performance consistent. Everybody skips this step once. Then they remember why it exists.
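The lifecycle table above maps onto an ISM policy body. Here's a hedged sketch of that policy as the JSON you'd PUT to the ISM plugin; the action names (force_merge, warm_migration, cold_migration, delete) follow the ISM plugin's vocabulary, but treat the exact field names and the @timestamp field as assumptions to verify against your domain version.

```python
# Hedged ISM policy sketch for the hot -> warm -> cold -> delete lifecycle.
# Note the dedicated "merge" state: force-merge runs BEFORE warm migration,
# per the guidance above. Field names are assumptions, not verified API.
import json

ism_policy = {
    "policy": {
        "description": "Log lifecycle: hot -> force-merge -> warm -> cold -> delete",
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [],
                "transitions": [{"state_name": "merge", "conditions": {"min_index_age": "7d"}}],
            },
            {
                "name": "merge",
                "actions": [{"force_merge": {"max_num_segments": 1}}],
                "transitions": [{"state_name": "warm", "conditions": {}}],
            },
            {
                "name": "warm",
                "actions": [{"warm_migration": {}}],
                "transitions": [{"state_name": "cold", "conditions": {"min_index_age": "30d"}}],
            },
            {
                "name": "cold",
                "actions": [{"cold_migration": {"timestamp_field": "@timestamp"}}],
                "transitions": [{"state_name": "delete", "conditions": {"min_index_age": "365d"}}],
            },
            {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
        ],
    }
}

print(json.dumps(ism_policy, indent=2))
```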
Shard Sizing Guidance
Shard sizing trips up more teams than almost anything else in OpenSearch cluster design. Consequential, and deeply unintuitive.
| Guideline | Recommendation |
|---|---|
| Shard size (hot) | 10-50 GB per shard |
| Shard size (UltraWarm) | Up to 200 GB per shard after force-merge |
| Shards per data node | No more than 25 shards per GB of JVM heap |
| Shards per cluster | Aim for fewer than 25,000 total |
| Replicas | 1 replica for most workloads, 2 for critical search |
Over-sharding. It's the single most common architectural mistake I see in production OpenSearch. Every shard eats memory for metadata, file handles, and thread pool resources. 50,000 tiny shards will choke powerful hardware. That same data in 5,000 properly sized shards? Runs comfortably on half the instances. I inherited a cluster once with 120,000 shards across 30 nodes. The master nodes spent more time managing cluster state than the data nodes spent serving queries. We consolidated down to 8,000 shards and retired 12 nodes.
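The sizing rules in the table reduce to two pieces of arithmetic worth having as helpers. A minimal sketch, using a 30 GB target per hot shard (the midpoint of the 10-50 GB range) and the 25-shards-per-GB-of-heap ceiling:

```python
# Back-of-envelope shard sizing, per the guidance table above.
import math

def primary_shard_count(index_size_gb: float, target_shard_gb: float = 30.0) -> int:
    """Number of primaries so each shard lands in the 10-50 GB sweet spot."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))

def max_shards_per_node(jvm_heap_gb: float) -> int:
    """Ceiling on total shards (primaries + replicas) one data node should host."""
    return int(25 * jvm_heap_gb)

# A 600 GB index -> 20 primaries of ~30 GB each.
assert primary_shard_count(600) == 20
# A data node with a 32 GB heap should hold at most ~800 shards.
assert max_shards_per_node(32) == 800
```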
Search Internals
If you're going to design schemas and write queries that hold up at scale, you need to know what's actually happening when OpenSearch executes a query.
Query Execution: Two Phases
Query phase: The coordinator sends the query to every relevant shard. Each shard runs it against its local Lucene index, scores documents using BM25 (by default), and sends back the top-N document IDs and scores. No document bodies cross the wire in this phase.
Fetch phase: The coordinator picks the globally top-N documents from the merged results and pulls the full document bodies from whichever shards hold them. Only the final result set gets fully materialized. That's what keeps data transfer manageable.
BM25 Scoring
BM25 (Best Matching 25) is the default relevance scoring algorithm. It scores documents on three factors: term frequency (how often the search term appears in a document), inverse document frequency (how rare the term is across all documents), and field length normalization (shorter fields get a boost).
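Those three factors combine into a compact formula. Here's a single-term sketch of Lucene's BM25 variant with its default parameters (k1 = 1.2 controls term-frequency saturation, b = 0.75 controls length normalization):

```python
# BM25 single-term score, Lucene's variant with default k1 and b.
import math

def bm25_term_score(tf: float, doc_len: float, avg_doc_len: float,
                    doc_count: int, docs_with_term: int,
                    k1: float = 1.2, b: float = 0.75) -> float:
    # Inverse document frequency: rare terms contribute more.
    idf = math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Term frequency with saturation, normalized by relative field length.
    tf_norm = (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * tf_norm

# A rare term in a short field outscores a common term in a long one.
rare_short = bm25_term_score(tf=3, doc_len=50, avg_doc_len=100,
                             doc_count=10_000, docs_with_term=10)
common_long = bm25_term_score(tf=3, doc_len=300, avg_doc_len=100,
                              doc_count=10_000, docs_with_term=5_000)
assert rare_short > common_long
```

The saturation term is why stuffing a field with repeated keywords stops helping quickly: as tf grows, the score asymptotically approaches idf * (k1 + 1).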
Out of the box, BM25 handles full-text search well. Things get interesting when you need to blend text relevance with other signals: recency, popularity, business rules. That's where function_score queries come in. They let you stack custom scoring logic on top of BM25. I've built function_score queries combining BM25 with a Gaussian decay on publish date and a field_value_factor on view count. The relevance improvement was immediate and measurable.
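That blended-signal pattern fits in a single query body. A hedged sketch follows; the field names (title, published_at, view_count) and the decay/factor values are hypothetical placeholders, not a tuned configuration:

```python
# function_score sketch: BM25 text match x recency decay x popularity boost.
# Field names and tuning values are hypothetical.
query = {
    "query": {
        "function_score": {
            "query": {"match": {"title": "opensearch tuning"}},  # BM25 base score
            "functions": [
                # Gaussian decay: documents lose score as publish date ages.
                {"gauss": {"published_at": {"origin": "now", "scale": "30d", "decay": 0.5}}},
                # Popularity: log1p keeps huge view counts from dominating.
                {"field_value_factor": {"field": "view_count", "modifier": "log1p", "factor": 1.2}},
            ],
            "score_mode": "multiply",   # how the functions combine with each other
            "boost_mode": "multiply",   # how the result combines with the BM25 score
        }
    }
}
```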
Aggregations
Aggregations are where OpenSearch stops being a search engine and becomes an analytics platform. Metrics, histograms, top terms, statistical analysis, all computed at query time across distributed shards.
Three categories worth knowing:
- Metric aggregations compute values like avg, sum, min, max, and cardinality (approximate distinct count using HyperLogLog).
- Bucket aggregations group documents into buckets: by term, date range, histogram interval, geographic region, or custom filters.
- Pipeline aggregations operate on the output of other aggregations, enabling calculations like moving averages, derivatives, and cumulative sums.
Aggregations run on doc values, a columnar data structure stored on disk. Fields you aggregate on need doc values enabled (they are by default for most types). The text field type doesn't support doc values. If you need to aggregate on a text field, add a keyword sub-field. I've debugged more than a few "aggregation not supported" errors that traced back to exactly this.
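The keyword sub-field fix looks like this in a mapping. A minimal sketch with a hypothetical category field: the text variant gets analyzed for full-text search, while the keyword sub-field carries the doc values that aggregations need.

```python
# Text field with a keyword sub-field so it can be both searched and aggregated.
mapping = {
    "mappings": {
        "properties": {
            "category": {
                "type": "text",                          # analyzed; no doc values
                "fields": {"raw": {"type": "keyword"}},  # doc values for aggs/sorting
            }
        }
    }
}

# Aggregate on the sub-field, not the text field itself.
terms_agg = {
    "size": 0,
    "aggs": {"top_categories": {"terms": {"field": "category.raw", "size": 10}}},
}
```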
Query DSL Patterns
Here are the Query DSL patterns I reach for constantly in production.
Bool queries do the heavy lifting for complex search. The must, should, must_not, and filter clauses let you compose precise queries. Clauses in filter context don't contribute to relevance scoring and get cached, so always shove non-scoring criteria (date ranges, status filters, access control) into filter. I've seen teams put everything in must and wonder why their queries are slow. Moved three clauses from must to filter on a high-traffic endpoint once. Cut p99 latency by 35%.
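The must-versus-filter split looks like this in practice. A hedged sketch with hypothetical field names; only the match clause contributes to scoring, while the filter clauses are cacheable yes/no checks:

```python
# Bool query: scoring criteria in must, non-scoring criteria in filter.
query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"description": "wireless headphones"}}  # scored by BM25
            ],
            "filter": [
                # No score contribution, eligible for the filter cache.
                {"term": {"status": "active"}},
                {"range": {"created_at": {"gte": "now-30d"}}},
            ],
        }
    }
}
```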
Multi-match queries search across multiple fields with configurable strategies: best_fields (default; takes the highest-scoring field), most_fields (combines scores from all matching fields), and cross_fields (treats the field set as one big field; great for name searches across first_name and last_name).
Search templates don't get enough love. Any team with a search-heavy application should use them. They parameterize queries and separate query logic from application code, which means you can tune search relevance without redeploying. That separation alone is worth the small upfront investment.
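Here's what that separation looks like. A hedged sketch: the template name, field names, and parameters are hypothetical. The Mustache template body is stored as a JSON string (registered via the _scripts endpoint), and the application then sends only an ID and parameters:

```python
# Search template sketch: query logic lives in the cluster, not the app.
import json

search_template = {
    "script": {
        "lang": "mustache",
        # Mustache templates are stored as a JSON string with {{placeholders}}.
        "source": json.dumps({
            "query": {
                "bool": {
                    "must": [{"match": {"title": "{{query_text}}"}}],
                    "filter": [{"term": {"status": "{{status}}"}}],
                }
            },
            "size": "{{result_size}}",
        }),
    }
}

# At search time the application sends parameters only; relevance tuning
# happens by updating the stored template, with no redeploy.
invocation = {
    "id": "product-search",
    "params": {"query_text": "noise cancelling", "status": "active", "result_size": 20},
}
```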
Scaling Patterns
Horizontal vs. Vertical Scaling
OpenSearch scales in both directions: horizontally (more nodes) and vertically (bigger instance types).
| Scaling Axis | When to Use | Considerations |
|---|---|---|
| Vertical (larger instances) | CPU or memory bottleneck on existing nodes | Simpler, no shard redistribution needed |
| Horizontal (more nodes) | Storage capacity, shard count, or throughput limits | Requires shard rebalancing, more network overhead |
| Add replicas | Read throughput needs to increase | Doubles storage cost per replica added |
| Re-shard (reindex) | Shards are too large or too numerous | Requires reindexing, plan for downtime or use aliases |
I always start vertical. It's operationally simpler and doesn't trigger shard redistribution. Upgrade the instance type, let the blue/green deployment run, done. Once I hit the ceiling of a single instance type (or when one fat instance costs more than several smaller ones), I go horizontal. That crossover point is usually around r6g.4xlarge or r7g.4xlarge.
Multi-AZ with Standby
Multi-AZ with Standby gives you a 99.99% SLA. Active nodes run across two Availability Zones; a full set of standby nodes sits in a third AZ. When an AZ fails, the standby nodes pick up within seconds. No waiting around for traditional shard reallocation.
Requirements:
- Three Availability Zones
- Even number of data nodes (minimum 2 per AZ, so 6 total)
- Dedicated master nodes (3, one per AZ)
- Two replicas for every primary shard
You'll pay roughly 50% more than a two-AZ deployment for that standby fleet. Worth it? Depends. Going from 99.9% to 99.99% availability and from minutes to seconds on recovery makes sense for any workload where search downtime directly hits revenue. I deployed this for an e-commerce search platform after a 12-minute outage during a single-AZ failure cost the business six figures. The 50% infrastructure bump was nothing by comparison.
Cross-Cluster Replication
Cross-cluster replication (CCR) creates a follower index on a remote cluster that stays in sync with a leader index automatically. A few places where it shines:
- Geographic locality: Replicate data to a cluster in the user's region. Lower latency.
- Workload isolation: Offload analytics queries to a follower cluster so they don't compete with production search.
- Disaster recovery: Keep a standby cluster in a separate region.
CCR replicates at the shard operation level. Under normal conditions, the follower stays within seconds of the leader. Follower indices are read-only; all writes go to the leader.
Networking and Security
VPC Deployment
Every production workload goes in a VPC. Full stop. VPC deployment puts the OpenSearch endpoints inside your private subnets, reachable only through your VPC's network:
- No public internet exposure
- Network-level access control via security groups
- VPC Flow Logs for network audit trails
- AWS PrivateLink for cross-account access
The trade-off? VPC endpoints aren't publicly accessible, so OpenSearch Dashboards needs a VPN, bastion host, or AWS Verified Access. I put an Application Load Balancer with Cognito auth in front of the Dashboards endpoint. Gives every team member SSO access, nothing exposed to the public internet.
Fine-Grained Access Control (FGAC)
FGAC is powerful and widely misunderstood. It provides authorization at three levels:
| Level | What It Controls | Example |
|---|---|---|
| Index-level | Which indices a role can read/write | Allow analytics-team to read logs-* but not security-* |
| Document-level | Which documents within an index a role can see | Only show documents where department matches the user's department |
| Field-level | Which fields within a document a role can see | Hide ssn and salary fields from the general-reader role |
FGAC can use an internal user database, or it can map to IAM principals and SAML identities. For AWS-native architectures, I map IAM roles to OpenSearch backend roles. Same IAM policies governing your other AWS services, now governing OpenSearch too. For enterprise deployments with existing identity providers, SAML federation through Cognito gives you single sign-on. I've set up both. IAM mapping is cleaner for greenfield projects. SAML federation is unavoidable when the enterprise already has an IdP that everyone expects to use.
Encryption
Three layers of encryption:
- At rest: AES-256 via AWS KMS keys (service-managed or customer-managed). Covers all data files, indices, logs, swap files, and automated snapshots.
- In transit: TLS 1.2 for all node-to-node and client-to-node communication. Enforced by default on new domains.
- Field-level: Application-level encryption of sensitive fields before indexing, using your own key management. Protects data even from cluster administrators.
OpenSearch Serverless
OpenSearch Serverless is a completely different animal. No clusters, no nodes, no capacity planning. You create collections; AWS handles all infrastructure.
Collection Types
| Type | Optimized For | Use Case |
|---|---|---|
| Search | Full-text search with low latency | Application search, e-commerce, content discovery |
| Time series | Append-only log and event data | Log analytics, metrics, observability |
| Vector search | k-NN similarity search | Semantic search, RAG, recommendation engines |
OpenSearch Compute Units (OCUs)
Pricing runs on OCUs: abstracted compute units bundling vCPU, memory, and storage I/O. Indexing and search get separate OCU types that scale independently based on demand.
| Configuration | Minimum | Scales To |
|---|---|---|
| Indexing OCUs | 2 OCUs (can be set to 0 during inactivity) | Hundreds, based on throughput |
| Search OCUs | 2 OCUs (always on for active collections) | Hundreds, based on query load |
| OCU cost | ~$0.24/OCU-hour | Per-second billing |
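The table implies a meaningful floor cost for an always-on collection. A quick worked example, assuming the ~$0.24/OCU-hour figure above and a 730-hour month (verify against current pricing):

```python
# Serverless floor cost: 2 indexing + 2 search OCUs, always on.
OCU_HOURLY = 0.24        # approximate; check current AWS pricing
HOURS_PER_MONTH = 730

def monthly_ocu_cost(indexing_ocus: float, search_ocus: float) -> float:
    return (indexing_ocus + search_ocus) * OCU_HOURLY * HOURS_PER_MONTH

floor = monthly_ocu_cost(2, 2)
assert round(floor) == 701   # ~$700/month before any scaling
```

That ~$700/month floor per active collection is exactly why collection groups (below) matter for environments with many small collections.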
Collection Groups
Collection groups let you share OCU capacity across multiple collections. Without them, each collection carries its own minimum OCU overhead. With them, you can pack dozens of small collections into a shared capacity pool. I use collection groups heavily in dev environments where each microservice has its own collection but none generate enough traffic to justify dedicated OCUs.
When to Use Serverless vs. Provisioned
| Factor | Use Serverless | Use Provisioned |
|---|---|---|
| Traffic pattern | Spiky, unpredictable | Steady, predictable |
| Operational overhead tolerance | Low: want fully managed | Willing to manage capacity |
| Fine-grained tuning needed | No: defaults are acceptable | Yes: need custom shard counts, instance types, plugins |
| Cost at scale | More expensive above ~8 OCUs sustained | More cost-effective at steady-state |
| Feature coverage | Subset of OpenSearch features | Full feature set |
My default: start on serverless for every new project. I only move to provisioned when I hit a specific wall (cross-cluster replication, custom plugins, SAML, ISM policies, fine-grained shard control). If serverless costs outgrow the equivalent provisioned cost, migrate. Compatible data model, so the migration path is straightforward. I've done this transition twice. Both times took less than a day.
Ingestion Pipelines
Getting data into OpenSearch efficiently matters just as much as the cluster architecture. AWS provides several managed ingestion paths, and picking the right one depends on your workload.
Amazon OpenSearch Ingestion (OSI)
OSI is a fully managed, serverless ingestion pipeline built on Data Prepper (the open-source data collector from the OpenSearch project). It handles:
- Log ingestion: Parse, transform, enrich, and route logs from S3, Kafka, HTTP endpoints.
- Trace analytics: Ingest OpenTelemetry traces for distributed tracing.
- Metric analytics: Collect and aggregate metrics data.
Pipelines are defined in YAML and scale automatically. Zero operational overhead compared to self-managed Data Prepper: no EC2 instances, no auto-scaling groups. I ran self-managed Data Prepper for about six months before switching. The instances needed patching, monitoring, scaling. OSI killed all of that overhead.
Amazon Kinesis Data Firehose
Firehose is the simplest way to stream data into OpenSearch. Buffering, batching, compression, retry logic: all handled. For log analytics workloads arriving from CloudWatch Logs, Kinesis Data Streams, or direct PUT via the Firehose API, it's often the right call.
You can transform records via Lambda functions before they reach OpenSearch, and Firehose supports backup to S3. Always enable S3 backup. Always. That backup gives you replay capability when you inevitably need to reindex after a mapping change. I've used it three times in the last two years. Saved days of work each time.
Fluent Bit
For container workloads on ECS or EKS, Fluent Bit is the standard log collector. AWS provides an optimized image with the OpenSearch output plugin baked in. Runs as a sidecar or DaemonSet, grabs stdout/stderr from application containers, ships logs to OpenSearch. Straightforward.
On high-volume EKS deployments, though, I route Fluent Bit output through Kinesis Data Firehose instead of sending it directly to OpenSearch. That buffer layer absorbs traffic spikes and keeps the cluster from drowning during log surges from deployment rollouts or error storms. I learned this one the hard way during a deployment that triggered a cascade of error logs. Without Firehose in front, the spike would've saturated the bulk indexing queue and caused data loss.
Lambda Patterns
Lambda works well for event-driven ingestion. S3 object creation triggers indexing, DynamoDB Streams replicate changes to OpenSearch for search, custom apps transform data before indexing.
The thing to get right with Lambda-to-OpenSearch: connection management. Use the OpenSearch client with connection pooling and reuse connections across invocations. Creating a new HTTPS connection per invocation adds real latency and exhausts the cluster's connection limits under high concurrency. I put connection reuse in the global scope of every Lambda function that talks to OpenSearch. The numbers speak for themselves: 200ms for a cold connection, 15ms for a reused one.
Vector Search and AI
OpenSearch isn't just a text search engine anymore. It's become a legitimate vector database powering modern AI applications. This is the fastest-evolving area of the platform, and capabilities have jumped substantially with every recent release.
k-NN Search
OpenSearch supports k-nearest neighbor (k-NN) search with multiple algorithms and engines:
| Engine | Algorithm | Best For |
|---|---|---|
| nmslib | HNSW (Hierarchical Navigable Small World) | High recall, moderate scale |
| Faiss | HNSW, IVF | Large-scale vector search, GPU acceleration |
| Lucene | HNSW | Combining vector and text search in hybrid queries |
Faiss with HNSW is my default recommendation for most workloads. Good recall-latency trade-offs, and it supports quantization (reducing vector precision to save memory) without significant recall loss.
Vector Dimensions and Memory
Vector search eats memory. Each vector consumes 4 bytes x dimensions for float32, or 1 byte x dimensions for byte quantization. For a typical embedding model producing 1536-dimensional vectors (like text-embedding-ada-002):
- Float32: 6 KB per vector, 6 GB per million vectors (vectors only, excluding metadata)
- Byte quantized: 1.5 KB per vector, 1.5 GB per million vectors
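The arithmetic behind those numbers is worth having as a function when you're sizing a cluster. Note this covers the raw vector payload only; the HNSW graph structures add further overhead on top.

```python
# Raw vector memory: num_vectors x dimensions x bytes per component.
def vector_memory_gb(num_vectors: int, dims: int, bytes_per_component: int = 4) -> float:
    """float32 = 4 bytes/component, byte-quantized = 1. Excludes HNSW graph overhead."""
    return num_vectors * dims * bytes_per_component / 1e9

# 1M float32 vectors at 1536 dims -> ~6 GB, matching the estimate above.
assert round(vector_memory_gb(1_000_000, 1536), 2) == 6.14
# Byte quantization cuts that to ~1.5 GB.
assert round(vector_memory_gb(1_000_000, 1536, bytes_per_component=1), 2) == 1.54
```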
At scale, memory becomes the primary constraint. Full stop. Use scalar quantization for large collections where a small recall trade-off is acceptable. Make sure your data nodes have enough JVM heap and native memory for the HNSW graphs. I've sized clusters for 50 million vectors, and the memory math dominated every other capacity planning consideration.
Semantic Search
Semantic search uses embedding models to find documents that are conceptually similar to a query, even when they share zero keywords. OpenSearch supports this through neural search, which hooks into ML models on SageMaker or Amazon Bedrock.
The pattern:
- At index time, an ingest pipeline calls the embedding model to convert text fields into vector representations. These get stored alongside the original text.
- At search time, the query text also gets converted to a vector, and k-NN search finds the nearest document vectors.
- Hybrid search blends BM25 text scoring with k-NN vector scoring using reciprocal rank fusion or linear combination. You get both keyword precision and semantic recall.
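Reciprocal rank fusion, one of the two blending strategies mentioned, is simple enough to sketch directly: each result list contributes 1/(k + rank) per document, and the sums determine the fused order. (The constant k = 60 is the value commonly used in the RRF literature.)

```python
# Reciprocal rank fusion over multiple ranked result lists.
def rrf_fuse(ranked_lists, k: int = 60):
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # Higher-ranked docs get a larger contribution from each list.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc-a", "doc-b", "doc-c"]   # keyword (BM25) ranking
knn_hits = ["doc-c", "doc-a", "doc-d"]    # vector (k-NN) ranking
fused = rrf_fuse([bm25_hits, knn_hits])
assert fused[0] == "doc-a"  # ranked highly by both lists, so it wins
```

RRF's appeal is that it works on ranks, not raw scores, so you never have to normalize BM25 scores against cosine similarities.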
RAG with Amazon Bedrock
Retrieval-Augmented Generation (RAG) is how you ground large language models in domain-specific knowledge. OpenSearch serves as the retrieval layer; Bedrock handles generation.
Here's the flow:
- Documents get chunked, embedded, and indexed in OpenSearch with vector and text fields.
- A user query is embedded and used for hybrid search (k-NN + BM25) against OpenSearch.
- Top-K results are passed as context to a Bedrock foundation model (Claude, for example).
- The model generates a response grounded in the retrieved documents.
Bedrock Knowledge Bases automates this entire pattern: document chunking, embedding, indexing into OpenSearch Serverless, retrieval. For custom implementations where you need more control over chunking strategies, retrieval logic, or re-ranking, build the pipeline yourself with OpenSearch and Bedrock APIs. I've done both. Bedrock Knowledge Bases gets you to production in a day. The custom pipeline takes a week but lets you do things like chunk by semantic boundary instead of token count.
Agentic Search
A newer pattern I'm seeing in production: agentic search. An AI agent dynamically constructs OpenSearch queries based on user intent. Instead of mapping user input to a fixed query template, the agent:
- Interprets the user's natural language question.
- Figures out which indices, fields, and query types are relevant.
- Constructs and executes one or more OpenSearch queries.
- Evaluates results and optionally refines the query.
- Synthesizes everything into a coherent answer.
This works well for data exploration where users don't know field names or query syntax. The agent hides Query DSL complexity behind a conversational interface. I've prototyped this with Claude as the agent and OpenSearch as the data store. Results are promising, especially when the agent has access to index mappings so it knows what fields exist and their types.
Operational Patterns
Blue/Green Deployments
AWS OpenSearch Service handles configuration changes (instance type changes, version upgrades, scaling) through blue/green deployments. New nodes (green) get provisioned, data replicates from existing nodes (blue), traffic redirects, blue nodes get terminated.
Mostly transparent, but plan for these:
- Duration: Minutes to hours depending on data volume. A 10 TB cluster takes way longer than a 500 GB cluster.
- During the deployment: The cluster temporarily runs double the nodes, meaning double the cost. Make sure your account has sufficient service limits.
- Avoid chaining changes: Multiple configuration changes in quick succession queue up blue/green deployments and extend total time. Batch your changes. A colleague once made four separate changes in 30 minutes. The cascading blue/green deployments took 14 hours to finish.
CloudWatch Alarms to Set
At minimum, set alarms on these metrics for every production cluster:
| Metric | Alarm Threshold | Why |
|---|---|---|
| ClusterStatus.red | >= 1 for 1 minute | At least one primary shard is unallocated: data loss risk |
| ClusterStatus.yellow | >= 1 for 5 minutes | At least one replica is unallocated: reduced redundancy |
| FreeStorageSpace | < 20% of total | Write blocks trigger at ~5%, but performance degrades earlier |
| JVMMemoryPressure | > 80% for 5 minutes | High GC pressure leads to latency spikes and instability |
| CPUUtilization | > 80% sustained | Search latency will increase; scale vertically or horizontally |
| MasterCPUUtilization | > 50% sustained | Master instability affects the entire cluster |
| ThreadpoolWriteQueue | > 100 | Write requests are queuing, indicating backpressure |
| ThreadpoolSearchQueue | > 500 | Search requests are queuing, indicating overload |
Common Failure Modes
Red cluster status: One or more primary shards are unallocated. A node failed; the cluster can't find somewhere to host the primary shard. If you've got replicas, OpenSearch promotes one to primary. If the cluster is also low on resources, that promotion fails. Scale up, check storage space, dig into root cause. I've been paged for red cluster status more times than I care to admit. The cause is almost always one of two things: disk full, or a node that ran out of JVM heap during a large merge.
Write blocks: OpenSearch enforces a write block when free storage drops below a threshold (configured by cluster.routing.allocation.disk.watermark.flood_stage, defaulting to 95% full). Once the write block kicks in, no indexing happens until you free storage by deleting indices, adding nodes, or bumping volume size. Set aggressive storage alarms. Use ISM policies to age off data before you hit this wall. Recovering from a write block at 3 AM is miserable.
JVM out of memory: The JVM heap is shared across indexing buffers, query caches, aggregation state, and field data. Oversized aggregations (high-cardinality terms aggregations, for example) can eat the entire heap. Set search.max_buckets to something reasonable, use collect_mode: breadth_first for nested aggregations, and watch JVMMemoryPressure closely. I set max_buckets to 10,000 on every cluster I build. The default of 65,535 is asking for trouble.
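Both guardrails mentioned above are small request/settings bodies. A hedged sketch with hypothetical field names (user_id, @timestamp); the settings path is the standard _cluster/settings persistent block:

```python
# Guardrail 1: cap the global bucket limit cluster-wide.
cluster_settings = {
    "persistent": {"search.max_buckets": 10000}  # default 65,535 invites OOM
}

# Guardrail 2: breadth_first collection on a nested terms aggregation, so
# low-value parent buckets are pruned before the sub-aggregation descends.
agg_request = {
    "size": 0,
    "aggs": {
        "by_user": {
            "terms": {
                "field": "user_id",           # hypothetical high-cardinality field
                "size": 100,
                "collect_mode": "breadth_first",
            },
            "aggs": {
                "by_day": {
                    "date_histogram": {
                        "field": "@timestamp",
                        "calendar_interval": "day",
                    }
                }
            },
        }
    },
}
```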
Split-brain: Rare with dedicated master nodes. It happens when network partitions prevent master nodes from communicating, and the cluster elects two leaders, creating divergent state. Three dedicated master nodes across three AZs prevents this.
Cost Optimization
Instance Family Selection
| Instance Family | Optimized For | Use Case |
|---|---|---|
| R6g/R7g (Graviton) | Memory | General-purpose data nodes, large working sets |
| M6g/M7g (Graviton) | Balanced | Master nodes, coordinator nodes, moderate workloads |
| C6g/C7g (Graviton) | Compute | Query-heavy workloads with small working sets |
| I3 | Storage I/O | High-throughput indexing with instance storage |
| OR1 | Writable warm | Direct-to-warm indexing for cost-optimized log analytics |
The Graviton Advantage
Graviton-based instances (the g suffix) deliver roughly 30% better price-performance versus equivalent x86 instances for OpenSearch workloads. Unless you've got a specific x86 dependency (extremely rare for managed OpenSearch), always go Graviton.
R7g is my default for data nodes. DDR5 memory, improved per-core performance over R6g, marginal cost increase. I've run benchmarks comparing R6g.2xlarge to R7g.2xlarge on identical workloads. R7g consistently delivered 15-20% better query throughput.
Reserved Instances
For steady-state production workloads, Reserved Instances (1-year or 3-year) cut costs by 30-50% versus On-Demand. The commitment applies to instance type and region (not a specific domain), so you keep flexibility to restructure clusters within the same instance family.
Figure out your baseline: the minimum instance count you'll maintain for the commitment period. Reserve that. Use On-Demand for burst capacity above it. I typically start with 1-year no-upfront for new workloads where I'm still learning the traffic pattern, then switch to 3-year partial-upfront once things stabilize.
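The baseline split described above can be sketched in a few lines. Node counts and the helper name are hypothetical; the point is the min/max split, not the numbers:

```python
# Hedged sketch: reserve the minimum sustained node count, cover bursts
# with On-Demand. Inputs would come from your own utilization history.
def reservation_plan(hourly_node_counts):
    baseline = min(hourly_node_counts)   # always-on floor -> reserve this
    peak = max(hourly_node_counts)       # burst ceiling -> On-Demand above baseline
    return {"reserved": baseline, "on_demand_burst": peak - baseline}

# A cluster that runs 6 nodes off-peak and scales to 10 at peak:
plan = reservation_plan([6, 6, 7, 9, 10, 8, 6])
# -> reserve 6 nodes (1- or 3-year term), keep up to 4 On-Demand
```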
Tiered Storage Economics
The economics are straightforward:
| Tier | Approximate Cost per GB-Month | Relative Cost |
|---|---|---|
| Hot (gp3) | $0.10-$0.15 | 100% |
| UltraWarm | $0.024 | ~20% |
| Cold | $0.006 | ~5% |
| S3 (snapshot archive) | $0.004 | ~3% |
For a workload retaining 90 days of log data at 100 GB/day:
- All hot: ~9 TB x $0.12 = ~$1,080/month for storage alone
- 7 days hot, 83 days warm: (700 GB x $0.12) + (8.3 TB x $0.024) = ~$284/month
- 7 days hot, 23 days warm, 60 days cold: (700 GB x $0.12) + (2.3 TB x $0.024) + (6 TB x $0.006) = ~$175/month
84% reduction in storage costs. Same data retention policy. I've presented these numbers to finance teams and watched the reaction. Nobody argues against ISM policies after seeing this math.
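The worked example above reduces to a one-line formula, useful for modeling your own retention mix. Prices are the article's approximate per-GB-month figures:

```python
# Approximate per-GB-month prices from the tiers table above.
PRICES = {"hot": 0.12, "warm": 0.024, "cold": 0.006}

def monthly_storage_cost(daily_gb, tier_days):
    # Cost = sum over tiers of (days retained in tier * daily volume * price)
    return sum(days * daily_gb * PRICES[tier] for tier, days in tier_days.items())

all_hot = monthly_storage_cost(100, {"hot": 90})                            # ~$1,080
hot_warm = monthly_storage_cost(100, {"hot": 7, "warm": 83})                # ~$283
three_tier = monthly_storage_cost(100, {"hot": 7, "warm": 23, "cold": 60})  # ~$175
savings = 1 - three_tier / all_hot                                          # ~84%
```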
Right-Sizing
Over-provisioned OpenSearch clusters are everywhere. Teams provision for peak load and never look back. A right-sizing review should check:
- JVM heap utilization: Consistently below 50%? Instance type is oversized.
- CPU utilization: Consistently below 30%? Smaller instance type or fewer nodes.
- Storage utilization: Free storage above 50% after accounting for blue/green deployment overhead (which temporarily doubles storage needs)? Over-provisioned.
- Shard count per node: Well below the recommended limit (AWS guidance is no more than 25 shards per GiB of JVM heap)? More nodes than you need.
Do this quarterly for production clusters. AWS Cost Explorer and OpenSearch-specific CloudWatch metrics give you everything. I keep a spreadsheet per cluster tracking these metrics month over month. Trends tell you more than any single data point.
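The checklist above is mechanical enough to encode. A hedged sketch of the quarterly review; the thresholds are the article's, while the metric names mirror (but are not literally) the CloudWatch metrics you'd pull:

```python
# Hypothetical right-sizing check over a snapshot of cluster metrics.
def right_sizing_findings(m):
    findings = []
    if m["jvm_heap_pct"] < 50:
        findings.append("heap under 50%: instance type likely oversized")
    if m["cpu_pct"] < 30:
        findings.append("CPU under 30%: consider a smaller type or fewer nodes")
    if m["free_storage_pct"] > 50:
        # 50% free after reserving blue/green headroom, per the checklist.
        findings.append("free storage over 50%: over-provisioned")
    if m["shards_per_node"] < m["recommended_shard_limit"] * 0.5:
        findings.append("shard count well below limit: possibly too many nodes")
    return findings

# All four checks fire for this (hypothetical) over-provisioned cluster:
findings = right_sizing_findings({
    "jvm_heap_pct": 42, "cpu_pct": 25, "free_storage_pct": 60,
    "shards_per_node": 120, "recommended_shard_limit": 1000,
})
```

Run it against month-over-month averages, not single samples, for the same reason the spreadsheet beats a point-in-time dashboard.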
OpenSearch vs. Elasticsearch: The Fork Divergence
Several years post-fork, OpenSearch and Elasticsearch have diverged substantially. Here's where things stand if you're evaluating both:
Features OpenSearch Has That Elasticsearch Charges For
| Capability | OpenSearch | Elasticsearch |
|---|---|---|
| Security (RBAC, FGAC, SSO) | Included, open source | Requires paid subscription (Platinum/Enterprise) |
| Alerting | Included | Requires paid subscription |
| Anomaly detection | Included | Requires paid subscription (Machine Learning) |
| Index management (ISM) | Included | Index Lifecycle Management included, but advanced features require paid tier |
| SQL query support | Included (SQL plugin) | Included (basic), advanced requires paid tier |
| Cross-cluster replication | Included | Requires paid subscription (Platinum) |
Features Unique to Each
| OpenSearch Exclusive | Elasticsearch Exclusive |
|---|---|
| Segment replication | ES\|QL (new query language) |
| Searchable snapshots (managed service) | Elastic Agent/Fleet |
| Neural search and ML Commons | Elastic Security (SIEM) as integrated product |
| OpenSearch Serverless | Elastic Cloud Serverless |
| Data Prepper / OSI | Elastic APM (more mature) |
| OR1 writable warm instances | Frozen tier with shared cache |
Migration Considerations
If you're considering a migration in either direction, here are the constraints:
- API compatibility: OpenSearch maintains compatibility with the Elasticsearch 7.x API surface. Most client libraries work with both, though you should use the dedicated OpenSearch client libraries for new development.
- Index compatibility: Indices created in Elasticsearch 7.x work directly in OpenSearch. Elasticsearch 8.x indices don't.
- Plugin ecosystem: Plugins aren't cross-compatible. Any custom Elasticsearch plugins need OpenSearch equivalents.
Conclusion
OpenSearch Service has grown into a versatile platform spanning full-text search, log analytics, vector similarity search, and AI-powered retrieval. The architectural decisions that matter most (node sizing, shard strategy, storage tiering, security configuration) are exactly the ones that are easiest to get wrong. You need to understand how the system behaves under load, not just what the API surface exposes.
The patterns I rely on in production:
- Dedicated master nodes and Graviton instances from the start. Minimal cost overhead, immediate operational benefit.
- ISM policies from day one. Retrofitting tiered storage onto a cluster that's been running everything in hot tier is painful. I've done it. Don't repeat my mistake.
- Right-size shards aggressively. Over-sharding causes more production incidents than almost any other configuration mistake.
- Multi-AZ with Standby for revenue-impacting workloads. The 50% cost increase pays for itself the first time you survive an AZ failure without customer impact.
- Serverless for new projects. The operational simplicity is worth the trade-offs, and the data model is compatible with provisioned if you outgrow it.
OpenSearch keeps evolving fast. Vector search capabilities have improved substantially over the past year, and each release closes gaps that used to push teams toward dedicated vector databases. If you're building search and analytics infrastructure on AWS, this platform is worth understanding deeply.
Additional Resources
- OpenSearch Documentation
- AWS OpenSearch Service Developer Guide
- OpenSearch Benchmark - Performance testing tool for OpenSearch
- OpenSearch Project on GitHub
- AWS Architecture Blog - OpenSearch Posts
Let's Build Something!
I help teams ship cloud infrastructure that actually works at scale. Whether you're modernizing a legacy platform, designing a multi-region architecture from scratch, or figuring out how AI fits into your engineering workflow, I've seen your problem before. Let me help.
Currently taking on select consulting engagements through Vantalect.

