AWS OpenSearch Service: An Architecture Deep-Dive

About the author: I'm Charles Sieg, a cloud architect and platform engineer who builds apps, services, and infrastructure for Fortune 1000 clients through Vantalect. If your organization is rethinking its software strategy in the age of AI-assisted engineering, let's talk.

AWS OpenSearch Service runs behind more production workloads than most engineers realize: log analytics, full-text search, security event monitoring, vector similarity search. Lots of teams deploy it. Few really understand what's happening underneath. I've designed and operated OpenSearch clusters for years now, everything from small dev setups to multi-petabyte production deployments ingesting billions of documents a day. My first cluster went red within 48 hours. I've learned a lot since then.

This is an architecture reference. If you need to understand what OpenSearch actually does under the hood, scale it without torching your budget, or dodge the failure modes that've dragged me out of bed at 3 AM, keep reading.

The Elasticsearch Fork: Context That Matters

OpenSearch is Amazon's fork of Elasticsearch 7.10.2 and Kibana 7.10.2. It exists because Elastic switched from Apache 2.0 to the dual SSPL/Elastic License in 2021. AWS and a growing open-source community maintain the fork under Apache 2.0.

Why care? Because OpenSearch has diverged from Elasticsearch in ways that matter. Segment replication, searchable snapshots, the Security Analytics plugin, neural search: all developed independently. If you're evaluating OpenSearch based on Elasticsearch knowledge from 2020, your mental model is stale. I tried applying Elasticsearch 6.x tuning patterns to a fresh OpenSearch 2.x cluster once. Got worse performance than the defaults. Lesson learned.

The managed service adds operational layers on top of the open-source project: automated blue/green deployments, IAM and Cognito auth, UltraWarm and cold storage tiers, serverless collections. Knowing where the open-source project ends and the managed service begins affects every architecture decision you'll make.

Cluster Architecture

An OpenSearch cluster is a collection of nodes. Each one runs on an EC2 instance (or, in serverless mode, the infrastructure is entirely abstracted). Node roles are where you start when designing a cluster. Get them wrong and you'll pay for it in cost, performance, or resilience. Usually all three.

Node Types

| Node Type | Role | When to Use |
|---|---|---|
| Data nodes | Store index shards, execute search and indexing operations | Always: every cluster needs data nodes |
| Dedicated master nodes | Manage cluster state, shard allocation, index creation | Any cluster with more than a few nodes or production workloads |
| Coordinator nodes | Route requests, merge partial results, reduce load on data nodes | High query throughput clusters where data nodes are CPU-bound |
| UltraWarm nodes | Store read-only warm data on S3-backed storage | Log analytics and time-series workloads with data retention requirements |
| Cold storage | Archive data to S3, queryable on demand | Compliance and audit data that is rarely accessed |
```mermaid
flowchart TD
  Client[Client Request] --> COORD["Coordinator Node<br/>Routes requests<br/>Merges results"]
  COORD --> D1["Data Node 1<br/>Hot Tier"]
  COORD --> D2["Data Node 2<br/>Hot Tier"]
  COORD --> UW["UltraWarm Node<br/>S3-backed + SSD cache"]
  MASTER["Dedicated Master Nodes x3<br/>Cluster state<br/>Shard allocation"] -.->|Manages| D1
  MASTER -.->|Manages| D2
  MASTER -.->|Manages| UW
  COLD["Cold Storage<br/>S3 only"] -.->|Attach on demand| UW
  style MASTER fill:#e94,stroke:#333
  style COORD fill:#38b,stroke:#333
  style D1 fill:#4a9,stroke:#333
  style D2 fill:#4a9,stroke:#333
  style UW fill:#ea3,stroke:#333
  style COLD fill:#88b,stroke:#333
```
OpenSearch cluster node roles

Dedicated Master Nodes

Cluster state (the data structure tracking every index, shard, mapping, and node) lives exclusively on master-eligible nodes. Three dedicated master nodes. Three Availability Zones. Every production cluster. No exceptions. You get quorum-based leader election that tolerates a single AZ failure.

I've watched teams skip dedicated masters to save a few bucks on dev clusters, then carry that pattern straight into production. Works fine until it doesn't. One node restart during a shard rebalance and suddenly your data nodes are fighting over master election while also trying to serve traffic. Ugly.

Under-sizing master nodes is another common trap. They don't handle search or indexing traffic, but they hold the entire cluster state in memory. Thousands of shards? You'll want r6g.large.search or r6g.xlarge.search. Smaller clusters do fine with m6g.large.search. The key: master node sizing scales with metadata volume (shard count, field count, index count), not data volume.

How Nodes Interact

Any node that receives a search request becomes the coordinator for that request. It figures out which shards hold relevant data, fans the query out to those shards (query phase), collects partial results, merges them, and returns the final answer (fetch phase). This scatter-gather pattern is the heartbeat of OpenSearch performance. A query touching 500 shards costs 500x the coordination overhead of a single-shard query. I once watched a single dashboard panel bring a cluster to its knees. The culprit? An index with 2,000 shards.

Indexing works differently. The coordinator routes each document to the appropriate primary shard using a hash of the document ID. The primary indexes it and replicates to replica shards. Segment replication (OpenSearch-specific) copies entire Lucene segments to replicas instead of replaying individual operations. On write-heavy workloads, this cuts replication overhead significantly. I measured a 40% reduction in replica CPU utilization after enabling it on a high-ingest logging cluster.
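
Segment replication is enabled per index. A minimal sketch of the index body (shard and replica counts are illustrative; in practice you would send it with a client such as opensearch-py):

```python
# Index settings enabling segment replication (OpenSearch 2.x).
# Sent with e.g. client.indices.create(index="logs-000001", body=index_body).
index_body = {
    "settings": {
        "index": {
            "replication.type": "SEGMENT",  # copy whole Lucene segments to replicas
            "number_of_shards": 3,
            "number_of_replicas": 1,
        }
    }
}
```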

```mermaid
sequenceDiagram
  participant C as Client
  participant CO as Coordinator Node
  participant S1 as Shard 1
  participant S2 as Shard 2
  participant S3 as Shard 3

  C->>CO: Search request

  rect rgb(200,220,255)
  Note over CO,S3: Query Phase
  par Fan out query
    CO->>S1: Query
    CO->>S2: Query
    CO->>S3: Query
  end
  S1-->>CO: Top-N doc IDs + scores
  S2-->>CO: Top-N doc IDs + scores
  S3-->>CO: Top-N doc IDs + scores
  CO->>CO: Merge & rank globally
  end

  rect rgb(200,255,220)
  Note over CO,S3: Fetch Phase
  CO->>S1: Fetch doc bodies
  CO->>S3: Fetch doc bodies
  S1-->>CO: Documents
  S3-->>CO: Documents
  end

  CO-->>C: Final results
```
OpenSearch query execution: scatter-gather model

Storage Tiers

Data tiering is probably the single highest-leverage decision you'll make in an OpenSearch deployment. Nail it and you save 80% on storage costs. Mess it up and you're either overpaying for months or losing query performance right when it matters. Three storage tiers, each with fundamentally different cost and performance profiles.

Hot Tier

Hot storage lives on instance-attached EBS volumes (gp3 or io2 for high-IOPS workloads). Actively written, frequently queried data goes here. Most expensive per-GB, but lowest latency and highest throughput.

EBS volume guidance:

  • gp3 is the correct choice for most workloads. It provides a baseline of 3,000 IOPS and 125 MB/s throughput, with the ability to provision up to 16,000 IOPS and 1,000 MB/s independently of volume size.
  • io2 is warranted only for latency-sensitive search workloads that need consistent sub-millisecond response times at scale.

UltraWarm Tier

UltraWarm nodes use S3-backed storage with a local SSD cache. Roughly 80% cheaper than hot storage. The catch: data in UltraWarm is read-only. You can't index new documents into warm indices. Warm queries typically run 2-5x slower than equivalent hot queries, depending on cache hit rates.

The OR1 instance type (OpenSearch-optimized, writable warm storage) changed this equation. OR1 lets you index directly into warm storage, bypassing the hot tier entirely for workloads that can tolerate moderately higher indexing latency. I switched a 2 TB/day log analytics pipeline to OR1 and cut the monthly bill by 60%. Indexing latency went up about 3x, but for logs that nobody touches until something breaks? Irrelevant.

Cold Tier

Cold storage moves everything to S3. No local cache at all. Querying cold data means explicitly attaching the cold index to the cluster, and that takes minutes. But it's 95% cheaper than hot storage. If you've got compliance data sitting in hot storage because someone forgot to set up lifecycle policies, you're just burning money.

Index State Management (ISM) Policies

ISM policies automate the lifecycle of indices across tiers. A typical pattern for log analytics:

| Phase | Age | Action | Storage Cost |
|---|---|---|---|
| Hot | 0-7 days | Active indexing and search | $$$ |
| Force merge | 7 days | Merge to 1 segment per shard, reduce overhead | $$$ |
| Warm | 7-30 days | Migrate to UltraWarm, read-only | $ |
| Cold | 30-365 days | Move to cold storage | ¢ |
| Delete | 365+ days | Delete index | Free |
```mermaid
flowchart LR
  HOT["Hot Tier<br/>0-7 days<br/>Active indexing<br/>$$$"] --> FM["Force Merge<br/>Day 7<br/>1 segment/shard"]
  FM --> WARM["UltraWarm<br/>7-30 days<br/>Read-only<br/>$"]
  WARM --> COLD["Cold Storage<br/>30-365 days<br/>S3 archived<br/>¢"]
  COLD --> DEL["Delete<br/>365+ days"]
  style HOT fill:#e74,stroke:#333
  style FM fill:#e94,stroke:#333
  style WARM fill:#ea3,stroke:#333
  style COLD fill:#38b,stroke:#333
  style DEL fill:#888,stroke:#333
```
Index State Management lifecycle across storage tiers

Always run force-merge before warm migration. UltraWarm performance degrades with high segment counts; merging down to one segment per shard before migration keeps warm query performance consistent. Everybody skips this step once. Then they remember why it exists.
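
The lifecycle above maps onto an ISM policy along these lines. This is a sketch using the ISM plugin's state/action/transition schema; the action names (`force_merge`, `warm_migration`, `cold_migration`, `cold_delete`) are the ones I'd expect, but verify them against your OpenSearch version before deploying:

```python
# ISM policy sketch: hot -> warm -> cold -> delete for daily log indices.
# Attached via PUT _plugins/_ism/policies/<policy-id> (body shape may vary by version).
ism_policy = {
    "policy": {
        "description": "Hot -> UltraWarm -> cold -> delete for log indices",
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [],
                "transitions": [{"state_name": "warm", "conditions": {"min_index_age": "7d"}}],
            },
            {
                "name": "warm",
                "actions": [
                    {"force_merge": {"max_num_segments": 1}},  # always merge before migration
                    {"warm_migration": {}},
                ],
                "transitions": [{"state_name": "cold", "conditions": {"min_index_age": "30d"}}],
            },
            {
                "name": "cold",
                "actions": [{"cold_migration": {"timestamp_field": "@timestamp"}}],
                "transitions": [{"state_name": "delete", "conditions": {"min_index_age": "365d"}}],
            },
            {"name": "delete", "actions": [{"cold_delete": {}}]},
        ],
    }
}
```

Note that the force-merge lives in the warm state's action list, ahead of the migration, so it always runs first.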

Shard Sizing Guidance

Shard sizing trips up more teams than almost anything else in OpenSearch cluster design. Consequential, and deeply unintuitive.

| Guideline | Recommendation |
|---|---|
| Shard size (hot) | 10-50 GB per shard |
| Shard size (UltraWarm) | Up to 200 GB per shard after force-merge |
| Shards per data node | No more than 25 shards per GB of JVM heap |
| Shards per cluster | Aim for fewer than 25,000 total |
| Replicas | 1 replica for most workloads, 2 for critical search |

Over-sharding. It's the single most common architectural mistake I see in production OpenSearch. Every shard eats memory for metadata, file handles, and thread pool resources. 50,000 tiny shards will choke powerful hardware. That same data in 5,000 properly sized shards? Runs comfortably on half the instances. I inherited a cluster once with 120,000 shards across 30 nodes. The master nodes spent more time managing cluster state than the data nodes spent serving queries. We consolidated down to 8,000 shards and retired 12 nodes.
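
The sizing rules above reduce to a few lines of arithmetic. A sketch (the 30 GB target and the 25-shards-per-GB-of-heap ceiling come straight from the table):

```python
import math

def primary_shards(index_size_gb: float, target_shard_gb: float = 30) -> int:
    """Primary shard count that keeps each shard in the 10-50 GB sweet spot."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))

def max_shards_per_node(jvm_heap_gb: float) -> int:
    """Upper bound from the 25-shards-per-GB-of-JVM-heap guideline."""
    return int(25 * jvm_heap_gb)
```

So a 300 GB index wants 10 primaries, and a data node with a 32 GB heap should carry no more than 800 shards, replicas included.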

Search Internals

If you're going to design schemas and write queries that hold up at scale, you need to know what's actually happening when OpenSearch executes a query.

Query Execution: Two Phases

Query phase: The coordinator sends the query to every relevant shard. Each shard runs it against its local Lucene index, scores documents using BM25 (by default), and sends back the top-N document IDs and scores. No document bodies cross the wire in this phase.

Fetch phase: The coordinator picks the globally top-N documents from the merged results and pulls the full document bodies from whichever shards hold them. Only the final result set gets fully materialized. That's what keeps data transfer manageable.

BM25 Scoring

BM25 (Best Matching 25) is the default relevance scoring algorithm. It scores documents on three factors: term frequency (how often the search term appears in a document), inverse document frequency (how rare the term is across all documents), and field length normalization (shorter fields get a boost).
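
Those three factors combine into the per-term score. A sketch of the formula with Lucene's default parameters (k1 = 1.2, b = 0.75); a document's score for a query is the sum of this over the query terms:

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Per-term BM25 contribution (Lucene-style IDF and length normalization)."""
    # Inverse document frequency: rarer terms score higher
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Saturating term frequency, normalized by document length
    norm_tf = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * norm_tf
```

Two properties fall out directly: for the same term frequency, a shorter document scores higher, and a term that appears in fewer documents carries more weight.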

Out of the box, BM25 handles full-text search well. Things get interesting when you need to blend text relevance with other signals: recency, popularity, business rules. That's where function_score queries come in. They let you stack custom scoring logic on top of BM25. I've built function_score queries combining BM25 with a Gaussian decay on publish date and a field_value_factor on view count. The relevance improvement was immediate and measurable.
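
A sketch of such a query: a function_score wrapping a match query, with a Gaussian decay on publish date and a field_value_factor on view count. The field names (`published_at`, `view_count`) are illustrative:

```python
# function_score sketch: BM25 base score x recency decay x popularity boost.
query = {
    "query": {
        "function_score": {
            "query": {"match": {"title": "opensearch architecture"}},
            "functions": [
                # Recency: full weight now, decayed to 0.5 at 30 days old
                {"gauss": {"published_at": {"origin": "now", "scale": "30d", "decay": 0.5}}},
                # Popularity: log1p dampens runaway view counts
                {"field_value_factor": {"field": "view_count", "modifier": "log1p", "factor": 1.2}},
            ],
            "score_mode": "multiply",   # how the functions combine with each other
            "boost_mode": "multiply",   # how the result combines with the BM25 score
        }
    }
}
```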

Aggregations

Aggregations are where OpenSearch stops being a search engine and becomes an analytics platform. Metrics, histograms, top terms, statistical analysis, all computed at query time across distributed shards.

Three categories worth knowing:

  • Metric aggregations compute values like avg, sum, min, max, cardinality (approximate distinct count using HyperLogLog).
  • Bucket aggregations group documents into buckets: by term, date range, histogram interval, geographic region, or custom filters.
  • Pipeline aggregations operate on the output of other aggregations, enabling calculations like moving averages, derivatives, and cumulative sums.

Aggregations run on doc values, a columnar data structure stored on disk. Fields you aggregate on need doc values enabled (they are by default for most types). The text field type doesn't support doc values. If you need to aggregate on a text field, add a keyword sub-field. I've debugged more than a few "aggregation not supported" errors that traced back to exactly this.
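
The fix looks like this: a `text` field with a `keyword` sub-field, and the aggregation targeting the sub-field. Field names here are illustrative:

```python
# Mapping: analyzed text for search, keyword sub-field (doc-values-backed) for aggregation.
mapping = {
    "mappings": {
        "properties": {
            "service": {
                "type": "text",
                "fields": {"raw": {"type": "keyword"}},
            }
        }
    }
}

# Terms aggregation runs against the keyword sub-field, not the text field.
agg_request = {
    "size": 0,  # aggregation only, no hits
    "aggs": {"by_service": {"terms": {"field": "service.raw", "size": 10}}},
}
```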

Query DSL Patterns

Here are the Query DSL patterns I reach for constantly in production.

Bool queries do the heavy lifting for complex search. The must, should, must_not, and filter clauses let you compose precise queries. Clauses in filter context don't contribute to relevance scoring and get cached, so always shove non-scoring criteria (date ranges, status filters, access control) into filter. I've seen teams put everything in must and wonder why their queries are slow. Moved three clauses from must to filter on a high-traffic endpoint once. Cut p99 latency by 35%.
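
A sketch of that split, with scoring criteria in `must` and non-scoring criteria in `filter` (field names illustrative):

```python
# Bool query sketch: only the match clause contributes to scoring;
# the filter clauses are cacheable yes/no checks.
search_body = {
    "query": {
        "bool": {
            "must": [
                {"match": {"description": "wireless headphones"}},  # scored (BM25)
            ],
            "filter": [
                {"term": {"status": "active"}},                     # not scored, cached
                {"range": {"created_at": {"gte": "now-30d/d"}}},    # not scored, cached
            ],
        }
    }
}
```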

Multi-match queries search across multiple fields with configurable strategies: best_fields (default; takes the highest-scoring field), most_fields (combines scores from all matching fields), and cross_fields (treats the field set as one big field; great for name searches across first-name and last-name fields).

Search templates don't get enough love. Any team with a search-heavy application should use them. They parameterize queries and separate query logic from application code, which means you can tune search relevance without redeploying. That separation alone is worth the small upfront investment.
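
A sketch of a stored search template using the mustache language. The template id and parameter names are illustrative:

```python
# Search template sketch: stored via PUT _scripts/product-search,
# then invoked with GET products/_search/template and a params object.
template = {
    "script": {
        "lang": "mustache",
        "source": {
            "query": {
                "bool": {
                    "must": [{"match": {"title": "{{query_text}}"}}],
                    "filter": [{"term": {"status": "{{status}}"}}],
                }
            },
            "size": "{{size}}",
        },
    }
}

# What the application sends at query time: parameters only, no Query DSL.
invocation = {"id": "product-search",
              "params": {"query_text": "noise cancelling", "status": "active", "size": 20}}
```

Relevance tuning then happens by re-storing the template; the application code never changes.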

Scaling Patterns

Horizontal vs. Vertical Scaling

OpenSearch scales in both directions: horizontally (more nodes) and vertically (bigger instance types).

| Scaling Axis | When to Use | Considerations |
|---|---|---|
| Vertical (larger instances) | CPU or memory bottleneck on existing nodes | Simpler, no shard redistribution needed |
| Horizontal (more nodes) | Storage capacity, shard count, or throughput limits | Requires shard rebalancing, more network overhead |
| Add replicas | Read throughput needs to increase | Doubles storage cost per replica added |
| Re-shard (reindex) | Shards are too large or too numerous | Requires reindexing, plan for downtime or use aliases |

I always start vertical. It's operationally simpler and doesn't trigger shard redistribution. Upgrade the instance type, let the blue/green deployment run, done. Once I hit the ceiling of a single instance type (or when one fat instance costs more than several smaller ones), I go horizontal. That crossover point is usually around r6g.4xlarge or r7g.4xlarge.

Multi-AZ with Standby

Multi-AZ with Standby gives you a 99.99% SLA. Active nodes run across two Availability Zones; a full set of standby nodes sits in a third AZ. When an AZ fails, the standby nodes pick up within seconds. No waiting around for traditional shard reallocation.

Requirements:

  • Three Availability Zones
  • Data node count in multiples of three (minimum 2 per AZ, so 6 total)
  • Dedicated master nodes (3, one per AZ)
  • Two replicas for every primary shard

You'll pay roughly 50% more than a two-AZ deployment for that standby fleet. Worth it? Depends. Going from 99.9% to 99.99% availability and from minutes to seconds on recovery makes sense for any workload where search downtime directly hits revenue. I deployed this for an e-commerce search platform after a 12-minute outage during a single-AZ failure cost the business six figures. The 50% infrastructure bump was nothing by comparison.

Cross-Cluster Replication

Cross-cluster replication (CCR) creates a follower index on a remote cluster that stays in sync with a leader index automatically. A few places where it shines:

  • Geographic locality: Replicate data to a cluster in the user's region. Lower latency.
  • Workload isolation: Offload analytics queries to a follower cluster so they don't compete with production search.
  • Disaster recovery: Keep a standby cluster in a separate region.

CCR replicates at the shard operation level. Under normal conditions, the follower stays within seconds of the leader. Follower indices are read-only; all writes go to the leader.

Networking and Security

VPC Deployment

Every production workload goes in a VPC. Full stop. VPC deployment puts the OpenSearch endpoints inside your private subnets, reachable only through your VPC's network:

  • No public internet exposure
  • Network-level access control via security groups
  • VPC Flow Logs for network audit trails
  • AWS PrivateLink for cross-account access

The trade-off? VPC endpoints aren't publicly accessible, so OpenSearch Dashboards needs a VPN, bastion host, or AWS Verified Access. I put an Application Load Balancer with Cognito auth in front of the Dashboards endpoint. Gives every team member SSO access, nothing exposed to the public internet.

Fine-Grained Access Control (FGAC)

FGAC is powerful and widely misunderstood. It provides authorization at three levels:

| Level | What It Controls | Example |
|---|---|---|
| Index-level | Which indices a role can read/write | Allow analytics-team to read logs-* but not security-* |
| Document-level | Which documents within an index a role can see | Only show documents where department matches the user's department |
| Field-level | Which fields within a document a role can see | Hide ssn and salary fields from the general-reader role |

FGAC can use an internal user database, or it can map to IAM principals and SAML identities. For AWS-native architectures, I map IAM roles to OpenSearch backend roles. Same IAM policies governing your other AWS services, now governing OpenSearch too. For enterprise deployments with existing identity providers, SAML federation through Cognito gives you single sign-on. I've set up both. IAM mapping is cleaner for greenfield projects. SAML federation is unavoidable when the enterprise already has an IdP that everyone expects to use.
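
A sketch of a security-plugin role that combines all three levels: an index pattern, document-level security (`dls`, a stringified query), and field-level security (`fls`, where a `~` prefix excludes a field). The user-attribute substitution syntax in the `dls` value depends on how your identities are mapped, so treat it as illustrative:

```python
# Role body sketch for PUT _plugins/_security/api/roles/hr-reader.
role = {
    "index_permissions": [
        {
            "index_patterns": ["hr-records-*"],
            "allowed_actions": ["read"],
            # Document-level: only documents matching the caller's department
            # (attribute substitution syntax is illustrative)
            "dls": '{"term": {"department": "${attr.internal.department}"}}',
            # Field-level: '~' prefix hides these fields from this role
            "fls": ["~ssn", "~salary"],
        }
    ]
}
```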

Encryption

Three layers of encryption:

  • At rest: AES-256 via AWS KMS keys (service-managed or customer-managed). Covers all data files, indices, logs, swap files, and automated snapshots.
  • In transit: TLS 1.2 for all node-to-node and client-to-node communication. Enforced by default on new domains.
  • Field-level: Application-level encryption of sensitive fields before indexing, using your own key management. Protects data even from cluster administrators.

OpenSearch Serverless

OpenSearch Serverless is a completely different animal. No clusters, no nodes, no capacity planning. You create collections; AWS handles all infrastructure.

Collection Types

| Type | Optimized For | Use Case |
|---|---|---|
| Search | Full-text search with low latency | Application search, e-commerce, content discovery |
| Time series | Append-only log and event data | Log analytics, metrics, observability |
| Vector search | k-NN similarity search | Semantic search, RAG, recommendation engines |

OpenSearch Compute Units (OCUs)

Pricing runs on OCUs: abstracted compute units bundling vCPU, memory, and storage I/O. Indexing and search get separate OCU types that scale independently based on demand.

| Configuration | Minimum | Scales To |
|---|---|---|
| Indexing OCUs | 2 OCUs (can be set to 0 during inactivity) | Hundreds, based on throughput |
| Search OCUs | 2 OCUs (always on for active collections) | Hundreds, based on query load |
| OCU cost | ~$0.24/OCU-hour | Per-second billing |

Collection Groups

Collection groups let you share OCU capacity across multiple collections. Without them, each collection carries its own minimum OCU overhead. With them, you can pack dozens of small collections into a shared capacity pool. I use collection groups heavily in dev environments where each microservice has its own collection but none generate enough traffic to justify dedicated OCUs.

When to Use Serverless vs. Provisioned

| Factor | Use Serverless | Use Provisioned |
|---|---|---|
| Traffic pattern | Spiky, unpredictable | Steady, predictable |
| Operational overhead tolerance | Low: want fully managed | Willing to manage capacity |
| Fine-grained tuning needed | No: defaults are acceptable | Yes: need custom shard counts, instance types, plugins |
| Cost at scale | More expensive above ~8 OCUs sustained | More cost-effective at steady-state |
| Feature coverage | Subset of OpenSearch features | Full feature set |

My default: start on serverless for every new project. I only move to provisioned when I hit a specific wall (cross-cluster replication, custom plugins, SAML, ISM policies, fine-grained shard control). If serverless costs outgrow the equivalent provisioned cost, migrate. Compatible data model, so the migration path is straightforward. I've done this transition twice. Both times took less than a day.

Ingestion Pipelines

Getting data into OpenSearch efficiently matters just as much as the cluster architecture. AWS provides several managed ingestion paths, and picking the right one depends on your workload.

Amazon OpenSearch Ingestion (OSI)

OSI is a fully managed, serverless ingestion pipeline built on Data Prepper (the open-source data collector from the OpenSearch project). It handles:

  • Log ingestion: Parse, transform, enrich, and route logs from S3, Kafka, HTTP endpoints.
  • Trace analytics: Ingest OpenTelemetry traces for distributed tracing.
  • Metric analytics: Collect and aggregate metrics data.

Pipelines are defined in YAML and scale automatically. Zero operational overhead compared to self-managed Data Prepper: no EC2 instances, no auto-scaling groups. I ran self-managed Data Prepper for about six months before switching. The instances needed patching, monitoring, scaling. OSI killed all of that overhead.

Amazon Kinesis Data Firehose

Firehose is the simplest way to stream data into OpenSearch. Buffering, batching, compression, retry logic: all handled. For log analytics workloads arriving from CloudWatch Logs, Kinesis Data Streams, or direct PUT via the Firehose API, it's often the right call.

You can transform records via Lambda functions before they reach OpenSearch, and Firehose supports backup to S3. Always enable S3 backup. Always. That backup gives you replay capability when you inevitably need to reindex after a mapping change. I've used it three times in the last two years. Saved days of work each time.

Fluent Bit

For container workloads on ECS or EKS, Fluent Bit is the standard log collector. AWS provides an optimized image with the OpenSearch output plugin baked in. Runs as a sidecar or DaemonSet, grabs stdout/stderr from application containers, ships logs to OpenSearch. Straightforward.

On high-volume EKS deployments, though, I route Fluent Bit output through Kinesis Data Firehose instead of sending it directly to OpenSearch. That buffer layer absorbs traffic spikes and keeps the cluster from drowning during log surges from deployment rollouts or error storms. I learned this one the hard way during a deployment that triggered a cascade of error logs. Without Firehose in front, the spike would've saturated the bulk indexing queue and caused data loss.

Lambda Patterns

Lambda works well for event-driven ingestion. S3 object creation triggers indexing, DynamoDB Streams replicate changes to OpenSearch for search, custom apps transform data before indexing.

The thing to get right with Lambda-to-OpenSearch: connection management. Use the OpenSearch client with connection pooling and reuse connections across invocations. Creating a new HTTPS connection per invocation adds real latency and exhausts the cluster's connection limits under high concurrency. I put connection reuse in the global scope of every Lambda function that talks to OpenSearch. The numbers speak for themselves: 200ms for a cold connection, 15ms for a reused one.
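
The pattern in miniature. A stand-in class replaces the real opensearch-py client so this sketch runs anywhere, but the structure is the point: the client lives at module (global) scope and is built once per Lambda execution environment, while the handler runs per invocation and reuses it:

```python
# Connection-reuse sketch. ClientStandIn stands in for opensearch-py's
# OpenSearch client; in real code the module-scope line would be something like
#   client = OpenSearch(hosts=[...], http_auth=..., use_ssl=True)
class ClientStandIn:
    instances = 0

    def __init__(self):
        # A real client would establish its HTTPS connection pool here (~200 ms)
        ClientStandIn.instances += 1

    def index(self, index, body):
        return {"result": "created"}

# Module scope: executed once per Lambda execution environment (cold start only)
client = ClientStandIn()

def handler(event, context):
    # Handler scope: executed every invocation; reuses the warm client (~15 ms)
    return client.index(index="events", body=event)
```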

Vector Search and AI

OpenSearch isn't just a text search engine anymore. It's become a legitimate vector database powering modern AI applications. This is the fastest-evolving area of the platform, and capabilities have jumped substantially with every recent release.

OpenSearch supports k-nearest neighbor (k-NN) search with multiple algorithms and engines:

| Engine | Algorithm | Best For |
|---|---|---|
| nmslib | HNSW (Hierarchical Navigable Small World) | High recall, moderate scale |
| Faiss | HNSW, IVF | Large-scale vector search, GPU acceleration |
| Lucene | HNSW | Combining vector and text search in hybrid queries |

Faiss with HNSW is my default recommendation for most workloads. Good recall-latency trade-offs, and it supports quantization (reducing vector precision to save memory) without significant recall loss.
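
A sketch of a k-NN index mapping for Faiss HNSW. The dimension matches the 1536-dim embedding example below; the HNSW parameters (`m`, `ef_construction`) are reasonable starting points rather than tuned values:

```python
# k-NN index sketch: knn_vector field backed by a Faiss HNSW graph.
knn_index = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 1536,
                "method": {
                    "name": "hnsw",
                    "engine": "faiss",
                    "space_type": "l2",
                    # Starting points, not tuned values: m = graph connectivity,
                    # ef_construction = build-time search breadth
                    "parameters": {"m": 16, "ef_construction": 128},
                },
            }
        }
    },
}
```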

Vector Dimensions and Memory

Vector search eats memory. Each vector consumes 4 bytes x dimensions for float32, or 1 byte x dimensions for byte quantization. For a typical embedding model producing 1536-dimensional vectors (like text-embedding-ada-002):

  • Float32: 6 KB per vector, 6 GB per million vectors (vectors only, excluding metadata)
  • Byte quantized: 1.5 KB per vector, 1.5 GB per million vectors

At scale, memory becomes the primary constraint. Full stop. Use scalar quantization for large collections where a small recall trade-off is acceptable. Make sure your data nodes have enough JVM heap and native memory for the HNSW graphs. I've sized clusters for 50 million vectors, and the memory math dominated every other capacity planning consideration.
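
The memory arithmetic above as a helper, for capacity planning:

```python
def vector_memory_gb(n_vectors: int, dims: int, bytes_per_component: int = 4) -> float:
    """Raw vector storage in GB (float32 = 4 bytes/component, byte-quantized = 1).

    Covers the vectors only; HNSW graph structures add further overhead on top.
    """
    return n_vectors * dims * bytes_per_component / 1e9
```

One million 1536-dim float32 vectors come to roughly 6.1 GB; byte-quantized, roughly 1.5 GB.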

Semantic search uses embedding models to find documents that are conceptually similar to a query, even when they share zero keywords. OpenSearch supports this through neural search, which hooks into ML models on SageMaker or Amazon Bedrock.

The pattern:

  1. At index time, an ingest pipeline calls the embedding model to convert text fields into vector representations. These get stored alongside the original text.
  2. At search time, the query text also gets converted to a vector, and k-NN search finds the nearest document vectors.
  3. Hybrid search blends BM25 text scoring with k-NN vector scoring using reciprocal rank fusion or linear combination. You get both keyword precision and semantic recall.
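
Step 3 can be sketched with OpenSearch 2.x's `hybrid` query plus a search pipeline that normalizes and combines the two score distributions. The field name, weights, and model id here are illustrative:

```python
# Search pipeline sketch: min-max normalize BM25 and k-NN scores,
# then combine with a weighted arithmetic mean (0.3 lexical / 0.7 semantic).
search_pipeline = {
    "phase_results_processors": [
        {
            "normalization-processor": {
                "normalization": {"technique": "min_max"},
                "combination": {
                    "technique": "arithmetic_mean",
                    "parameters": {"weights": [0.3, 0.7]},
                },
            }
        }
    ]
}

# Hybrid query sketch: one lexical sub-query, one neural (k-NN) sub-query.
hybrid_query = {
    "query": {
        "hybrid": {
            "queries": [
                {"match": {"content": "reset password"}},  # BM25 keyword precision
                {
                    "neural": {
                        "content_embedding": {
                            "query_text": "reset password",
                            "model_id": "my-deployed-model-id",  # illustrative placeholder
                            "k": 50,
                        }
                    }
                },
            ]
        }
    }
}
```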

RAG with Amazon Bedrock

Retrieval-Augmented Generation (RAG) is how you ground large language models in domain-specific knowledge. OpenSearch serves as the retrieval layer; Bedrock handles generation.

Here's the flow:

  1. Documents get chunked, embedded, and indexed in OpenSearch with vector and text fields.
  2. A user query is embedded and used for hybrid search (k-NN + BM25) against OpenSearch.
  3. Top-K results are passed as context to a Bedrock foundation model (Claude, for example).
  4. The model generates a response grounded in the retrieved documents.

Bedrock Knowledge Bases automates this entire pattern: document chunking, embedding, indexing into OpenSearch Serverless, retrieval. For custom implementations where you need more control over chunking strategies, retrieval logic, or re-ranking, build the pipeline yourself with OpenSearch and Bedrock APIs. I've done both. Bedrock Knowledge Bases gets you to production in a day. The custom pipeline takes a week but lets you do things like chunk by semantic boundary instead of token count.

A newer pattern I'm seeing in production: agentic search. An AI agent dynamically constructs OpenSearch queries based on user intent. Instead of mapping user input to a fixed query template, the agent:

  1. Interprets the user's natural language question.
  2. Figures out which indices, fields, and query types are relevant.
  3. Constructs and executes one or more OpenSearch queries.
  4. Evaluates results and optionally refines the query.
  5. Synthesizes everything into a coherent answer.

This works well for data exploration where users don't know field names or query syntax. The agent hides Query DSL complexity behind a conversational interface. I've prototyped this with Claude as the agent and OpenSearch as the data store. Results are promising, especially when the agent has access to index mappings so it knows what fields exist and their types.

Operational Patterns

Blue/Green Deployments

AWS OpenSearch Service handles configuration changes (instance type changes, version upgrades, scaling) through blue/green deployments. New nodes (green) get provisioned, data replicates from existing nodes (blue), traffic redirects, blue nodes get terminated.

Mostly transparent, but plan for these:

  • Duration: Minutes to hours depending on data volume. A 10 TB cluster takes way longer than a 500 GB cluster.
  • During the deployment: The cluster temporarily runs double the nodes, meaning double the cost. Make sure your account has sufficient service limits.
  • Avoid chaining changes: Multiple configuration changes in quick succession queue up blue/green deployments and extend total time. Batch your changes. A colleague once made four separate changes in 30 minutes. The cascading blue/green deployments took 14 hours to finish.

CloudWatch Alarms to Set

At minimum, set alarms on these metrics for every production cluster:

| Metric | Alarm Threshold | Why |
|---|---|---|
| ClusterStatus.red | >= 1 for 1 minute | At least one primary shard is unallocated: data loss risk |
| ClusterStatus.yellow | >= 1 for 5 minutes | At least one replica is unallocated: reduced redundancy |
| FreeStorageSpace | < 20% of total | Write blocks trigger at ~5%, but performance degrades earlier |
| JVMMemoryPressure | > 80% for 5 minutes | High GC pressure leads to latency spikes and instability |
| CPUUtilization | > 80% sustained | Search latency will increase; scale vertically or horizontally |
| MasterCPUUtilization | > 50% sustained | Master instability affects the entire cluster |
| ThreadpoolWriteQueue | > 100 | Write requests are queuing, indicating backpressure |
| ThreadpoolSearchQueue | > 500 | Search requests are queuing, indicating overload |

Common Failure Modes

Red cluster status: One or more primary shards are unallocated. A node failed; the cluster can't find somewhere to host the primary shard. If you've got replicas, OpenSearch promotes one to primary. If the cluster is also low on resources, that promotion fails. Scale up, check storage space, dig into root cause. I've been paged for red cluster status more times than I care to admit. The cause is almost always one of two things: disk full, or a node that ran out of JVM heap during a large merge.

Write blocks: OpenSearch enforces a write block when free storage drops below a threshold (configured by cluster.routing.allocation.disk.watermark.flood_stage, defaulting to 95% full). Once the write block kicks in, no indexing happens until you free storage by deleting indices, adding nodes, or bumping volume size. Set aggressive storage alarms. Use ISM policies to age off data before you hit this wall. Recovering from a write block at 3 AM is miserable.

JVM out of memory: The JVM heap is shared across indexing buffers, query caches, aggregation state, and field data. Oversized aggregations (high-cardinality terms aggregations, for example) can eat the entire heap. Set search.max_buckets to something reasonable, use collect_mode: breadth_first for nested aggregations, and watch JVMMemoryPressure closely. I set max_buckets to 10,000 on every cluster I build. The default of 65,535 is asking for trouble.
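
Both mitigations as request bodies (the aggregation field names are illustrative):

```python
# PUT _cluster/settings: cap the total buckets a single request may create.
cluster_settings = {"persistent": {"search.max_buckets": 10000}}

# Nested aggregation using breadth_first collection: prune to the top
# terms before descending, instead of materializing every branch.
agg_request = {
    "size": 0,
    "aggs": {
        "by_user": {
            "terms": {"field": "user_id", "size": 100, "collect_mode": "breadth_first"},
            "aggs": {
                "by_day": {
                    "date_histogram": {"field": "@timestamp", "calendar_interval": "day"}
                }
            },
        }
    },
}
```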

Split-brain: Rare with dedicated master nodes. It happens when network partitions prevent master nodes from communicating, and the cluster elects two leaders, creating divergent state. Three dedicated master nodes across three AZs prevents this.

Cost Optimization

Instance Family Selection

| Instance Family | Optimized For | Use Case |
|---|---|---|
| R6g/R7g (Graviton) | Memory | General-purpose data nodes, large working sets |
| M6g/M7g (Graviton) | Balanced | Master nodes, coordinator nodes, moderate workloads |
| C6g/C7g (Graviton) | Compute | Query-heavy workloads with small working sets |
| I3 | Storage I/O | High-throughput indexing with instance storage |
| OR1 | Writable warm | Direct-to-warm indexing for cost-optimized log analytics |

The Graviton Advantage

Graviton-based instances (the g suffix) deliver roughly 30% better price-performance versus equivalent x86 instances for OpenSearch workloads. Unless you've got a specific x86 dependency (extremely rare for managed OpenSearch), always go Graviton.

R7g is my default for data nodes. DDR5 memory, improved per-core performance over R6g, marginal cost increase. I've run benchmarks comparing R6g.2xlarge to R7g.2xlarge on identical workloads. R7g consistently delivered 15-20% better query throughput.

Reserved Instances

For steady-state production workloads, Reserved Instances (1-year or 3-year) cut costs by 30-50% versus On-Demand. The commitment applies to instance type and region (not a specific domain), so you keep flexibility to restructure clusters within the same instance family.

Figure out your baseline: the minimum instance count you'll maintain for the commitment period. Reserve that. Use On-Demand for burst capacity above it. I typically start with 1-year no-upfront for new workloads where I'm still learning the traffic pattern, then switch to 3-year partial-upfront once things stabilize.

Tiered Storage Economics

The economics are straightforward:

| Tier | Approximate Cost per GB-Month | Relative Cost |
| --- | --- | --- |
| Hot (gp3) | $0.10-$0.15 | 100% |
| UltraWarm | $0.024 | ~20% |
| Cold | $0.006 | ~5% |
| S3 (snapshot archive) | $0.004 | ~3% |

For a workload retaining 90 days of log data at 100 GB/day:

  • All hot: 9,000 GB × $0.12 = ~$1,080/month for storage alone
  • 7 days hot, 83 days warm: (700 GB × $0.12) + (8,300 GB × $0.024) = ~$284/month
  • 7 days hot, 23 days warm, 60 days cold: (700 GB × $0.12) + (2,300 GB × $0.024) + (6,000 GB × $0.006) = ~$175/month

84% reduction in storage costs. Same data retention policy. I've presented these numbers to finance teams and watched the reaction. Nobody argues against ISM policies after seeing this math.

Right-Sizing

Over-provisioned OpenSearch clusters are everywhere. Teams provision for peak load and never look back. A right-sizing review should check:

  • JVM heap utilization: Consistently below 50%? Instance type is oversized.
  • CPU utilization: Consistently below 30%? Smaller instance type or fewer nodes.
  • Storage utilization: Free storage above 50% after accounting for blue/green deployment overhead (which temporarily doubles storage needs)? Over-provisioned.
  • Shard count per node: Well below the recommended limit? More nodes than you need.

Do this quarterly for production clusters. AWS Cost Explorer and OpenSearch-specific CloudWatch metrics give you everything. I keep a spreadsheet per cluster tracking these metrics month over month. Trends tell you more than any single data point.
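The checklist is mechanical enough to script against your exported CloudWatch numbers. A sketch of the threshold checks, assuming hypothetical metric names (feed it averages pulled from GetMetricData or your spreadsheet):

```python
def right_sizing_findings(metrics: dict) -> list:
    """Flag over-provisioning signals per the review checklist.
    Thresholds mirror the checklist above; metric keys are illustrative."""
    findings = []
    if metrics["jvm_heap_pct"] < 50:
        findings.append("heap under 50%: instance type may be oversized")
    if metrics["cpu_pct"] < 30:
        findings.append("CPU under 30%: consider smaller or fewer nodes")
    if metrics["free_storage_pct"] > 50:
        findings.append("free storage over 50%: storage over-provisioned")
    return findings

# Example quarterly-review snapshot (averages over the review window)
sample = {"jvm_heap_pct": 42, "cpu_pct": 22, "free_storage_pct": 61}
for finding in right_sizing_findings(sample):
    print(finding)
```

Use sustained averages, not point-in-time samples: a cluster idling at 20% CPU overnight may still peak hard during business hours.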

OpenSearch vs. Elasticsearch: The Fork Divergence

Several years post-fork, OpenSearch and Elasticsearch have diverged substantially. Here's where things stand if you're evaluating both:

Features OpenSearch Has That Elasticsearch Charges For

| Capability | OpenSearch | Elasticsearch |
| --- | --- | --- |
| Security (RBAC, FGAC, SSO) | Included, open source | Requires paid subscription (Platinum/Enterprise) |
| Alerting | Included | Requires paid subscription |
| Anomaly detection | Included | Requires paid subscription (Machine Learning) |
| Index management (ISM) | Included | Index Lifecycle Management included, but advanced features require paid tier |
| SQL query support | Included (SQL plugin) | Included (basic), advanced requires paid tier |
| Cross-cluster replication | Included | Requires paid subscription (Platinum) |

Features Unique to Each

| OpenSearch Exclusive | Elasticsearch Exclusive |
| --- | --- |
| Segment replication | ESQL (new query language) |
| Searchable snapshots (managed service) | Elastic Agent/Fleet |
| Neural search and ML Commons | Elastic Security (SIEM) as integrated product |
| OpenSearch Serverless | Elastic Cloud Serverless |
| Data Prepper / OSI | Elastic APM (more mature) |
| OR1 writable warm instances | Frozen tier with shared cache |

Migration Considerations

If you're considering a migration in either direction, here are the constraints:

  • API compatibility: OpenSearch maintains compatibility with the Elasticsearch 7.x API surface. Most client libraries work with both, though you should use the dedicated OpenSearch client libraries for new development.
  • Index compatibility: Indices created in Elasticsearch 7.x work directly in OpenSearch. Elasticsearch 8.x indices don't.
  • Plugin ecosystem: Plugins aren't cross-compatible. Any custom Elasticsearch plugins need OpenSearch equivalents.

Conclusion

OpenSearch Service has grown into a versatile platform spanning full-text search, log analytics, vector similarity search, and AI-powered retrieval. The architectural decisions that matter most (node sizing, shard strategy, storage tiering, security configuration) are exactly the ones that are easiest to get wrong. You need to understand how the system behaves under load, not just what the API surface exposes.

The patterns I rely on in production:

  • Dedicated master nodes and Graviton instances from the start. Minimal cost overhead, immediate operational benefit.
  • ISM policies from day one. Retrofitting tiered storage onto a cluster that's been running everything in hot tier is painful. I've done it. Don't repeat my mistake.
  • Right-size shards aggressively. Over-sharding causes more production incidents than almost any other configuration mistake.
  • Multi-AZ with Standby for revenue-impacting workloads. The 50% cost increase pays for itself the first time you survive an AZ failure without customer impact.
  • Serverless for new projects. The operational simplicity is worth the trade-offs, and the data model is compatible with provisioned if you outgrow it.

OpenSearch keeps evolving fast. Vector search capabilities have improved substantially over the past year, and each release closes gaps that used to push teams toward dedicated vector databases. If you're building search and analytics infrastructure on AWS, this platform is worth understanding deeply.

Additional Resources

Let's Build Something!

I help teams ship cloud infrastructure that actually works at scale. Whether you're modernizing a legacy platform, designing a multi-region architecture from scratch, or figuring out how AI fits into your engineering workflow, I've seen your problem before. Let me help.

Currently taking on select consulting engagements through Vantalect.