Elasticsearch version 8.16.0

Also see Breaking changes in 8.16.

Breaking changes

Analysis
  • Set lenient to true by default when using updateable synonyms #110901
Data streams
  • Update data stream lifecycle telemetry to track global retention #112451
ES|QL
  • Entirely remove META FUNCTIONS #113967
Mapping
  • JDK locale database change #113975
Search
  • Adding breaking change entry for retrievers #115399

Bug fixes

Aggregations
  • Always check the parent breaker with zero bytes in PreallocatedCircuitBreakerService #115181
  • Force using the last centroid during merging #111644 (issue: #111065)
Authentication
  • Check for disabling own user in Put User API #112262 (issue: #90205)
  • Expose cluster-state role mappings in APIs #114951
Authorization
  • Fix DLS & FLS sometimes being enforced when it is disabled #111915 (issue: #94709)
  • Fix DLS using runtime fields and synthetic source #112341
CRUD
  • Don’t fail retention lease sync actions due to capacity constraints #109414 (issue: #105926)
Cluster Coordination
  • Ensure clean thread context in MasterService #114512
Data streams
  • Adding support for data streams with a match-all template #111311 (issue: #111204)
  • Exclude internal data streams from global retention #112100
  • Fix verbose get data stream API not requiring extra privileges #112973
  • OTel mappings: avoid rejecting metrics when attributes are malformed #114856
  • Resolve pipelines from template on lazy rollover write #116031 (issue: #112781)
  • [apm-data] Apply lazy rollover on index template creation #116219 (issue: #116230)
  • [otel-data] Add more kubernetes aliases #115429
  • logs-apm.error-*: define log.level field as keyword #112440
Distributed
  • Handle InternalSendException inline for non-forking handlers #114375
EQL
  • Fix validation of TEXT fields with case insensitive comparison #111238 (issue: #111235)
ES|QL
Geo
  • Fix cases of collections with one point #111193 (issue: #110982)
  • Try to simplify geometries that fail with TopologyException #115834
Health
  • Set replica_unassigned_buffer_time in constructor #112612
ILM+SLM
  • Make SnapshotLifecycleStats immutable so SnapshotLifecycleMetadata.EMPTY isn’t changed as side-effect #111215
Indices APIs
  • Align dot prefix validation with Serverless #116266
  • Revert "Add ResolvedExpression wrapper" #115317
Infra/Core
  • Fix max file size check to use getMaxFileSize #113723 (issue: #113705)
  • Guard blob store local directory creation with doPrivileged #115459
  • Handle BigInteger in xcontent copy #111937 (issue: #111812)
  • Report JVM stats for all memory pools #115117 (issue: #97046)
  • ByteArrayStreamInput: Return -1 when there are no more bytes to read #112214
Infra/Logging
  • Only emit product origin in deprecation log if present #111683 (issue: #81757)
Infra/Settings
  • GET _cluster/settings with include_defaults returns the expected fallback value if defined in elasticsearch.yml #110816 (issue: #110815)
Ingest Node
  • Fix IPinfo geolocation schema #115147
  • Fix getDatabaseType for unusual MMDBs #112888
License
  • Fix Start Trial API output acknowledgement header for features #111740 (issue: #111739)
  • Fix TokenService always appearing used in Feature Usage #112263 (issue: #61956)
  • Fix lingering license warning header in IP filter #115510 (issue: #114865)
Logs
  • Do not expand dots when storing objects in ignored source #113910
  • Fix ignore_above handling in synthetic source when index level setting is used #113570 (issue: #113538)
  • Fix synthetic source for flattened field when used with ignore_above #113499 (issue: #112044)
  • Prohibit changes to index mode, source, and sort settings during restore #115811
Machine Learning
  • Avoid ModelAssignment deadlock #109684
  • Avoid catch (Throwable t) in AmazonBedrockStreamingChatProcessor #115715
  • Allow for pytorch_inference results to include zero-dimensional tensors
  • Empty percentile results no longer throw no_such_element_exception in Anomaly Detection jobs #116015 (issue: #116013)
  • Fix NPE in Get Deployment Stats #115404
  • Fix bug in ML serverless autoscaling which prevented trained model updates from triggering a scale up #110734
  • Fix stream support for TaskType.ANY #115656
  • Fix parameter initialization for large forecasting models #2759
  • Forward bedrock connection errors to user #115868
  • Ignore unrecognized openai sse fields #114715
  • Prevent NPE if model assignment is removed while waiting to start #115430
  • Send mid-stream errors to users #114549
  • Temporarily return both modelId and inferenceId for GET /_inference until we migrate clients to only inferenceId #111490
  • Warn for model load failures if they have a status code <500 #113280
  • [Inference API] Remove unused Cohere rerank service settings fields in a BWC way #110427
  • [ML] Create Inference API will no longer return model_id and now only return inference_id #112508
Mapping
  • Fix MapperBuilderContext#isDataStream when used in dynamic mappers #110554
  • Fix synthetic source field names for multi-fields #112850
  • Retrieve the source for objects and arrays in a separate parsing phase #113027 (issue: #112374)
  • Two empty mappings are now created equally #107936 (issue: #107031)
Ranking
  • Fix MLTQuery handling of custom term frequencies #110846
  • Fix RRF validation for rank_constant < 1 #112058
  • Fix score count validation in reranker response #111212 (issue: #111202)
Search
  • Allow for queries on _tier to skip shards in the can_match phase #114990 (issue: #114910)
  • Allow out of range term queries for numeric types #112916
  • Do not exclude empty arrays or empty objects in source filtering #112250 (issue: #109668)
  • Fix synthetic source handling for bit type in dense_vector field #114407 (issue: #114402)
  • Improve DateTime error handling and add some bad date tests #112723 (issue: #112190)
  • Improve date expression/remote handling in index names #112405 (issue: #112243)
  • Make "too many clauses" throw IllegalArgumentException to avoid 500s #112678 (issue: #112177)
  • Make empty string searches be consistent with case (in)sensitivity #110833
  • Prevent flattening of ordered and unordered interval sources #114234
  • Remove needless forking to GENERIC in TransportMultiSearchAction #110796
  • Search/Mapping: KnnVectorQueryBuilder support for allowUnmappedFields #107047 (issue: #106846)
  • Span term query to convert to match no docs when unmapped field is targeted #113251
  • Speedup CanMatchPreFilterSearchPhase constructor #110860
  • Update BlobCacheBufferedIndexInput::readVLong to correctly handle negative long values #115594
  • [8.x] Limit the number of tasks that a single search can submit #115932
Security
  • Add ECK Role Mapping Cleanup #115823
  • Updated the transport CA name in Security Auto-Configuration. #106520 (issue: #106455)
Snapshot/Restore
TSDB
  • Implement parseBytesRef for TimeSeriesRoutingHashFieldType #113373 (issue: #112399)
Task Management
  • Improve handling of failure to create persistent task #114386
Transform
  • Allow task canceling of validate API calls #110951
  • Include reason when no nodes are found #112409 (issue: #112404)
Vector Search
  • Fix dim validation for bit element_type #114533
  • Support semantic_text in object fields #114601 (issue: #114401)
Watcher
  • Truncating watcher history if it is too large #111245 (issue: #94745)

Deprecations

Analysis
  • Deprecate dutch_kp and lovins stemmer as they are removed in Lucene 10 #113143
  • Deprecate edge_ngram side parameter #110829
CRUD
  • Deprecate dot-prefixed indices and composable template index patterns #112571
Search
  • Adding deprecation warnings for rrf using rank and sub_searches #114854
  • Deprecate legacy params from range query #113286

Enhancements

Aggregations
  • Account for DelayedBucket before reduction #113013
  • Add protection for OOM during aggregations partial reduction #110520
  • Deduplicate BucketOrder when deserializing #112707
  • Lower the memory footprint when creating DelayedBucket #112519
  • Reduce heap usage for AggregatorsReducer #112874
  • Remove reduce and reduceContext from DelayedBucket #112547
Allocation
  • Add link to flood-stage watermark exception message #111315
  • Always allow rebalancing by default #111015
Application
  • [Profiling] add container.id field to event index template #111969
Authorization
  • Add manage roles privilege #110633
  • Add privileges required for CDR misconfiguration features to work on AWS SecurityHub integration #112574
Codec
  • Remove zstd feature flag for index codec best compression #112665
  • [8.x] Remove zstd feature flag for index codec best compression #112857
Data streams
  • Add verbose flag retrieving maximum_timestamp for get data stream API #112303
  • Display effective retention in the relevant data stream APIs #112019
  • Expose global retention settings via data stream lifecycle API #112210
  • Ignore warning on yaml test put template #116201 (issue: #116158)
  • Make ecs@mappings work with OTel attributes #111600
Distributed
  • Add link to Max Shards Per Node exception message #110993
ES|QL
  • Add EXP ES|QL function #110879
  • Delay construction of warnings #114368
  • Add CircuitBreaker to TDigest, Step 3: Connect with ESQL CB #113387
  • Add CircuitBreaker to TDigest, Step 4: Take into account shallow classes size #113613 (issue: #113916)
  • Collect and display execution metadata for ES|QL cross cluster searches #112595 (issue: #112402)
  • Add support for multivalue fields in Arrow output #114774
  • BUCKET: allow numerical spans as whole numbers #111874 (issues: #104646, #109340, #105375)
  • Have BUCKET generate friendlier intervals #111879 (issue: #110916)
  • Profile more timing information #111855
  • Push down filters even in case of renames in Evals #114411
  • Speed up CASE for some parameters #112295
  • Speed up grouping by bytes #114021
  • Use less memory in listener #114358
  • Add support for cached strings in plan serialization #112929
  • Add Telemetry API and track top functions #111226
  • Enhance SORT push-down to Lucene to cover references to fields and ST_DISTANCE function #112938 (issue: #109973)
  • Siem ea 9521 improve test #111552
  • Support multi-valued fields in compute engine for ST_DISTANCE #114836 (issue: #112910)
  • Add SPACE function #112350
  • Add finish() elapsed time to aggregation profiling times #113172 (issue: #112950)
  • Make query wrapped by SingleValueQuery cacheable #110116
  • Add hypot function #114382
  • Cast mixed numeric types to a common numeric type for Coalesce and In at Analyzer #111917 (issue: #111486)
  • Combine Disjunctive CIDRMatch #111501 (issue: #105143)
  • Create Range in PushFiltersToSource for qualified pushable filters on the same field #111437
  • Name parameter with leading underscore #111950 (issue: #111821)
  • Named parameter for field names and field name patterns #112905
  • Validate index name in parser #112081
  • Add reverse function #113297
  • Explicit cast a string literal to date_period and time_duration in arithmetic operations #109193
Experiences
  • Integrate IBM watsonx to Inference API for text embeddings #111770
Geo
  • Add support for spatial relationships in point field mapper #112126
  • Small performance improvement in h3 library #113385
  • Support docvalues only query in shape field #112199
Health
  • (API) Cluster Health report unassigned_primary_shards #112024
  • Do not treat replica as unassigned if primary recently created and unassigned time is below a threshold #112066
ILM+SLM
  • ILM: Add total_shards_per_node setting to searchable snapshot #112972 (issue: #112261)
  • PUT slm policy should only increase version if actually changed #111079
  • Preserve Step Info Across ILM Auto Retries #113187
  • Register SLM run before snapshotting to save stats #110216
  • SLM interval schedule followup - add back getFieldName style getters #112123
Infra/Core
  • Add nanos support to ZonedDateTime serialization #111689 (issue: #68292)
  • Extend logging for dropped warning headers #111624 (issue: #90527)
  • Give the kibana system user permission to read security entities #114363
Infra/Metrics
  • Add TaskManager to pluginServices #112687
Infra/REST API
  • Optimize the loop processing of URL decoding #110237 (issue: #110235)
Infra/Scripting
  • Expose HexFormat in Painless #112412
Infra/Settings
  • Improve exception message for bad environment variable placeholders in settings #114552 (issue: #110858)
  • Reprocess operator file settings when settings service starts, due to node restart or master node change #114295
Ingest Node
  • Add size_in_bytes to enrich cache stats #110578
  • Add support for templates when validating mappings in the simulate ingest API #111161
  • Adding index_template_substitutions to the simulate ingest API #114128
  • Adding component template substitutions to the simulate ingest API #113276
  • Adding mapping validation to the simulate ingest API #110606
  • Adds example plugin for custom ingest processor #112282 (issue: #111539)
  • Fix unnecessary mustache template evaluation #110986 (issue: #110191)
  • Listing all available databases in the _ingest/geoip/database API #113498
  • Make enrich cache based on memory usage #111412 (issue: #106081)
  • Tag redacted document in ingest metadata #113552
  • Verify Maxmind database types in the geoip processor #114527
Logs
  • Add validation for synthetic source mode in logs mode indices #110677
  • Store original source for keywords using a normalizer #112151
Machine Learning
  • Add Completion Inference API for Alibaba Cloud AI Search Model #112512
  • Add Streaming Inference spec #113812
  • Add chunking settings configuration to CohereService, AmazonBedrockService, and AzureOpenAiService #113897
  • Add chunking settings configuration to ElasticsearchService/ELSER #114429
  • Add custom rule parameters to force time shift #110974
  • Adding chunking settings to GoogleVertexAiService, AzureAiStudioService, and AlibabaCloudSearchService #113981
  • Adding chunking settings to MistralService, GoogleAiStudioService, and HuggingFaceService #113623
  • Adds a new Inference API for streaming responses back to the user. #113158
  • Allow users to force a detector to shift time series state by a specific amount #2695
  • Create StreamingHttpResultPublisher #112026
  • Create an ml node inference endpoint referencing an existing model #114750
  • Default inference endpoint for ELSER #113873
  • Default inference endpoint for the multilingual-e5-small model #114683
  • Dynamically get the number of allocations #114636
  • Enable OpenAI Streaming #113911
  • Filter empty task settings objects from the API response #114389
  • Migrate Inference to ChunkedToXContent #111655
  • Register Task while Streaming #112369
  • Server-Sent Events for Inference response #112565
  • Stream Anthropic Completion #114321
  • Stream Azure Completion #114464
  • Stream Bedrock Completion #114732
  • Stream Cohere Completion #114080
  • Stream Google Completion #114596
  • Stream OpenAI Completion #112677
  • Support sparse embedding models in the elasticsearch inference service #112270
  • Switch default chunking strategy to sentence #114453
  • Update the Pytorch library to version 2.3.1 #2688
  • Upgrade to AWS SDK v2 #114309 (issue: #110590)
  • Use the same chunking configurations for models in the Elasticsearch service #111336
  • Validate streaming HTTP Response #112481
  • Wait for allocation on scale up #114719
  • [Inference API] Add Alibaba Cloud AI Search Model support to Inference API #111181
  • [Inference API] Add Docs for AlibabaCloud AI Search Support for the Inference API #111181
  • [Inference API] Introduce Update API to change some aspects of existing inference endpoints #114457
  • [Inference API] Prevent inference endpoints from being deleted if they are referenced by semantic text #110399
  • [Inference API] alibabacloud ai search service support chunk infer to support semantic_text field #110399
Mapping
  • Add Field caps support for Semantic Text #111809
  • Add Lucene segment-level fields stats #111123
  • Add Search Inference ID To Semantic Text Mapping #113051
  • Add object param for keeping synthetic source #113690
  • Add support for multi-value dimensions #112645 (issue: #110387)
  • Allow dimension fields to have multiple values in standard and logsdb index mode #112345 (issues: #112232, #112239)
  • Allow fields with dots in sparse vector field mapper #111981 (issue: #109118)
  • Allow querying index_mode #110676
  • Configure keeping source in FieldMapper #112706
  • Control storing array source with index setting #112397
  • Introduce mode subobjects=auto for objects #110524
  • Update semantic_text field to support indexing numeric and boolean data types #111284
  • Use fallback synthetic source for copy_to and doc_values: false cases #112294 (issues: #110753, #110038, #109546)
Network
  • Add links to network disconnect troubleshooting #112330
Ranking
  • Add timeout and cancellation check to rescore phase #115048
Relevance
  • Add a query rules tester API call #114168
Search
  • Add more dense_vector details for cluster stats field stats #113607
  • Add range and regexp Intervals #111465
  • Adding support for allow_partial_search_results in PIT #111516
  • Allow incubating Panama Vector in simdvec, and add vectorized ipByteBin #112933
  • Avoid using concurrent collector manager in LuceneChangesSnapshot #113816
  • Bool query early termination should also consider must_not clauses #115031
  • Deduplicate Kuromoji User Dictionary #112768
  • Multi term intervals: increase max_expansions #112826 (issue: #110491)
  • Search coordinator uses event.ingested in cluster state to do rewrites #111523
  • Update cluster stats for retrievers #114109
Security
  • (logger) change from error to warn for short circuiting user #112895
  • Add asset criticality indices for kibana_system_user #113588
  • Add tier preference to security index settings allowlist #111818
  • [Service Account] Add AutoOps account #111316
Snapshot/Restore
  • Add max_multipart_parts setting to S3 repository #113989
  • Add support for Azure Managed Identity #111344
  • Add telemetry for repository usage #112133
  • Add workaround for missing shard gen blob #112337
  • Clean up dangling S3 multipart uploads #111955 (issues: #101169, #44971)
  • Execute shard snapshot tasks in shard-id order #111576 (issue: #108739)
  • Include account name in Azure settings exceptions #111274
  • Introduce repository integrity verification API #112348 (issue: #52622)
Stats
  • Track search and fetch failure stats #113988
TSDB
  • Add support for boolean dimensions #111457 (issue: #111338)
  • Stop iterating over all fields to extract @timestamp value #110603 (issue: #92297)
  • Support booleans in routing path #111445
Vector Search
  • Dense vector field types updatable for int4 #110928
  • Use native scalar scorer for int8_flat index #111071

New features

Data streams
  • Introduce global retention in data stream lifecycle. #111972
  • X-pack/plugin/otel: introduce x-pack-otel plugin #111091
ES|QL
  • Add match function #113374
  • Add MV_PSERIES_WEIGHTED_SUM for score calculations used by security solution #109017
  • Add async ID and is_running headers to ESQL async query #111840
  • Add boolean support to Max and Min aggs #110527
  • Add boolean support to TOP aggregation #110718
  • Added mv_percentile function #111749 (issue: #111591)
  • Introduce per agg filter #113735
  • Strings support for MAX and MIN aggregations #111544
  • Support IP fields in MAX and MIN aggregations #110921
  • TOP aggregation IP support #111105
  • TOP support for strings #113183 (issue: #109849)
  • mv_median_absolute_deviation function #112055 (issue: #111590)
  • Add MATCH operator #110971
ILM+SLM
  • SLM Interval based scheduling #110847
Inference
Ingest Node
Machine Learning
  • Inference autoscaling #109667
  • Telemetry for inference adaptive allocations #110630
Relevance
  • [Query rules] Add exclude query rule type #111420
Search
  • Async search: Add ID and "is running" http headers #112431 (issue: #109576)
  • Cross-cluster search telemetry #113825
Vector Search
  • Adding new bbq index types behind a feature flag #114439

Upgrades

Infra/Core
  • Upgrade xcontent to Jackson 2.17.0 #111948
  • Upgrade xcontent to Jackson 2.17.2 #112320
Infra/Metrics
Search
Snapshot/Restore
  • Upgrade Azure SDK #111225
  • Upgrade repository-azure dependencies #112277

Known issues

ES|QL
  • Some valid queries that use an ENRICH command can fail when the match field is absent from some indices or shards. The failure is either a 500 status code caused by a NullPointerException or ClassCastException, or a 400 status code with an IllegalArgumentException. This is fixed in #126187.
  • A bug in the ES|QL STATS command may yield incorrect results. The bug only happens in very specific cases that follow this pattern: STATS ... BY keyword1, keyword2, i.e. the command must have exactly two grouping fields, both keywords, where the first field has high cardinality (more than 65k distinct values).

    The bug is described in detail in [this issue](https://github.com/elastic/elasticsearch/issues/130644).
    The problem was introduced in 8.16.0 and [fixed](https://github.com/elastic/elasticsearch/pull/130705) in 8.17.9 and 8.18.7.
    Possible workarounds include:
    * switching the order of the grouping keys (e.g. `STATS ... BY keyword2, keyword1`, if `keyword2` has a lower cardinality)
    * reducing the grouping key cardinality by filtering out values before STATS
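
The first workaround above can be sketched as a small client-side helper that puts the lower-cardinality keyword first when building the `STATS ... BY` clause. This is only an illustration: the function names, the field names, and the idea of passing in pre-computed cardinality estimates are assumptions made for this example and are not part of Elasticsearch or any of its clients.

```python
def order_grouping_keys(cardinalities: dict[str, int]) -> list[str]:
    """Return the two grouping keys, lower-cardinality key first.

    `cardinalities` maps a keyword field name to an estimate of its number
    of distinct values (hypothetical input, obtained however you prefer).
    """
    if len(cardinalities) != 2:
        # The known issue only concerns commands with exactly two grouping keys.
        raise ValueError("expected exactly two grouping keys")
    # Sorting the field names by their estimated cardinality puts the
    # smaller one first; ties keep the caller's original order.
    return sorted(cardinalities, key=cardinalities.get)


def stats_by_clause(aggregation: str, cardinalities: dict[str, int]) -> str:
    """Build a STATS ... BY clause with the reordered grouping keys."""
    first, second = order_grouping_keys(cardinalities)
    return f"STATS {aggregation} BY {first}, {second}"


if __name__ == "__main__":
    # Hypothetical fields: `host` has many distinct values, `region` has few.
    print(stats_by_clause("COUNT(*)", {"host": 100_000, "region": 12}))
```

Reordering only helps when one of the two keys has low cardinality. If both keys exceed the threshold described above, the pattern still applies whichever key comes first, and the second workaround (filtering values before STATS) or an upgrade to a fixed version is the remaining option.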