Hibernate Search 6 Index Aliases

Is it possible to use Elasticsearch index aliases for document write and read operations? I've been using Elasticsearch for a few years, and managing indexes is a challenge when trying to keep shard sizes down. Elasticsearch ILM greatly helps with this and I want to use it. However, this requires using different index aliases for write and read operations. I noticed that when I inform Hibernate Search that my index name is an alias, it is not able to read the documents for parsing. Also, looking through the code, I don't seem to be able to supply my own means of informing Hibernate Search which POJO a document's index name belongs to. If this is possible, I'd like to know how. If not, I'd like to know how to submit a feature request.

Thank you

This is not currently supported. I know we will need to work with aliases in order to implement zero-downtime reindexing, though, so I’d be really interested in how you see this working.

Please create a ticket on JIRA and detail what you need, ideally with an example of how you imagine yourself configuring it.

It could also be interesting to know what didn’t work exactly when you attempted to use aliases. If you have a stack trace, please include it in the ticket.

Thanks!

@yrodiere thank you for the response.

I only plan to use Hibernate Search to write documents, execute searches, and read documents to/from Elasticsearch. I do not plan to use it to create my indexes (settings, mappings, aliases, etc.) or for any other kind of management. My thought process here is that I don't use Hibernate ORM to manage my relational database schemas in a production environment, so why should Hibernate Search manage my search engine's? For databases, we have tools like Flyway and Liquibase for handling schema migrations, and I have something similar for Elasticsearch. My other thought is that Elasticsearch provides so much functionality and so many options for organizing your data based on your use case that I want to work with it as natively as possible for maximum flexibility.

The exception I received was

Unknown index name encountered in Elasticsearch response: ‘myapplication-myobject-v1’

and this was my setup:

  1. I have Hibernate Search index management disabled: hibernate.search.backends.elasticsearch.index_defaults.lifecycle.strategy=none
  2. I created an Index Template in Elasticsearch for indexes that match the pattern myapplication-myobject-*
  3. I created an Index in Elasticsearch for my search documents myapplication-myobject-v1
  4. I created an Index Alias in Elasticsearch that points myapplication-myobject to myapplication-myobject-v1
  5. I annotated my entity MyObject with @Indexed and gave it a value that matched the alias myapplication-myobject
    @Entity
    // ... other hibernate annotations ...
    @Indexed(index = "myapplication-myobject")
    public class MyObject {
        // ... properties ...
    }
    
  6. I saved an instance of MyObject, and Hibernate successfully wrote the record to the database and a document to my Elasticsearch index, via my alias
  7. I executed an Elasticsearch search via Hibernate Search against the alias myapplication-myobject and it pulled back results but then threw the below exception.
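For reference, steps 2-4 of the setup above can be sketched with requests like these (template and settings bodies trimmed to a minimum; my real ones contain the full mappings):

```
PUT _template/myapplication-myobject
{
  "index_patterns": ["myapplication-myobject-*"]
}

PUT myapplication-myobject-v1

POST _aliases
{
  "actions": [
    { "add": { "index": "myapplication-myobject-v1", "alias": "myapplication-myobject" } }
  ]
}
```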

Digging through the code, I found that it only maps the value in the @Indexed annotation to the object type being annotated. It then looks at the _index field in the search result hits in order to match the target object type. However, in this setup, the value of _index is the actual index name myapplication-myobject-v1 and not the alias in the annotation, which points to the index.
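To illustrate the kind of pluggable resolution I wish I could supply, here is a sketch (this is not Hibernate Search code; all names are hypothetical): match the _index value from each search hit against a per-type pattern instead of requiring an exact match with the @Indexed name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: resolve the _index value from a search hit to a mapped
// type via a pattern, rather than an exact match on the @Indexed index name.
public class IndexNameResolver {

    private final Map<Pattern, Class<?>> patternsToTypes = new LinkedHashMap<>();

    public void register(String indexNamePattern, Class<?> mappedType) {
        patternsToTypes.put(Pattern.compile(indexNamePattern), mappedType);
    }

    public Class<?> resolve(String indexNameFromHit) {
        for (Map.Entry<Pattern, Class<?>> entry : patternsToTypes.entrySet()) {
            if (entry.getKey().matcher(indexNameFromHit).matches()) {
                return entry.getValue();
            }
        }
        throw new IllegalArgumentException("Unknown index name: " + indexNameFromHit);
    }

    static class MyObject {} // placeholder for the indexed entity

    public static void main(String[] args) {
        IndexNameResolver resolver = new IndexNameResolver();
        // The pattern covers both the alias and any versioned index behind it:
        resolver.register("myapplication-myobject(-.*)?", MyObject.class);
        System.out.println(resolver.resolve("myapplication-myobject-v1").getSimpleName());
        // prints "MyObject"
    }
}
```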

Here is the full stack trace:

2019-09-19 15:16:19.012 ERROR 1304 --- [port thread - 2] o.h.s.engine.common.spi.LogErrorHandler  : HSEARCH000058: Exception occurred org.hibernate.search.util.common.SearchException: HSEARCH400007: Elasticsearch request failed: HSEARCH400531: Unknown index name encountered in Elasticsearch response: 'myapplication-myobject-v1'
Context: backend 'elasticsearch'
Request: POST /myapplication-myobject/_search with parameters {size=10000, track_total_hits=true}
Response: 200 'OK' with body
{
  "took": 18,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "myapplication-myobject-v1",
        "_type": "_doc",
        "_id": "9d8d70f6-60e9-4701-991e-7f3607d15f77",
        "_score": 0.2876821,
        "_source": {
          "createdDate": "2019-09-19T20:15:40.893956000Z",
          "lastModifiedDate": "2019-09-19T20:15:40.893956000Z",
          "foo": "bar"
        }
      }
    ]
  }
}

Primary Failure:
SearchWork[path = /myapplication-myobject/_search, refreshedIndexName = null, refreshStrategy = NONE]
Subsequent failures:
SearchWork[path = /myapplication-myobject/_search, refreshedIndexName = null, refreshStrategy = NONE]

org.hibernate.search.util.common.SearchException: HSEARCH400007: Elasticsearch request failed: HSEARCH400531: Unknown index name encountered in Elasticsearch response: 'myapplication-myobject-v1'
Context: backend 'elasticsearch'
Request: POST /myapplication-myobject/_search with parameters {size=10000, track_total_hits=true}
Response: 200 'OK' with body
{
  "took": 18,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "myapplication-myobject-v1",
        "_type": "_doc",
        "_id": "9d8d70f6-60e9-4701-991e-7f3607d15f77",
        "_score": 0.2876821,
        "_source": {
          "createdDate": "2019-09-19T20:15:40.893956000Z",
          "lastModifiedDate": "2019-09-19T20:15:40.893956000Z",
          "foo": "bar"
        }
      }
    ]
  }
}

	at org.hibernate.search.backend.elasticsearch.work.impl.AbstractSimpleElasticsearchWork.handleResult(AbstractSimpleElasticsearchWork.java:108) ~[hibernate-search-backend-elasticsearch-6.0.0.Alpha9.jar:6.0.0.Alpha9]
	at org.hibernate.search.backend.elasticsearch.work.impl.AbstractSimpleElasticsearchWork.lambda$execute$3(AbstractSimpleElasticsearchWork.java:71) ~[hibernate-search-backend-elasticsearch-6.0.0.Alpha9.jar:6.0.0.Alpha9]
	at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072) ~[na:na]
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[na:na]
	at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073) ~[na:na]
	at org.hibernate.search.backend.elasticsearch.client.impl.ElasticsearchClientImpl$1.onSuccess(ElasticsearchClientImpl.java:115) ~[hibernate-search-backend-elasticsearch-6.0.0.Alpha9.jar:6.0.0.Alpha9]
	at org.elasticsearch.client.RestClient$FailureTrackingResponseListener.onSuccess(RestClient.java:836) ~[elasticsearch-rest-client-6.4.3.jar:6.4.3]
	at org.elasticsearch.client.RestClient$1.completed(RestClient.java:538) ~[elasticsearch-rest-client-6.4.3.jar:6.4.3]
	at org.elasticsearch.client.RestClient$1.completed(RestClient.java:529) ~[elasticsearch-rest-client-6.4.3.jar:6.4.3]
	at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:122) ~[httpcore-4.4.12.jar:4.4.12]
	at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:181) ~[httpasyncclient-4.1.4.jar:4.1.4]
	at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:448) ~[httpcore-nio-4.4.12.jar:4.4.12]
	at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:338) ~[httpcore-nio-4.4.12.jar:4.4.12]
	at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265) ~[httpcore-nio-4.4.12.jar:4.4.12]
	at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81) ~[httpasyncclient-4.1.4.jar:4.1.4]
	at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39) ~[httpasyncclient-4.1.4.jar:4.1.4]
	at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114) ~[httpcore-nio-4.4.12.jar:4.4.12]
	at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162) ~[httpcore-nio-4.4.12.jar:4.4.12]
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337) ~[httpcore-nio-4.4.12.jar:4.4.12]
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315) ~[httpcore-nio-4.4.12.jar:4.4.12]
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276) ~[httpcore-nio-4.4.12.jar:4.4.12]
	at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) ~[httpcore-nio-4.4.12.jar:4.4.12]
	at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591) ~[httpcore-nio-4.4.12.jar:4.4.12]
	at java.base/java.lang.Thread.run(Thread.java:834) ~[na:na]
Caused by: org.hibernate.search.util.common.SearchException: HSEARCH400531: Unknown index name encountered in Elasticsearch response: 'myapplication-myobject-v1'
Context: backend 'elasticsearch'
	at org.hibernate.search.backend.elasticsearch.impl.ElasticsearchBackendImpl.lambda$new$0(ElasticsearchBackendImpl.java:93) ~[hibernate-search-backend-elasticsearch-6.0.0.Alpha9.jar:6.0.0.Alpha9]
	at java.base/java.util.Optional.map(Optional.java:265) ~[na:na]
	at org.hibernate.search.backend.elasticsearch.search.projection.impl.DocumentReferenceExtractorHelper.extractDocumentReference(DocumentReferenceExtractorHelper.java:41) ~[hibernate-search-backend-elasticsearch-6.0.0.Alpha9.jar:6.0.0.Alpha9]
	at org.hibernate.search.backend.elasticsearch.search.projection.impl.ElasticsearchEntityProjection.extract(ElasticsearchEntityProjection.java:34) ~[hibernate-search-backend-elasticsearch-6.0.0.Alpha9.jar:6.0.0.Alpha9]
	at org.hibernate.search.backend.elasticsearch.search.query.impl.Elasticsearch7SearchResultExtractor.extractHits(Elasticsearch7SearchResultExtractor.java:73) ~[hibernate-search-backend-elasticsearch-6.0.0.Alpha9.jar:6.0.0.Alpha9]
	at org.hibernate.search.backend.elasticsearch.search.query.impl.Elasticsearch7SearchResultExtractor.extract(Elasticsearch7SearchResultExtractor.java:56) ~[hibernate-search-backend-elasticsearch-6.0.0.Alpha9.jar:6.0.0.Alpha9]
	at org.hibernate.search.backend.elasticsearch.work.impl.SearchWork.generateResult(SearchWork.java:51) ~[hibernate-search-backend-elasticsearch-6.0.0.Alpha9.jar:6.0.0.Alpha9]
	at org.hibernate.search.backend.elasticsearch.work.impl.SearchWork.generateResult(SearchWork.java:27) ~[hibernate-search-backend-elasticsearch-6.0.0.Alpha9.jar:6.0.0.Alpha9]
	at org.hibernate.search.backend.elasticsearch.work.impl.AbstractSimpleElasticsearchWork.handleResult(AbstractSimpleElasticsearchWork.java:97) ~[hibernate-search-backend-elasticsearch-6.0.0.Alpha9.jar:6.0.0.Alpha9]
	... 23 common frames omitted

2019-09-19 15:16:19.016 ERROR 1304 --- [nio-9002-exec-6] o.a.c.c.C.[.[.[/].[dispatcherServlet]    : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is org.hibernate.search.util.common.SearchException: HSEARCH400007: Elasticsearch request failed: HSEARCH400531: Unknown index name encountered in Elasticsearch response: 'myapplication-myobject-v1'
Context: backend 'elasticsearch'
Request: POST /myapplication-myobject/_search with parameters {size=10000, track_total_hits=true}
Response: 200 'OK' with body
{
  "took": 18,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "myapplication-myobject-v1",
        "_type": "_doc",
        "_id": "9d8d70f6-60e9-4701-991e-7f3607d15f77",
        "_score": 0.2876821,
        "_source": {
          "createdDate": "2019-09-19T20:15:40.893956000Z",
          "lastModifiedDate": "2019-09-19T20:15:40.893956000Z",
          "foo": "bar"
        }
      }
    ]
  }
}
] with root cause

org.hibernate.search.util.common.SearchException: HSEARCH400531: Unknown index name encountered in Elasticsearch response: 'myapplication-myobject-v1'
Context: backend 'elasticsearch'
	at org.hibernate.search.backend.elasticsearch.impl.ElasticsearchBackendImpl.lambda$new$0(ElasticsearchBackendImpl.java:93) ~[hibernate-search-backend-elasticsearch-6.0.0.Alpha9.jar:6.0.0.Alpha9]
	at java.base/java.util.Optional.map(Optional.java:265) ~[na:na]
	at org.hibernate.search.backend.elasticsearch.search.projection.impl.DocumentReferenceExtractorHelper.extractDocumentReference(DocumentReferenceExtractorHelper.java:41) ~[hibernate-search-backend-elasticsearch-6.0.0.Alpha9.jar:6.0.0.Alpha9]
	at org.hibernate.search.backend.elasticsearch.search.projection.impl.ElasticsearchEntityProjection.extract(ElasticsearchEntityProjection.java:34) ~[hibernate-search-backend-elasticsearch-6.0.0.Alpha9.jar:6.0.0.Alpha9]
	at org.hibernate.search.backend.elasticsearch.search.query.impl.Elasticsearch7SearchResultExtractor.extractHits(Elasticsearch7SearchResultExtractor.java:73) ~[hibernate-search-backend-elasticsearch-6.0.0.Alpha9.jar:6.0.0.Alpha9]
	at org.hibernate.search.backend.elasticsearch.search.query.impl.Elasticsearch7SearchResultExtractor.extract(Elasticsearch7SearchResultExtractor.java:56) ~[hibernate-search-backend-elasticsearch-6.0.0.Alpha9.jar:6.0.0.Alpha9]
	at org.hibernate.search.backend.elasticsearch.work.impl.SearchWork.generateResult(SearchWork.java:51) ~[hibernate-search-backend-elasticsearch-6.0.0.Alpha9.jar:6.0.0.Alpha9]
	at org.hibernate.search.backend.elasticsearch.work.impl.SearchWork.generateResult(SearchWork.java:27) ~[hibernate-search-backend-elasticsearch-6.0.0.Alpha9.jar:6.0.0.Alpha9]
	at org.hibernate.search.backend.elasticsearch.work.impl.AbstractSimpleElasticsearchWork.handleResult(AbstractSimpleElasticsearchWork.java:97) ~[hibernate-search-backend-elasticsearch-6.0.0.Alpha9.jar:6.0.0.Alpha9]
	at org.hibernate.search.backend.elasticsearch.work.impl.AbstractSimpleElasticsearchWork.lambda$execute$3(AbstractSimpleElasticsearchWork.java:71) ~[hibernate-search-backend-elasticsearch-6.0.0.Alpha9.jar:6.0.0.Alpha9]
	at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072) ~[na:na]
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[na:na]
	at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073) ~[na:na]
	at org.hibernate.search.backend.elasticsearch.client.impl.ElasticsearchClientImpl$1.onSuccess(ElasticsearchClientImpl.java:115) ~[hibernate-search-backend-elasticsearch-6.0.0.Alpha9.jar:6.0.0.Alpha9]
	at org.elasticsearch.client.RestClient$FailureTrackingResponseListener.onSuccess(RestClient.java:836) ~[elasticsearch-rest-client-6.4.3.jar:6.4.3]
	at org.elasticsearch.client.RestClient$1.completed(RestClient.java:538) ~[elasticsearch-rest-client-6.4.3.jar:6.4.3]
	at org.elasticsearch.client.RestClient$1.completed(RestClient.java:529) ~[elasticsearch-rest-client-6.4.3.jar:6.4.3]
	at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:122) ~[httpcore-4.4.12.jar:4.4.12]
	at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:181) ~[httpasyncclient-4.1.4.jar:4.1.4]
	at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:448) ~[httpcore-nio-4.4.12.jar:4.4.12]
	at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:338) ~[httpcore-nio-4.4.12.jar:4.4.12]
	at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265) ~[httpcore-nio-4.4.12.jar:4.4.12]
	at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81) ~[httpasyncclient-4.1.4.jar:4.1.4]
	at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39) ~[httpasyncclient-4.1.4.jar:4.1.4]
	at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114) ~[httpcore-nio-4.4.12.jar:4.4.12]
	at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162) ~[httpcore-nio-4.4.12.jar:4.4.12]
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337) ~[httpcore-nio-4.4.12.jar:4.4.12]
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315) ~[httpcore-nio-4.4.12.jar:4.4.12]
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276) ~[httpcore-nio-4.4.12.jar:4.4.12]
	at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) ~[httpcore-nio-4.4.12.jar:4.4.12]
	at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591) ~[httpcore-nio-4.4.12.jar:4.4.12]
	at java.base/java.lang.Thread.run(Thread.java:834) ~[na:na]

In terms of how I would want to see it work, that's a bit of a tough question because there are so many factors. For this use case, and for the Elasticsearch ILM use case, there at least needs to be a means of informing Hibernate Search of the following for any given entity:

  1. a write alias to use for document inserts
  2. an alias to use for document pulls, updates and searches
  3. a pattern for matching the index the document exists in, or, alternatively, Hibernate Search should not use the index name for matching at all and should use some other field.
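To make that concrete, here is a sketch of how I imagine configuring it. These property names are invented for illustration only; nothing like this exists in Hibernate Search today:

```
# Hypothetical properties -- names invented for illustration
hibernate.search.backends.elasticsearch.indexes.myapplication-myobject.write_alias=myapplication-myobject-write
hibernate.search.backends.elasticsearch.indexes.myapplication-myobject.read_alias=myapplication-myobject
hibernate.search.backends.elasticsearch.indexes.myapplication-myobject.index_name_pattern=myapplication-myobject-*
```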

I hope this makes it clearer. I'll wait for a response before creating the JIRA ticket, as you may have feedback that would be useful in creating it.

Sorry for the double reply. I didn't see that the responses get screened first.

Thanks for the feedback, this makes a lot of sense.

Some thoughts:

  1. I’d recommend using the validate lifecycle strategy instead of none, to check at startup that the Elasticsearch mapping matches your Hibernate Search mapping. Any particular reason you picked none instead of validate?
  2. You mentioned: That’s a mistake, right? Updates should use the same alias as inserts?
  3. You seem to consider all this alias configuration as static. What would be your strategy to fully reindex your database? Would it be acceptable that, to users, the index will appear not to change at all during mass indexing (changes won’t appear until mass indexing is done)?
  4. Your answer is already very complete and would make a nice JIRA ticket :slight_smile:

I do not know how we will be able to map the true index name to the mapped type yet; we’ll have to look for solutions. Maybe there are options for Elasticsearch to return the aliases along with the index name… that would be nice. Otherwise, yes, you will have to provide a pattern.

  1. I have none set for now but may switch to validate in the future. It all depends on whether the mappings I create, which work at run-time, pass the validation expected by Hibernate Search. Are the indexes validated on startup or on first document insert/read?

  2. For ILM rollover, you are correct. The same alias is used for writes and reads (Getting started with index lifecycle management | Elasticsearch Guide [7.1] | Elastic). This would be good enough for my needs as far as I can see. However, I'm just pointing out that there are many different strategies for persisting data with Elasticsearch that could make use of separate read and write indexes (think of time-series data, where the data is written to foobar-2019.09.24 but read from foobar-*). Many strategies are probably not suitable for Hibernate Search use cases, but it may be a future possibility.

  3. I have not thought about mass re-indexing much, as I am in the early development of a project that has yet to see production. I am also not aware of all the goals mass re-indexing is trying to achieve. The only one that comes to mind is that Hibernate Search wants to make sure that all documents for searchable entities are present in the index and up-to-date, and that no documents in the index fail to match a searchable entity. For those who are letting Hibernate Search manage the indexes, then yes, I would think it would be preferable to have as much of the data as possible searchable during the mass re-indexing. However, I think for many use cases, having Hibernate Search creating, renaming, deleting, etc. indexes could be problematic. Elasticsearch already offers much for index management (index templates, ILM, Curator, etc.) that, if set up, could conflict with whatever Hibernate Search does.

    For my current preferences/thoughts, just like in the case of Hibernate ORM, I'd rather have my cluster set up the way it needs to be for my needs: users created, index templates created, lifecycle policies created, indexes created, etc., prior to Hibernate Search startup (a.k.a. Hibernate Search does not manage my indexes, settings, analyzers, mappings, etc.), and have Hibernate Search primarily be responsible for writes and reads/searches of documents for searchable entities. With that, in the case of mapping changes (a.k.a. data migrations), it would be up to the operator to perform a rollover where they apply the appropriate changes to the data as needed for Hibernate Search to use it. How this is done with zero downtime, I am not sure, as it would depend on how drastic a mapping change has occurred. In the case of a non-breaking change, this likely could be done with aliases.

    In the case of just trying to make sure the data is in sync between the search documents and the persisted searchable entities, would it not be possible to do this in the same index where the documents already exist? Meaning, iterate over all the persisted entities in the database and ensure their search documents are present and up-to-date in the index (re-persist them). And for any document that does not match a persisted searchable entity, either just ignore it, or delete it when one is found during a read/search operation. I would think this would keep the persisted searchable entities and their search documents pretty well in sync without having to do an index create/delete or an index rollover. Also, as a side note, I'd be curious how this will be done in a distributed/scaled-out application. Executing a mass indexing would need to be coordinated between all instances so they do not conflict with each other. I suppose as long as it is not automatic, that would be on the developer to figure out.

    In this mode, Hibernate Search's area of responsibility is simple. It just needs to write documents to the index or alias it is informed of; when searching, it just needs to know what index or alias to use for the search query; and when parsing the search results, it just needs to know how to map a document's index to the searchable entity. And in terms of keeping data in sync, no new indexes or rollovers need to be created, and data would still mostly be available for searching at run-time.

  4. I'll get a JIRA ticket created for adding alias support for searchable entities and link it here. It seems like we've touched on multiple things here, so I'll try not to over-scope it.
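For reference, the single write/read alias I mentioned in point 2 is set up in the ILM rollover pattern roughly like this (index and alias names are from my setup; is_write_index is a real Elasticsearch alias option):

```
PUT myapplication-myobject-000001
{
  "aliases": {
    "myapplication-myobject": { "is_write_index": true }
  }
}
```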

I do not know how we will be able to map the true index name to the mapped type yet; we’ll have to look for solutions. Maybe there are options for Elasticsearch to return the aliases along with the index name… that would be nice. Otherwise, yes, you will have to provide a pattern.

I just found that in recent versions of Elasticsearch, you can get the list of indexes that an alias points to, as long as you have the appropriate permissions. Not sure if this helps. At a minimum, allow a pattern, or at least allow the framework to call some user-supplied code; figuring it out could then be up to the developer.
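The lookup I'm referring to is the alias API; the response maps each concrete index back to its aliases, along the lines of:

```
GET /_alias/myapplication-myobject

{
  "myapplication-myobject-v1": {
    "aliases": {
      "myapplication-myobject": {}
    }
  }
}
```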

Also thank you for the conversation, I think what Hibernate Search is offering is a very powerful tool that will definitely allow me to provide better features to my users. I’m excited to see where it goes.

They are validated on startup. There are plans to expose APIs so that you can trigger validation explicitly, but we’re not there yet.

Right. I understand why a separate index may be needed for writes and reads. I was just pointing out that using one alias for inserts only, and the other for reads and updates seems wrong. I suppose it was just a mistake.

Essentially, you'll need mass indexing if you add Hibernate Search to a pre-existing application with an already populated database, or if you add new indexed fields. An Elasticsearch rollover is only an option if all the data is already in the index.
Some people also disable automatic indexing and prefer to mass-index every night for performance reasons, because they have far-reaching @IndexedEmbeddeds that trigger a lot of loading. That's not always an option, though, depending on your data set.

Theoretically, yes. In practice, I doubt performance would be great. But we can certainly explore this option. Feel free to drop a comment on the JIRA tickets:

Thanks!

Not sure either, as aliases can change, so we’d have to get that list before every search query… Not great :confused:

Thanks to you! That is very interesting feedback, and we never have enough of this :slight_smile:

Apologies for the delay. I’ve created the feature request in JIRA https://hibernate.atlassian.net/browse/HSEARCH-3765

I've read the other two JIRA tickets and didn't see the need to add any additional comments.

Thank you! I’ve added this to the backlog.

@yrodiere

I now remember why I don't use validate. Because I don't want Hibernate Search to manage my indexes (I create my indexes and mappings outside of HS), I don't add all the analyzer annotations to my entities, since those annotations seem to be used only to create the index mapping. (Why add the annotations the library uses to manage indexes if you don't want the library to manage the indexes?) However, with validate on, startup fails because there are things in the mappings that don't have corresponding annotations in the code.

Sure. That being said, I’d expect schema validation to still work, even if you don’t register your analyzers in Hibernate Search.

Schema validation is supposed to let things go when they don't affect Hibernate Search; for example, if you have extra fields that Hibernate Search is not aware of, validation will ignore them. If you have extra analyzers that Hibernate Search is not aware of, I'd expect schema validation to ignore them as well.

If validation fails, runtime is likely to fail too, be it because of missing fields, incompatible field types, missing options (docvalues, …), incompatible formats, …

Can you tell me more about what fails when you enable schema validation exactly?

Sure. I'm mocking some of this up, but hopefully it'll demonstrate what I'm seeing. I should also note that my application works as I expect it to with how I have it set up.

Given the following index mapping/settings

{
  "widgets" : {
    "mappings" : {
      "dynamic" : "strict",
      "properties" : {
        "createdDate" : {
          "type" : "date"
        },
        "lastModifiedDate" : {
          "type" : "date"
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "analyzed" : {
              "type" : "text",
              "analyzer" : "english_search"
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "refresh_interval" : "1s",
        "number_of_shards" : "5",
        "provided_name" : "widgets",
        "creation_date" : "1574291638455",
        "analysis" : {
          "filter" : {
            "english_stemmer" : {
              "type" : "stemmer",
              "language" : "english"
            },
            "english_stop" : {
              "type" : "stop",
              "stopwords" : "_english_"
            },
            "english_possessive_stemmer" : {
              "type" : "stemmer",
              "language" : "possessive_english"
            }
          },
          "analyzer" : {
            "english_search" : {
              "filter" : [
                "english_possessive_stemmer",
                "lowercase",
                "asciifolding",
                "english_stop",
                "english_stemmer"
              ],
              "tokenizer" : "standard"
            }
          }
        },
        "number_of_replicas" : "1",
        "uuid" : "2qzMsMAiQyOy4qF-_91U3g",
        "version" : {
          "created" : "7010199"
        }
      }
    }
  }
}

and the following entity

@Data
@EqualsAndHashCode
@Entity
@Indexed(index = "widgets")
public class Widget {

    @Id
    @GeneratedValue(generator = "uuid2")
    @GenericGenerator(name = "uuid2", strategy = "uuid2")
    @Setter(AccessLevel.NONE)
    private UUID id;

    @CreatedDate
    @Column(nullable = false)
    @GenericField
    private Instant createdDate;

    @LastModifiedDate
    @Column(nullable = false)
    @GenericField
    private Instant lastModifiedDate;

    @NotNull
    @Size(min = 1, max = 255)
    @Column(nullable = false, unique = true)
    @FullTextField(analyzer = "english_search")
    private String name;

}

I get a lot of errors about date fields

field 'widget.createdDate':
    attribute 'format':
        failures:
          - The output format (the first element) is invalid. Expected 'uuuu-MM-dd'T'HH:mm:ss.SSSSSSSSSZZZZZ', actual is 'strict_date_optional_time'
          - Invalid formats. Every required formats must be in the list, though it's not required to provide them in the same order, and the list must not contain unexpected formats. Expected '[uuuu-MM-dd'T'HH:mm:ss.SSSSSSSSSZZZZZ]', actual is '[strict_date_optional_time, epoch_millis]', missing elements are '[uuuu-MM-dd'T'HH:mm:ss.SSSSSSSSSZZZZZ]', unexpected elements are '[strict_date_optional_time, epoch_millis]'.

and another about my english_search analyzer

field 'name': 
    attribute 'analyzer': 
        failures: 
          - Invalid value. Expected 'english_search', actual is 'null'

The analyzer error could be because I'm creating a sub-field ‘name.analyzed’ that actually uses the analyzer, rather than the text field ‘name’ itself.

Ok. In that case I’d say the validation errors are actually legitimate. At the very least they warrant some investigation.

Hibernate Search expects date fields to have a certain format, and if you don’t follow that format, you may encounter strange issues:

  • The output format (the first format in the list) must be exactly the one Hibernate Search expects, or projections will just fail when Hibernate Search attempts to parse the values returned by Elasticsearch.
  • All formats that Hibernate Search expects must be in the list, otherwise indexing may fail: Hibernate Search will send a date formatted in a certain way, while Elasticsearch will expect a different format. Hibernate Search uses these formats for a reason: the default formats in Elasticsearch are not always suited to representing the Java date/time types. The problem may not be visible at first, because issues sometimes arise only with very specific dates. I know there are definitely problems with ZonedDateTime that require a custom format, and I'm sure there are problems with other types as well.
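To see why the precision of the format matters, here is a small sketch using the format string from the validation message above. It shows that the format Hibernate Search expects preserves nanosecond precision for an Instant, while epoch_millis (one of Elasticsearch's defaults) keeps milliseconds only:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class DateFormatDemo {
    // The output format Hibernate Search expects for Instant fields,
    // taken from the validation message above.
    static final DateTimeFormatter HS_FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSSSSSSSSZZZZZ");

    static String toHsFormat(Instant instant) {
        return HS_FORMAT.format(instant.atOffset(ZoneOffset.UTC));
    }

    public static void main(String[] args) {
        Instant instant = Instant.parse("2019-09-19T20:15:40.893956000Z");
        // Nanosecond precision survives the round trip:
        System.out.println(toHsFormat(instant)); // 2019-09-19T20:15:40.893956000Z
        // epoch_millis keeps milliseconds only, so the trailing
        // 956000 nanoseconds would be lost:
        System.out.println(instant.toEpochMilli());
    }
}
```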

There are plans to allow for custom formats (i.e. formats different from the defaults in Hibernate Search), but that’s not implemented yet: https://hibernate.atlassian.net/browse/HSEARCH-2354

Regarding the name field, I’m afraid your mapping is indeed incorrect from Hibernate Search’s perspective. With the current mapping:

  • When you query the field name through Hibernate Search APIs, the analyzer english_search will not be used. The standard analyzer will be used instead. But in your Hibernate Search mapping, you indicated you wanted the english_search analyzer to be used on that field!
  • When you query the field name.analyzed through Hibernate Search APIs, Hibernate Search won’t know about that field and will throw an exception. Support for sub-fields is not there yet: https://hibernate.atlassian.net/browse/HSEARCH-3465

Since the name field is analyzed anyway, I would recommend simply removing name.analyzed and putting the analyzer on name?
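In Hibernate Search 6 mapping annotations, that suggestion might look roughly like this (the entity and the name_exact field name are hypothetical; only the english_search analyzer name comes from your messages):

```java
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.FullTextField;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.KeywordField;

public class Widget { // hypothetical entity

    // Analyzer applied directly to the "name" field, instead of a
    // "name.analyzed" sub-field:
    @FullTextField(analyzer = "english_search")
    // If you still need an exact-match variant, the same property can be
    // mapped to a second, non-analyzed index field:
    @KeywordField(name = "name_exact")
    private String name;
}
```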

Regarding the name field; you are correct, the annotation is wrong in this case, I will fix this on my side.

When you query the field name through Hibernate Search APIs

I’m mostly using native querying only, so I’m not sure if this applies.

Since the name field is analyzed anyway, I would recommend simply removing name.analyzed and putting the analyzer on name?

I cannot do this, as I am using both name and name.analyzed for better results when searching my documents by name.

Regarding the dates: the date formats Elasticsearch expects by default are well known.

Date formats can be customised, but if no format is specified then it uses the default:

“strict_date_optional_time||epoch_millis”

If the value is strict_date_optional_time||epoch_millis, then why not default to ISO-8601 formatted strings for persistence and parsing? That is how I’ve done it in the past, and we’ve persisted many of the different Java time types. Elasticsearch used to use Joda-Time under the covers; I think they may have switched over to the standard java.time API in newer versions. Even so, if the value in the mapping is “strict_date_optional_time||epoch_millis”, I would think we would know the expected formats. Then again, I’m not aware of the specifics of the framework’s needs.

There are all sorts of reasons not to use the default formats. The reasons are different depending on the Java type, and they are generally about edge cases, so I’m not surprised you didn’t have any problem with the default formats.

Let’s take Instant for example; I assume that’s the one you’re using for widget.createdDate:

  • Instant represents the time part with nanosecond resolution. strict_date_optional_time does not accept microseconds or nanoseconds (or at least it didn’t last time I checked).
  • The date type in Elasticsearch indexes with millisecond resolution anyway, so that’s not a big deal when it comes to indexing.
  • However, when retrieving the value from the _source, we have access to the exact string that was sent to the server… so if the server accepts nanosecond resolution, we can effectively store the data with nanosecond resolution, and ensure that whenever you retrieve the _source for an Instant field, you don’t have any loss in resolution. If we were using the default format, we wouldn’t be able to do that.

Like I said, that’s an edge case: you personally probably don’t rely on the nanosecond resolution of Instant. People who want it probably should move to date_nanos, but that’s not possible yet in Hibernate Search, and obviously it has downsides too. So we decided to make it work as well as possible by default, for as many people as possible, and that required using a more precise format than strict_date_optional_time.
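To make that edge case concrete, here’s a small sketch of the precision loss (the millisecond-only pattern stands in for strict_date_optional_time as an approximation; the values are made up):

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class PrecisionLossDemo {
    public static void main(String[] args) {
        Instant original = Instant.parse("2021-01-01T00:00:00.123456789Z");

        // Millisecond-resolution pattern, roughly what strict_date_optional_time allows:
        DateTimeFormatter millisOnly =
                DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSSZZZZZ");
        String stored = millisOnly.format(original.atOffset(ZoneOffset.UTC));
        Instant roundTripped = Instant.parse(stored);

        System.out.println(stored);                        // 2021-01-01T00:00:00.123Z
        System.out.println(original.equals(roundTripped)); // false: the nanoseconds are gone
    }
}
```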

I’m just explaining the reasoning; of course I can understand how that’s annoying if you want to write the schema yourself. Hopefully this will no longer be a problem for you once someone fixes HSEARCH-2354 and you can select different date/time formats (with different downsides).

Thank you. As usual, time processing is complicated. I was curious about the reasoning, and this explains it. You are correct that I likely have not had to deal with nanosecond-level precision, at least not as a direct need.

Thank you for your response.