Highlighter returns field value like there's no analyzer on this field

Hello! I have a problem with a highlighter in Hibernate Search 6.2.4.Final.
Maven dependencies:

        <dependency>
            <groupId>org.hibernate.search</groupId>
            <artifactId>hibernate-search-mapper-orm</artifactId>
            <version>6.2.4.Final</version>
        </dependency>

        <dependency>
            <groupId>org.hibernate.search</groupId>
            <artifactId>hibernate-search-backend-lucene</artifactId>
            <version>6.2.4.Final</version>
        </dependency>

I have an entity class, Product.
For example, it has the following field:

@FullTextField(highlightable = Highlightable.FAST_VECTOR)
private String features;

Also I have an analyzer:

I set the path to this analyzer class in application.properties.
It runs without any errors, but I noticed that the highlighter returns values that still contain UPPERCASE words and HTML tags.

For example, I searched for the word “checking”. The highlight for the “features” field returns:

"features": ["Cloud is is is IS <a></a> tags <em>checking</em>"]

So the question is: why are there uppercase letters and an HTML tag in my highlight results?
Shouldn’t my analyzer have removed them when indexing my entity? I doubt my analyzer is even working…

I use MassIndexer for indexing:

Search.session(em)
    .massIndexer(entityType)
    .idFetchSize(200)
    .batchSizeToLoadObjects(100)
    .startAndWait();

Hi @Boston_Ra

Thanks for reaching out. From the example you’ve shared with us, I assume the original text in the features field was "Cloud is is is IS <a></a> tags checking", right?
The highlighter returns the original text with the word or words that matched the query wrapped in the highlight tags. Hence "Cloud is is is IS <a></a> tags <em>checking</em>" is what I’d expect you to receive back.

(In other words, analysis is used to find matches and to determine which words to highlight, but the highlighting itself is applied to the original, untransformed text.)
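To make that distinction concrete, here is a toy sketch in plain Java. It is not how Lucene or Hibernate Search implement highlighting; the class `HighlightSketch`, the method `highlight`, and the regex-based "analysis" are all hypothetical stand-ins. It only shows the principle: tokens are normalized for matching, but the recorded offsets are used to wrap the untouched original text.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model only: NOT the actual Lucene / Hibernate Search implementation.
class HighlightSketch {

    // Stand-in for an analyzer: tokens are runs of word characters, lowercased
    // for comparison. The matcher keeps each token's offsets in the original text.
    static String highlight(String original, String queryTerm) {
        String normalizedQuery = queryTerm.toLowerCase(Locale.ROOT);
        Matcher tokens = Pattern.compile("\\w+").matcher(original);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (tokens.find()) {
            // Matching happens on the normalized (analyzed) form of the token...
            String normalizedToken = tokens.group().toLowerCase(Locale.ROOT);
            if (normalizedToken.equals(normalizedQuery)) {
                // ...but the output is built from the original text, via offsets.
                out.append(original, last, tokens.start());
                out.append("<em>")
                   .append(original, tokens.start(), tokens.end())
                   .append("</em>");
                last = tokens.end();
            }
        }
        out.append(original, last, original.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("Cloud is is is IS <a></a> tags checking", "Checking"));
    }
}
```

Even though the comparison is case-insensitive here, the uppercase "IS" and the `<a></a>` tag survive in the result, because nothing ever rewrites the stored text.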

Yeah you’re right.
So there’s no option but to change the original text?

Well… you can always do some transformations before setting the field value, or if you need to keep the original, you may try to create a getter that does the transformations and annotate the getter with @FullTextField instead of the field.
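A minimal sketch of the getter approach, kept free of Hibernate Search imports so it compiles on its own. The class `ProductSketch`, the getter name `getFeaturesForIndex`, and the regex-based tag stripping are assumptions made for illustration; a real project would likely use a proper HTML sanitizer. The comments mark where the mapping annotations would go, and those should be checked against the reference documentation for your version.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for the Product entity; JPA annotations omitted.
class ProductSketch {

    // Naive tag stripping, for illustration only (not a real HTML sanitizer).
    private static final Pattern TAGS = Pattern.compile("<[^>]*>");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    private final String features;

    ProductSketch(String features) {
        this.features = features;
    }

    // The original value stays available to the rest of the application.
    String getFeatures() {
        return features;
    }

    // In a real mapping, this getter (rather than the field) would carry
    // @FullTextField(highlightable = Highlightable.FAST_VECTOR). As a derived
    // property it would presumably also need @Transient and an
    // @IndexingDependency(derivedFrom = ...) pointing at "features".
    String getFeaturesForIndex() {
        if (features == null) {
            return null;
        }
        String withoutTags = TAGS.matcher(features).replaceAll(" ");
        return SPACES.matcher(withoutTags).replaceAll(" ").trim();
    }
}
```

With this shape, the highlighter would see the cleaned text because that is the value handed to the index, while `getFeatures()` still returns the stored original.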

By the way, what about a ValueBridge? It seems to work for me: the highlight no longer returns HTML tags, since I clean the value in the ValueBridge:

@FullTextField(highlightable = Highlightable.FAST_VECTOR,
        valueBridge = @ValueBridgeRef(type = HtmlSanitizedValueBridge.class))
private String features;

The value bridge itself:

public class HtmlSanitizedValueBridge implements ValueBridge<String, String> {

    @Override
    public String toIndexedValue(String value, ValueBridgeToIndexedValueContext context) {
        if (value == null) {
            return null;
        }

        return Html.sanitize(value);
    }

}

Sure, that works as well. You may want to add the fromIndexedValue() implementation:

@Override
public String fromIndexedValue(String value, ValueBridgeFromIndexedValueContext context) {
    return value;
}

That makes it easier to use projections (see Hibernate Search 7.1.1.Final: Reference Documentation).