Hi,
We are using a Firebird database, and some of our text columns are declared as BLOB SUB_TYPE BINARY. The problem is how to tell Hibernate about this. I tried:
@Lob
@Column(length=Integer.MAX_VALUE-1)
private String note;
This code throws: "Schema-validation: wrong column type encountered in column [note] in table [Entity]; found [blob sub_type binary (Types#LONGVARBINARY)], but expecting [blob sub_type text (Types#CLOB)]"
Then I tried:
@Lob
@Column(length=Integer.MAX_VALUE-1)
@JdbcTypeCode(Types.LONGVARBINARY)
private String note;
This throws: "HibernateException: Could not convert '[B' to 'java.lang.String' using 'org.hibernate.type.descriptor.java.StringJavaType' to wrap".
Finally this:
@Lob
@Column(length=Integer.MAX_VALUE-1, columnDefinition = "blob sub_type binary")
private String note;
This solution works, but it breaks our unit tests, which run against an HSQLDB database: HSQLDB does not have this column type.
What is the correct solution to handle this?
Currently, we are using Hibernate 6.4.9.Final.
First of all, Firebird is not officially supported, so the Hibernate team might not be able to help you in this case. Try reaching out to the maintainer of the dialect if nothing else helps.
AFAICT, your original mapping is fine, but the Dialect doesn't implement #equivalentTypes properly.
Thanks for your reply and advice. I added this to the Firebird dialect:

@Override
public boolean equivalentTypes(int typeCode1, int typeCode2) {
    if (isStringType(typeCode1) && isStringType(typeCode2)) {
        return true;
    }
    return super.equivalentTypes(typeCode1, typeCode2);
}

private boolean isStringType(int typeCode) {
    return Types.LONGVARBINARY == typeCode || Types.CLOB == typeCode;
}
And now the mapping works with the original (first) code.
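For anyone who wants to sanity-check the behavior of this override outside of Hibernate, the check can be sketched as a standalone class. This is a simplification: the `super.equivalentTypes` call is replaced by plain equality, which is not what the real dialect does, but it is enough to show the check is symmetric in both type codes.

```java
import java.sql.Types;

// Standalone sketch of the equivalentTypes override above (no Hibernate needed).
// NOTE: super.equivalentTypes(...) is simplified to a plain equality check here.
public class EquivalentTypesSketch {

    static boolean isStringType(int typeCode) {
        return Types.LONGVARBINARY == typeCode || Types.CLOB == typeCode;
    }

    static boolean equivalentTypes(int typeCode1, int typeCode2) {
        // LONGVARBINARY and CLOB are treated as interchangeable, in either order
        if (isStringType(typeCode1) && isStringType(typeCode2)) {
            return true;
        }
        return typeCode1 == typeCode2; // simplification of the super call
    }

    public static void main(String[] args) {
        System.out.println(equivalentTypes(Types.LONGVARBINARY, Types.CLOB)); // true
        System.out.println(equivalentTypes(Types.CLOB, Types.LONGVARBINARY)); // true
        System.out.println(equivalentTypes(Types.VARCHAR, Types.CLOB));       // false
    }
}
```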
If you think that this is a good change, you can discuss with e.g. @mrotteveel and propose a PR.
The correct solution is to declare these columns as BLOB SUB_TYPE TEXT (a.k.a. BLOB SUB_TYPE 1), not as BLOB SUB_TYPE BINARY (a.k.a. BLOB SUB_TYPE 0, or just BLOB).
I don’t see how making the dialect consider BLOB SUB_TYPE BINARY a string type is a good idea, because it’s not actually a string type.
In any case, I’ll take a look at equivalentTypes and see if it needs tweaking (I don’t have time to check the semantics right now, so maybe my previous comment is a bit too firm). I’ve created [Hibernate dialect] Investigate type mapping/equivalentTypes · Issue #872 · FirebirdSQL/jaybird · GitHub in Jaybird’s tracker as a reminder to myself.
@beikov I’ve taken a closer look at equivalentTypes, but I don’t think equating a binary type (blob sub_type binary) with a string type (blob sub_type text) in the standard Firebird dialect implementation is a good idea.
For this specific case, a custom dialect will work, but ultimately it comes down to using the wrong data types, and fixing that would be the better solution.
@mrotteveel now the question is how to fix it. Will it be sufficient to just add a new text blob column, transfer the data from the binary blob, and remove the old column? Won't there be an encoding problem after that?
If you define the column with the right encoding (instead of implicitly assuming one, as happens now), that will work. Assigning the value of a blob sub_type binary to a blob sub_type text will create a new blob that is a text blob.
If the bytes of the blob sub_type binary are not valid in the target character set of the blob sub_type text (assuming it's not NONE or OCTETS), the assignment will fail. If the bytes are valid characters, but in the wrong character set, you'll get logical corruption ("mojibake") unless you use an explicit intermediate cast.
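For completeness, the column conversion being discussed could be sketched as a migration like the following. This is a hedged sketch, not from the thread: ENTITY, NOTE, and WIN1252 are placeholder names, and, per the caveats above, the direct assignment in step 2 only gives correct results if every existing blob was written with that one connection character set.

```sql
-- 1. Add a new text blob column with an explicit character set (placeholder names)
ALTER TABLE ENTITY ADD NOTE_TEXT BLOB SUB_TYPE TEXT CHARACTER SET WIN1252;

-- 2. Copy the data; assigning a binary blob to a text blob creates a new text blob
UPDATE ENTITY SET NOTE_TEXT = NOTE;

-- 3. Drop the old column and rename the new one into its place
ALTER TABLE ENTITY DROP NOTE;
ALTER TABLE ENTITY ALTER COLUMN NOTE_TEXT TO NOTE;
```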
Ok, but what encoding is used now? Is it stored using the connection character set?
I have not worked with character sets much so far. What happens when my connection character set is, e.g., Latin-1 and the column character set is UTF-8?
If you use BLOB SUB_TYPE BINARY, the string will have been converted to bytes using the Java equivalent of the connection character set, or, if the connection character set is NONE, the JVM default character set. On read it does the reverse. So, if all your connections use the same connection character set, use that as the character set of the new column (or of the intermediate cast). If your connections used a variety of connection character sets, you're basically screwed; or at least, you either produce mojibake, or you have to convert each individual blob, guessing at the right character set for the intermediate cast case by case.
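The Latin-1/UTF-8 mismatch asked about above can be reproduced in plain Java, without Firebird at all. The string and the two character sets here are arbitrary examples; the point is that decoding bytes with a different character set than the one used for encoding silently produces mojibake rather than an error.

```java
import java.nio.charset.StandardCharsets;

// Pure-Java illustration of the character set mismatch described above.
public class CharsetMismatchDemo {
    public static void main(String[] args) {
        String original = "café";

        // A connection using UTF8 would write these bytes into the binary blob:
        byte[] stored = original.getBytes(StandardCharsets.UTF_8);

        // A later connection reads them back assuming ISO-8859-1 (Latin-1):
        String readBack = new String(stored, StandardCharsets.ISO_8859_1);

        System.out.println(readBack); // prints "cafÃ©" -- mojibake, with no error raised
    }
}
```

This is why the advice above is to convert using the character set the data was actually written with: the bytes themselves carry no character set information in a binary blob.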