Unicode Considerations for Databases
Database schema in Keycloak only accounts for Unicode strings in the following special fields:
-
Realms: display name, HTML display name
-
Federation Providers: display name
-
Users: username, given name, last name, attribute names and values
-
Groups: name, attribute names and values
-
Roles: name
-
Descriptions of objects
Otherwise, characters are limited to those contained in database encoding which is often 8-bit. However, for some database systems, it is possible to enable UTF-8 encoding of Unicode characters and use full Unicode character set in all text fields. Often, this is counterbalanced by shorter maximum length of the strings than in case of 8-bit encodings.
Some of the databases require special settings to database and/or JDBC driver to be able to handle Unicode characters. Please find the settings for your database below. Note that if a database is listed here, it can still work properly provided it handles UTF-8 encoding properly both on the level of database and JDBC driver.
Technically, the key criterion for Unicode support for all fields is whether the database allows setting of Unicode
character set for VARCHAR
and CHAR
fields. If yes, there is a high chance that Unicode will be plausible, usually at
the expense of field length. If it only supports Unicode in NVARCHAR
and NCHAR
fields, Unicode support for all text
fields is unlikely as Keycloak schema uses VARCHAR
and CHAR
fields extensively.
Oracle Database
Unicode characters are properly handled provided the database was created with Unicode support in VARCHAR
and CHAR
fields (e.g. by using AL32UTF8
character set as the database character set). No special settings is needed for JDBC
driver.
If the database character set is not Unicode, then to use Unicode characters in the special fields, the JDBC driver needs
to be configured with the connection property oracle.jdbc.defaultNChar
set to true
. It might be wise, though not
strictly necessary, to also set the oracle.jdbc.convertNcharLiterals
connection property to true
. These properties
can be set either as system properties or as connection properties. Please note that setting oracle.jdbc.defaultNChar
may have negative impact on performance. For details, please refer to Oracle JDBC driver configuration documentation.
Microsoft SQL Server Database
Unicode characters are properly handled only for the special fields. No special settings of JDBC driver or database is necessary.
IBM DB2 Database
Unicode characters are properly handled for all fields, length reduction applies to non-special fields. No special settings of JDBC driver or database is necessary.
MySQL Database
Unicode characters are properly handled provided the database was created with Unicode support in VARCHAR
and CHAR
fields in the CREATE DATABASE
command (e.g. by using utf8
character set as the default database character set in
MySQL 5.5. Please note that utf8mb4
character set does not work due to different storage requirements to utf8
character set [1]). Note that in this case, length
restriction to non-special fields does not apply because columns are created to accomodate given amount of characters,
not bytes. If the database default character set does not allow storing Unicode, only the special fields allow storing
Unicode values.
At the side of JDBC driver settings, it is necessary to add a connection property characterEncoding=UTF-8
to the JDBC
connection settings.
PostgreSQL Database
Unicode is supported when the database character set is UTF8
. In that case, Unicode characters can be used in any
field, there is no reduction of field length for non-special fields. No special settings of JDBC driver is necessary.