Non latin symbols

Generally

Most databases nowadays store data in UTF format. In this case you can use these symbols directly in your query.

Example:

SELECT 

    sa_name AS 中國條約

FROM 

    folio_agreements.subscription_agreement sa 

WHERE 

    sa_name = '中國條約'

In order to be able to save this query, the file must be saved in UTF format, otherwise the symbols will not be encoded correctly.

Useful / advanced functions

In order to query the different symbols better, combinations of PSQL functions can be used. The functions can be found in the official documentation from postgres.

Here is a small selection:

The trick is to convert the symbols into a uniform system and thus enable the basis for a better quering. Here it is advisable to convert the symbols into a HEX value. Each symbol has a HEX value. PSQL offers functions for this. PSQL offers the simplest option with the data type bytea.

sa_name::bytea

If you look at the attribute value in Dbeaver, you get the following view. A hexviewer.

The address area is on the left, the hexadecimal area in the middle and the character area on the right. If a symbol is marked in the Hexviewer, the corresponding value can be read off.

Examples

1. Replace the space with another symbol. Space = 20 , the other symbol = E4B8AD (Tables: https://www.utf8-chartable.de/unicode-utf8-table.pl)

convert_from(decode(encode(concat('\x', replace(encode('A B', 'hex'), '20', 'E4B8AD'))::bytea, 'escape'), 'escape'), 'UTF-8')::varchar

2. Find all agreements that have non-Latin symbols in their names

SELECT * FROM 

(

SELECT 

    sa_name, 

    /*

     * 1. delete all latin script symbols with the functions translate() and replace()

     * 2. look at the HEX value at the beginning. If they start with a letter, then you have a title with non-latin symbols

     */

    length(REPLACE(REPLACE(REPLACE(regexp_matches(encode(REPLACE(TRANSLATE(sa_name, 'abcdefghijklmnopqrstuvwxyzäöü+()- ,#+[]ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜß', '@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@'), '@', '')::bytea, 'hex'), '[a-f]*')::varchar, '{', ''), '}', ''), '"', '')) AS len

FROM 

    folio_agreements.subscription_agreement sa

) AS t

WHERE t.len > 0