Non latin symbols
Generally
Most databases nowadays store data in UTF format. In this case you can use these symbols directly in your query.
Example:
SELECT sa_name AS 中國條約 FROM folio_agreements.subscription_agreement sa WHERE sa_name = '中國條約' |
---|
In order to be able to save this query, the file must be saved in UTF format, otherwise the symbols will not be encoded correctly.
Useful / advanced functions
In order to query the different symbols better, combinations of PSQL functions can be used. The functions can be found in the official documentation from postgres.
Here is a small selection:
- https://www.postgresql.org/docs/current/functions-binarystring.html
- https://www.postgresql.org/docs/9.5/datatype-binary.html
The trick is to convert the symbols into a uniform system and thus enable the basis for a better quering. Here it is advisable to convert the symbols into a HEX value. Each symbol has a HEX value. PSQL offers functions for this. PSQL offers the simplest option with the data type bytea.
sa_name::bytea |
---|
If you look at the attribute value in Dbeaver, you get the following view. A hexviewer.
The address area is on the left, the hexadecimal area in the middle and the character area on the right. If a symbol is marked in the Hexviewer, the corresponding value can be read off.
Examples
1. Replace the space with another symbol. Space = 20 , the other symbol = E4B8AD (Tables: https://www.utf8-chartable.de/unicode-utf8-table.pl)
convert_from(decode(encode(concat('\x', replace(encode('A B', 'hex'), '20', 'E4B8AD'))::bytea, 'escape'), 'escape'), 'UTF-8')::varchar |
---|
2. Find all agreements that have non-Latin symbols in their names
SELECT * FROM ( SELECT sa_name, /* * 1. delete all latin script symbols with the functions translate() and replace() * 2. look at the HEX value at the beginning. If they start with a letter, then you have a title with non-latin symbols */ length(REPLACE(REPLACE(REPLACE(regexp_matches(encode(REPLACE(TRANSLATE(sa_name, 'abcdefghijklmnopqrstuvwxyzäöü+()- ,#+[]ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜß', '@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@'), '@', '')::bytea, 'hex'), '[a-f]*')::varchar, '{', ''), '}', ''), '"', '')) AS len FROM folio_agreements.subscription_agreement sa ) AS t WHERE t.len > 0 |
---|