This document is for PGroonga 2.x and 3.x. If you are using an older PGroonga, see the PGroonga 1.x document.

&~| operator

Since 2.2.1.

Summary

The &~| operator performs a regular expression search with an array of regular expressions. A record matches if one or more of the regular expressions match.

PostgreSQL provides the following built-in regular expression operators: SIMILAR TO and "POSIX Regular Expression".

SIMILAR TO is based on the SQL standard. "POSIX Regular Expression" is based on POSIX. They use different regular expression syntaxes.

This operator uses yet another regular expression syntax: the one used in Ruby. This is because PGroonga uses Onigmo, the same regular expression engine that Ruby uses. See the Onigmo document for the full syntax definition.

This operator normalizes target text before matching. In this respect it's similar to the ~* operator in "POSIX Regular Expression", which performs case-insensitive matching.

Normalization is not the same as case-insensitive matching. In general, normalization covers more than case differences.

Example 1: All of "A", "a", "Ａ" (U+FF21 FULLWIDTH LATIN CAPITAL LETTER A) and "ａ" (U+FF41 FULLWIDTH LATIN SMALL LETTER A) are normalized to "a".

Example 2: Both full-width Katakana and half-width Katakana are normalized to full-width Katakana. For example, both "ア" (U+30A2 KATAKANA LETTER A) and "ｱ" (U+FF71 HALFWIDTH KATAKANA LETTER A) are normalized to "ア" (U+30A2 KATAKANA LETTER A).

Note that this operator doesn't normalize the regular expression patterns. It only normalizes the target text. This means that you must use normalized characters in regular expression patterns.

For example, you must not use "Groonga" as a pattern. You must use "groonga" instead, because "G" in the target text is normalized to "g" and so "Groonga" never appears in the normalized target text.
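The effect of normalization can be checked directly. The following is a minimal sketch. It assumes that the pgroonga_normalize function is available in your PGroonga version and that its default normalizer behaves like the one this operator applies to target text. Both are assumptions, not something this page states.

```sql
-- Show what a target text looks like after normalization.
-- Assumption: pgroonga_normalize() with no normalizer argument behaves
-- like the normalization that &~| applies to target text.
SELECT pgroonga_normalize('Groonga');

-- Full-width "A" (U+FF21, U+FF41) and half-width Katakana "A" (U+FF71).
SELECT pgroonga_normalize('Ａａ'), pgroonga_normalize('ｱ');
```

If the first query returns groonga, a pattern has to be written as groonga to match that text.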

Groonga can search some simple regular expression patterns by index. When an index is used, the search is very fast. See Groonga's regular expression document for the patterns that are searchable by index.

If a regular expression pattern isn't searchable by index, Groonga searches it by sequential scan.

Note that Groonga may search a regular expression pattern by sequential scan even when EXPLAIN reports that PostgreSQL uses the PGroonga index.
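To see what PostgreSQL plans for such a query, you can run EXPLAIN. This sketch assumes the memos table and the pgroonga_content_index index defined in the Usage section of this document. Disabling sequential scan is only a convenience for small test tables, where the planner may otherwise prefer its own sequential scan.

```sql
-- For experiments on a tiny table only:
-- discourage PostgreSQL's own sequential scan.
SET enable_seqscan = off;

EXPLAIN
SELECT * FROM memos WHERE content &~| ARRAY['\Apostgresql', 'pgroonga\z'];

-- Restore the default planner setting.
RESET enable_seqscan;
```

The plan shows only PostgreSQL's side. It doesn't tell you whether Groonga itself used its index or a sequential scan for each pattern.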

Syntax

column &~| regular_expressions

column is the column to be searched. Its type is text or varchar.

regular_expressions is an array of regular expressions to be used as patterns. Its type is text[] for a text column and varchar[] for a varchar column.

If the column value matches one or more patterns in regular_expressions, the expression returns true.

Operator classes

You need to specify one of the following operator classes to use this operator: pgroonga_text_regexp_ops_v2 for text, or pgroonga_varchar_regexp_ops_v2 for varchar.

Usage

Here is the sample schema for the examples:

CREATE TABLE memos (
  id integer,
  content text
);

CREATE INDEX pgroonga_content_index ON memos
  USING pgroonga (content pgroonga_text_regexp_ops_v2);

Here is the sample data for the examples:

INSERT INTO memos VALUES (1, 'PostgreSQL is a relational database management system');
INSERT INTO memos VALUES (2, 'Groonga is a fast full text search engine that supports all languages');
INSERT INTO memos VALUES (3, 'PGroonga is a PostgreSQL extension that uses Groonga as index');
INSERT INTO memos VALUES (4, '*.pgn* files are used by PGroonga');

You can perform a regular expression search with multiple patterns by using the &~| operator:

SELECT * FROM memos WHERE content &~| ARRAY['\Apostgresql', 'pgroonga\z'];
--  id |                        content                        
-- ----+-------------------------------------------------------
--   1 | PostgreSQL is a relational database management system
--   4 | *.pgn* files are used by PGroonga
-- (2 rows)

"\A" in "\Apostgresql" is a special notation in Ruby regular expression syntax. It means the beginning of the text. The pattern means that "postgresql" must appear at the beginning of the text.

"\z" in "pgroonga\z" is a special notation in Ruby regular expression syntax. It means the end of the text. The pattern means that "pgroonga" must appear at the end of the text.

Why does the "PostgreSQL is a ..." record match? Remember that this operator normalizes target text before matching. The "PostgreSQL is a ..." text is normalized to "postgresql is a ..." before matching. The normalized text starts with "postgresql", so the "\Apostgresql" regular expression matches the record.

The "PGroonga is a PostgreSQL ..." record doesn't match "\Apostgresql". Its normalized text includes "postgresql", but not at the beginning of the text.

Why does the "... used by PGroonga" record match? Again, this operator normalizes target text before matching. The "... used by PGroonga" text is normalized to "... used by pgroonga" before matching. The normalized text ends with "pgroonga", so the "pgroonga\z" regular expression matches the record.

The "PGroonga is a PostgreSQL ..." record doesn't match "pgroonga\z" either. Its normalized text includes "pgroonga", but not at the end of the text.
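Because a record matches when one or more patterns match, the &~| query above can also be read as several single-pattern searches combined with OR. The sketch below assumes that the &~ operator (PGroonga's single-pattern regular expression operator) can be used with the same index. It's meant as an illustration of the semantics rather than a recommended form.

```sql
-- Each condition checks one pattern; OR combines them,
-- which mirrors "one or more patterns match" in &~|.
SELECT * FROM memos
 WHERE content &~ '\Apostgresql'
    OR content &~ 'pgroonga\z';
```

With the sample data, this should select the same records (id 1 and id 4) as the &~| query.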

See also