This is a document for PGroonga 1.X. See PGroonga 2.x document when you're using recent PGroonga.
@~
operatorThis operator is deprecated since 1.2.1. Use &~
operator instead.
@~
operator performs regular expression search.
PostgreSQL provides the following built-in regular expression operators:
SIMILAR TO
is based on SQL standard. "POSIX Regular Expression" is based on POSIX. They use different regular expression syntax.
PGroonga's @~
operator uses another regular expression syntax. @~
uses syntax that is used in Ruby. Because PGroonga uses the same regular expression engine that is used in Ruby. It's Onigmo. See Onigmo document for full syntax definition.
PGroonga's @~
operator normalizes target text before matching. It's similar to ~*
operator in "POSIX Regular Expression". It performs case insensitive match.
Normalization is different from case insensitive. Normally, normalization is more powerful.
Example1: All of "A
", "a
", "A
" (U+FF21 FULLWIDTH LATIN CAPITAL LETTER A), "a
" (U+FF41 FULLWIDTH LATIN SMALL LETTER A) are normalized to "a
".
Example2: Both of full-width Katakana and half-width Katakana are normalized to full-width Katakana. For example, both of "ア
" (U+30A2 KATAKANA LETTER A) and "ア
" (U+FF71 HALFWIDTH KATAKANA LETTER A) are normalized to "ア
" (U+30A2 KATAKANA LETTER A).
Note that @~
operator doesn't normalize regular expression pattern. It only normalizes target text. It means that you must use normalized characters in regular expression pattern.
For example, you must not use "Groonga
" as pattern. You must use "groonga
" as pattern. Because "G
" in target text is normalized to "g
". "Groonga
" is never appeared in target text.
Some simple regular expression patterns can be searched by index in Groonga. If index is used, the search is very fast. See Groonga's regular expression document for index searchable patterns.
If a regular expression pattern can't be searchable by index, it's searched by sequential scan in Groonga.
Note that Groonga may search with regular expression pattern by sequential scan even when EXPLAIN
reports PostgreSQL uses PGroonga index.
column @~ regular_expression
column
is a column to be searched. It's text
type or varchar
type.
regular_expression
is a regular expression to be used as pattern. It's text
type for text
type column
. It's varchar
type for varchar
type column.
If column
value is matched against regular_expression
pattern, the expression returns true
.
You need to specify one of the following operator classes to use this operator:
pgroonga.text_regexp_ops
: For text
pgroonga.varchar_regexp_ops
: For varchar
pgroonga.text_regexp_ops_v2
: For text
pgroonga.varchar_regexp_ops_v2
: For varchar
Here are sample schema for examples:
CREATE TABLE memos (
id integer,
content text
);
CREATE INDEX pgroonga_content_index ON memos
USING pgroonga (content pgroonga.text_regexp_ops);
Here are data for examples:
INSERT INTO memos VALUES (1, 'PostgreSQL is a relational database management system.');
INSERT INTO memos VALUES (2, 'Groonga is a fast full text search engine that supports all languages.');
INSERT INTO memos VALUES (3, 'PGroonga is a PostgreSQL extension that uses Groonga as index.');
INSERT INTO memos VALUES (4, 'There is groonga command.');
You can perform regular expression search by @~
operator:
SELECT * FROM memos WHERE content @~ '\Apostgresql';
-- id | content
-- ----+--------------------------------------------------------
-- 1 | PostgreSQL is a relational database management system.
-- (1 row)
"\A
" in "\Apostgresql
" is a special notation in Ruby regular expression syntax. It means that the beginning of text. The pattern means that "postgresql
" must be appeared in the beginning of text.
Why is "PostgreSQL is a ...
" record matched? Remember that @~
operator normalizes target text before matching. It means that "PostgreSQL is a ...
" text is normalized to "postgresql is a ...
" text before matching. The normalized text is started with "postgresql
". So "\Apostgresql
" regular expression matches to the record.
"PGroonga is a PostgreSQL ...
" record isn't matched. It includes "postgresql
" in normalized text but "postgresql
" isn't appeared at the beginning of text. So it's not matched.
&~
operator: Search by a regular expression