This is a document for PGroonga 1.X. See PGroonga 2.x document when you're using recent PGroonga.
&@*
operatorSince 1.2.1.
&~?
operator is deprecated since 1.2.2. Use &@*
operator instead.
&@*
operator performs similar search.
column &@* document
column
is a column to be searched. It's text
type, text[]
type or varchar
type.
document
is a document for similar search. It's text
type for text
type or text[]
type column
. It's varchar
type for varchar
type column
.
Similar search searches records that have similar content with document
. If document
is short content, similar search may return records that are less similar.
You need to specify one of the following operator classes to use this operator:
pgroonga.text_full_text_search_ops_v2
: For text
pgroonga.text_array_full_text_search_ops_v2
: For text[]
pgroonga.varchar_full_text_search_ops_v2
: For varchar
Here are sample schema and data for examples:
CREATE TABLE memos (
id integer,
content text
);
CREATE INDEX pgroonga_content_index ON memos
USING pgroonga (content pgroonga.text_full_text_search_ops_v2);
INSERT INTO memos VALUES (1, 'PostgreSQL is a relational database management system.');
INSERT INTO memos VALUES (2, 'Groonga is a fast full text search engine that supports all languages.');
INSERT INTO memos VALUES (3, 'PGroonga is a PostgreSQL extension that uses Groonga as index.');
INSERT INTO memos VALUES (4, 'There is groonga command.');
You can search records that are similar with the specified document by &@*
operator:
SELECT * FROM memos WHERE content &@* 'Mroonga is a MySQL extension taht uses Groonga';
-- id | content
-- ----+----------------------------------------------------------------
-- 3 | PGroonga is a PostgreSQL extension that uses Groonga as index.
-- (1 row)
You can't use similar search with sequential scan. If you use similar search with sequential search, you get the following error:
SELECT * FROM memos WHERE content &@* 'Mroonga is a MySQL extension taht uses Groonga';
-- ERROR: pgroonga: operator &@* is available only in index scan
You should use TokenMecab
tokenizer instead of the default TokenBigram
for similar search against Japanese documents:
CREATE INDEX pgroonga_content_index ON memos
USING pgroonga (content pgroonga.text_full_text_search_ops_v2)
WITH (tokenizer='TokenMecab');
TokenMecab
will tokenize target documents to words. It improves similar search precision.
See also CREATE INDEX USING pgroonga
how to specify TokenMecab
tokenizer.