This is a document for PGroonga 1.x. See the PGroonga 2.x documentation if you're using a recent PGroonga.
CREATE INDEX USING pgroonga
To use PGroonga as the index method, you need to specify USING pgroonga in CREATE INDEX. This section describes the pgroonga index method.
This section describes only the parts of the CREATE INDEX syntax that are related to the pgroonga index method. See the PostgreSQL CREATE INDEX documentation for the full CREATE INDEX syntax.
Here is the basic syntax for creating a single column index:
CREATE INDEX ${INDEX_NAME}
          ON ${TABLE_NAME}
       USING pgroonga (${COLUMN});
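As a concrete sketch, here is the syntax above with the placeholders filled in. The memos table, its content column and the pgroonga_content_index name are illustrative only. The SELECT assumes the PGroonga 1.x %% operator for full text search, so check the operator reference for your exact 1.x release before relying on it.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE memos (
  id integer,
  content text
);

-- ${INDEX_NAME}, ${TABLE_NAME} and ${COLUMN} replaced with concrete names.
CREATE INDEX pgroonga_content_index
          ON memos
       USING pgroonga (content);

-- Full text search on the indexed column (assumes the PGroonga 1.x %% operator).
SELECT * FROM memos WHERE content %% 'PGroonga';
```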
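This syntax can be used for the following cases:
* Full text search against a text type column
* Full text search against an array of text type column
* Equality and comparison conditions against non text type columns
* Equality conditions against arrays of non text type columns
* Search against a jsonb type column
Here is the basic syntax for creating a full text search index for a varchar type column: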
CREATE INDEX ${INDEX_NAME}
          ON ${TABLE_NAME}
       USING pgroonga (${COLUMN} pgroonga.varchar_full_text_search_ops);
In this case, you need to specify the pgroonga.varchar_full_text_search_ops operator class.
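Here is a minimal sketch of the varchar case, assuming a hypothetical memos table with a title column. The operator class name goes right after the column name inside the parentheses. That placement is standard PostgreSQL syntax for choosing an operator class.

```sql
-- Hypothetical table with a varchar column.
CREATE TABLE memos (
  id integer,
  title varchar(1023)
);

-- The operator class follows the column name inside the parentheses.
CREATE INDEX pgroonga_title_index
          ON memos
       USING pgroonga (title pgroonga.varchar_full_text_search_ops);
```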
You can customize the following by the WITH option of CREATE INDEX:
Plugin: An extension for Groonga. Registering a plugin gives you additional features, including additional tokenizers, normalizers and token filters.
Tokenizer: A module that customizes how keywords are extracted.
Normalizer: A module that customizes equality of text and varchar values.
Token filter: A module that filters the keywords extracted by the tokenizer.
Normally, you don't need to customize them because their default values are suitable for most cases. These customization features are for advanced users.
Plugins and token filters aren't used by default.
Here are the default tokenizer and normalizer:
Tokenizer: TokenBigram: A bigram based tokenizer that combines bigram tokenization and white space based tokenization. It uses bigram tokenization for non-ASCII characters and white space based tokenization for ASCII characters. This reduces noise for queries that contain only ASCII characters.
Normalizer: NormalizerAuto: It chooses a suitable normalization based on the target encoding. For example, it uses Unicode NFKC based normalization for UTF-8.
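The defaults above can also be written out explicitly. This sketch uses a hypothetical memos table and passes the default tokenizer and normalizer through WITH. Based on the defaults listed above, it should behave like an index created without the WITH clause.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE memos (
  id integer,
  content text
);

-- Explicitly requesting the default tokenizer and normalizer.
CREATE INDEX pgroonga_content_index
          ON memos
       USING pgroonga (content)
        WITH (tokenizer='TokenBigram',
              normalizer='NormalizerAuto');
```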
The plugins option is available since 1.2.0.
Specify plugins='${PLUGIN_NAME_1}, ${PLUGIN_NAME_2}, ..., ${PLUGIN_NAME_N}' to register plugins.
Note that you must specify plugins as the first option in CREATE INDEX. Options in CREATE INDEX are processed in the specified order. Plugins must be registered before the other options are processed, because the tokenizer, normalizer and token filters may be provided by those plugins.
Here is an example that registers the token_filters/stem plugin to use the TokenFilterStem token filter:
CREATE TABLE memos (
  id integer,
  content text
);

CREATE INDEX pgroonga_content_index
          ON memos
       USING pgroonga (content)
        WITH (plugins='token_filters/stem',
              token_filters='TokenFilterStem');
See the description of the token_filters option below for details about token filters.
Specify tokenizer='${TOKENIZER_NAME}' to customize the tokenizer. Normally, you don't need to customize the tokenizer.
Here is an example that uses the MeCab based tokenizer. You need to specify tokenizer='TokenMecab'. TokenMecab is the name of the MeCab based tokenizer.
CREATE TABLE memos (
  id integer,
  content text
);

CREATE INDEX pgroonga_content_index
          ON memos
       USING pgroonga (content)
        WITH (tokenizer='TokenMecab');
You can disable the tokenizer by specifying tokenizer=''. If you disable the tokenizer, you can search a column value only by exact match search and prefix search. This reduces noise in some cases. For example, it's useful for tag search, name search and so on.
Here is an example to disable tokenizer:
CREATE TABLE memos (
  id integer,
  tag text
);

CREATE INDEX pgroonga_tag_index
          ON memos
       USING pgroonga (tag)
        WITH (tokenizer='');
tokenizer='TokenDelimit' may also be useful for tag search. See also TokenDelimit.
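Here is a sketch of the TokenDelimit approach. It assumes a hypothetical tags column that stores each row's tags as one string separated by spaces. By default, TokenDelimit splits text on white space, so each tag becomes one token.

```sql
-- Hypothetical table: tags holds values such as 'postgresql groonga search'.
CREATE TABLE memos (
  id integer,
  tags text
);

-- TokenDelimit extracts each space separated tag as a single token.
CREATE INDEX pgroonga_tags_index
          ON memos
       USING pgroonga (tags)
        WITH (tokenizer='TokenDelimit');
```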
See Tokenizers for other tokenizers.
Specify normalizer='${NORMALIZER_NAME}' to customize the normalizer. Normally, you don't need to customize the normalizer.
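For example, the following sketch selects NormalizerNFKC51, one of Groonga's built-in normalizers. It supports only UTF-8, so the sketch assumes a UTF-8 database. The table and index names are illustrative.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE memos (
  id integer,
  content text
);

-- Use a specific built-in normalizer instead of NormalizerAuto
-- (NormalizerNFKC51 requires a UTF-8 database).
CREATE INDEX pgroonga_content_index
          ON memos
       USING pgroonga (content)
        WITH (normalizer='NormalizerNFKC51');
```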
You can disable the normalizer by specifying normalizer=''. If you disable the normalizer, a column value can be found only by its original, unnormalized value. This is useful when normalization increases noise.
Here is an example to disable normalizer:
CREATE TABLE memos (
  id integer,
  tag text
);

CREATE INDEX pgroonga_tag_index
          ON memos
       USING pgroonga (tag)
        WITH (normalizer='');
See Normalizers for other normalizers.
The token_filters option is available since 1.2.0.
Specify token_filters='${TOKEN_FILTER_1}, ${TOKEN_FILTER_2}, ..., ${TOKEN_FILTER_N}' to use token filters.
Groonga doesn't provide any token filters by default. All token filters are provided as plugins, so you need to register a plugin before you can use its token filters.
Here is an example that uses the TokenFilterStem token filter included in the token_filters/stem plugin:
CREATE TABLE memos (
  id integer,
  content text
);

CREATE INDEX pgroonga_content_index
          ON memos
       USING pgroonga (content)
        WITH (plugins='token_filters/stem',
              token_filters='TokenFilterStem');
Note that you must specify plugins before token_filters. These CREATE INDEX options are processed in the specified order, and plugins must be registered before you use their token filters.
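Here is a sketch that sets all four options at once on a hypothetical memos table, which makes the ordering rule concrete. It assumes PGroonga 1.2.0 or later and a Groonga build that includes the token_filters/stem plugin. Only the position of plugins matters here. The tokenizer and normalizer are set to their default values purely for illustration.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE memos (
  id integer,
  content text
);

-- plugins comes first so that TokenFilterStem is registered
-- before token_filters refers to it.
CREATE INDEX pgroonga_content_index
          ON memos
       USING pgroonga (content)
        WITH (plugins='token_filters/stem',
              tokenizer='TokenBigram',
              normalizer='NormalizerAuto',
              token_filters='TokenFilterStem');
```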
See Token filters for other token filters.
PGroonga supports TABLESPACE since 1.1.6.
Specify TABLESPACE ${TABLESPACE_NAME} to customize the tablespace. If you have fast storage, you may want to put PGroonga indexes in a tablespace on that storage.
Here is an example to change tablespace:
CREATE TABLESPACE fast LOCATION '/data/fast_disk';

CREATE TABLE memos (
  id integer,
  tag text
);

CREATE INDEX pgroonga_tag_index
          ON memos
       USING pgroonga (tag)
  TABLESPACE fast;
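TABLESPACE can be combined with the WITH options described above. In PostgreSQL's CREATE INDEX grammar, WITH (...) comes before TABLESPACE. This sketch assumes that the fast tablespace from the example above already exists, and it uses a hypothetical bookmarks table:

```sql
-- Hypothetical table; the "fast" tablespace is assumed to exist already.
CREATE TABLE bookmarks (
  id integer,
  tag text
);

-- WITH (...) must precede TABLESPACE in CREATE INDEX.
CREATE INDEX pgroonga_bookmarks_tag_index
          ON bookmarks
       USING pgroonga (tag)
        WITH (tokenizer='')
  TABLESPACE fast;
```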