This document is for PGroonga 2.x. If you're using an older PGroonga, see the PGroonga 1.x document.

CREATE INDEX USING pgroonga

To use PGroonga as the index method, specify USING pgroonga in CREATE INDEX. This section describes the pgroonga index method.

Syntax

This section describes only the pgroonga index method related parts of the CREATE INDEX syntax. See the PostgreSQL CREATE INDEX document for the full CREATE INDEX syntax.

Here is the basic syntax for creating a single column index:

CREATE INDEX ${INDEX_NAME}
          ON ${TABLE_NAME}
       USING pgroonga (${COLUMN});

This syntax can be used for the following cases:

Here is the basic syntax for creating a full text search index for a varchar type column:

CREATE INDEX ${INDEX_NAME}
          ON ${TABLE_NAME}
       USING pgroonga (${COLUMN} pgroonga_varchar_full_text_search_ops_v2);

You need to specify the pgroonga_varchar_full_text_search_ops_v2 operator class in this case.
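The following is a minimal sketch, not one of the original examples: it creates such an index on a hypothetical memos table with a varchar title column and runs a full text search with the &@~ operator. The table, index name and sample data are assumptions for illustration.

CREATE TABLE memos (
  id integer,
  title varchar
);

-- The operator class makes this a full text search index for varchar
CREATE INDEX pgroonga_title_index
          ON memos
       USING pgroonga (title pgroonga_varchar_full_text_search_ops_v2);

INSERT INTO memos VALUES (1, 'PostgreSQL and Groonga');

-- Full text search on the varchar column; row 1 is expected to match
SELECT * FROM memos WHERE title &@~ 'Groonga';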

Customization

You can customize the following by the WITH option of CREATE INDEX: plugins, tokenizer, normalizer, token filters and lexicon type.

Normally, you don't need to customize them because their default values are suitable for most cases. These customization features are intended for advanced users.

No plugins or token filters are used by default.

Here are the default tokenizer, normalizer and lexicon type:

How to register plugins

Since 1.2.0.

Specify plugins='${PLUGIN_NAME_1}, ${PLUGIN_NAME_2}, ..., ${PLUGIN_NAME_N}' to register plugins.

Note that you must specify plugins as the first option in CREATE INDEX. Options in CREATE INDEX are processed in the specified order, and plugins must be registered before the other options are processed because tokenizers, normalizers and token filters may be provided by plugins.

Here is an example that registers the token_filters/stem plugin to use the TokenFilterStem token filter:

CREATE TABLE memos (
  id integer,
  content text
);

CREATE INDEX pgroonga_content_index
          ON memos
       USING pgroonga (content)
        WITH (plugins='token_filters/stem',
              token_filters='TokenFilterStem');

See How to use token filters for details on token filters.
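As a rough check of the plugins example above, you could insert a row and search with a different inflection of the same word. This sketch assumes the memos table and pgroonga_content_index from that example and uses the &@~ operator; the comments describe the expected stemming behavior, not a verified result.

INSERT INTO memos VALUES (1, 'I am running now.');

-- TokenFilterStem is expected to reduce both "running" and "runs" to
-- the stem "run", so this query is expected to find row 1
SELECT * FROM memos WHERE content &@~ 'runs';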

How to customize tokenizer

Specify tokenizer='${TOKENIZER_NAME}' to customize the tokenizer. Normally, you don't need to customize the tokenizer.

Here is an example that uses the MeCab based tokenizer. Specify tokenizer='TokenMecab'; TokenMecab is the name of the MeCab based tokenizer.

CREATE TABLE memos (
  id integer,
  content text
);

CREATE INDEX pgroonga_content_index
          ON memos
       USING pgroonga (content)
        WITH (tokenizer='TokenMecab');

You can disable the tokenizer by specifying tokenizer=''. If you disable the tokenizer, each column value is indexed as a single token, so you can search it only by exact match search and prefix search. This reduces noise in some cases. For example, it's useful for tag search, name search and so on.

Here is an example that disables the tokenizer:

CREATE TABLE memos (
  id integer,
  tag text
);

CREATE INDEX pgroonga_tag_index
          ON memos
       USING pgroonga (tag)
        WITH (tokenizer='');
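With the tokenizer disabled, each tag value is one token. The queries below are a sketch that assumes the memos table and pgroonga_tag_index from the example above, hypothetical sample data, that the index is used for the search, and Groonga query syntax, where a trailing * requests a prefix search.

INSERT INTO memos VALUES (1, 'PostgreSQL'), (2, 'PostgreSQL 11');

-- Exact match: expected to find only row 1, because "PostgreSQL 11"
-- is indexed as a single token that differs from "PostgreSQL"
SELECT * FROM memos WHERE tag &@ 'PostgreSQL';

-- Prefix search via query syntax: expected to find rows 1 and 2
SELECT * FROM memos WHERE tag &@~ 'PostgreSQL*';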

tokenizer='TokenDelimit' is also useful for tag search. See also TokenDelimit.
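A minimal sketch of such a tag index, assuming tags are stored as one whitespace separated string (the tags column, index name and sample data are hypothetical). TokenDelimit splits the value on whitespace by default, so each tag becomes its own token.

CREATE TABLE memos (
  id integer,
  tags text
);

CREATE INDEX pgroonga_tags_index
          ON memos
       USING pgroonga (tags)
        WITH (tokenizer='TokenDelimit');

INSERT INTO memos VALUES (1, 'PostgreSQL Groonga');

-- "Groonga" is one of the whitespace separated tags, so row 1 is
-- expected to match
SELECT * FROM memos WHERE tags &@ 'Groonga';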

You can specify tokenizer options with the tokenizer='${TOKENIZER_NAME}(...)' syntax.

This syntax is available since 2.0.6.

Here is an example that uses the TokenNgram tokenizer with the "n" option set to 3:

CREATE TABLE memos (
  id integer,
  tag text
);

CREATE INDEX pgroonga_tag_index
          ON memos
       USING pgroonga (tag)
        WITH (tokenizer='TokenNgram("n", 3)');
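Options are written as name/value pairs, so several options can be passed at once. The sketch below adds loose_symbol to "n"; it assumes your Groonga version's TokenNgram supports loose_symbol, which is intended to let searches ignore symbols (see the Groonga TokenNgram documentation for its exact behavior). The table and index names are hypothetical.

CREATE TABLE memos (
  id integer,
  content text
);

-- "n" = 2 and "loose_symbol" = true, written as name/value pairs
CREATE INDEX pgroonga_content_index
          ON memos
       USING pgroonga (content)
        WITH (tokenizer='TokenNgram("n", 2, "loose_symbol", true)');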

See Tokenizers for other tokenizers.

How to customize normalizer

Specify normalizer='${NORMALIZER_NAME}' to customize the normalizer. Normally, you don't need to customize the normalizer.

You can disable the normalizer by specifying normalizer=''. If you disable the normalizer, a column value can be found only by its original, unnormalized form. This is useful when normalization increases noise.

Here is an example that disables the normalizer:

CREATE TABLE memos (
  id integer,
  tag text
);

CREATE INDEX pgroonga_tag_index
          ON memos
       USING pgroonga (tag)
        WITH (normalizer='');
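To see the effect, you could compare two queries. This sketch assumes the memos table and pgroonga_tag_index from the example above, hypothetical sample data, and that the index is used for the search; without a normalizer, letter case isn't folded.

INSERT INTO memos VALUES (1, 'PostgreSQL');

-- Same spelling and case as the stored value: expected to match row 1
SELECT * FROM memos WHERE tag &@ 'PostgreSQL';

-- Different case: expected not to match, because no normalizer
-- folds "postgresql" and "PostgreSQL" to the same form
SELECT * FROM memos WHERE tag &@ 'postgresql';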

You can specify normalizer options with the normalizer='${NORMALIZER_NAME}(...)' syntax.

This syntax is available since 2.0.6.

Here is an example that uses the NormalizerNFKC100 normalizer with the "unify_kana" option set to true:

CREATE TABLE memos (
  id integer,
  tag text
);

CREATE INDEX pgroonga_tag_index
          ON memos
       USING pgroonga (tag)
        WITH (normalizer='NormalizerNFKC100("unify_kana", true)');
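With unify_kana enabled, hiragana and katakana are normalized to the same form. A sketch of a query that relies on this, assuming the memos table and pgroonga_tag_index from the example above and hypothetical sample data:

INSERT INTO memos VALUES (1, 'グルンガ');

-- The katakana tag and the hiragana query are expected to normalize
-- to the same characters, so row 1 is expected to match
SELECT * FROM memos WHERE tag &@ 'ぐるんが';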

See Normalizers for other normalizers.

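You can use a different normalizer for each of full text search, regular expression search and prefix search. Use full_text_search_normalizer for full text search operator classes such as pgroonga_full_text_search_ops_v2, regexp_search_normalizer for regular expression search operator classes such as pgroonga_regexp_ops_v2, and prefix_search_normalizer for prefix search operator classes such as pgroonga_term_search_ops_v2.

If these parameters aren't specified, the normalizer parameter is used as a fallback.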

Here is an example that disables the normalizer only for full text search:

CREATE TABLE memos (
  id integer,
  title text,
  content text,
  tag text
);

CREATE INDEX pgroonga_tag_index
          ON memos
       USING pgroonga (
               title pgroonga_full_text_search_ops_v2,
               content pgroonga_regexp_ops_v2,
               tag pgroonga_term_search_ops_v2
             )
        WITH (full_text_search_normalizer='',
              normalizer='NormalizerAuto');

The title column is indexed for full text search. It doesn't use a normalizer because full_text_search_normalizer is ''. The other columns use NormalizerAuto because normalizer is 'NormalizerAuto'.
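Each column is searched with an operator that its operator class supports. A sketch against the example above, assuming its memos table and index, hypothetical sample data, and the &@~ (full text search), &~ (regular expression search) and &^ (prefix search) operators:

INSERT INTO memos VALUES
  (1, 'PostgreSQL', 'PGroonga is a PostgreSQL extension.', 'pgroonga');

-- title (pgroonga_full_text_search_ops_v2): no normalizer, so the
-- query keeps the original case
SELECT * FROM memos WHERE title &@~ 'PostgreSQL';

-- content (pgroonga_regexp_ops_v2): normalized by NormalizerAuto
SELECT * FROM memos WHERE content &~ 'roonga.*extension';

-- tag (pgroonga_term_search_ops_v2): prefix search, normalized by NormalizerAuto
SELECT * FROM memos WHERE tag &^ 'pgr';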

Here is an example that disables the normalizer only for regular expression search:

CREATE TABLE memos (
  id integer,
  title text,
  content text,
  tag text
);

CREATE INDEX pgroonga_tag_index
          ON memos
       USING pgroonga (
               title pgroonga_full_text_search_ops_v2,
               content pgroonga_regexp_ops_v2,
               tag pgroonga_term_search_ops_v2
             )
        WITH (regexp_search_normalizer='',
              normalizer='NormalizerAuto');

The content column is indexed for regular expression search. It doesn't use a normalizer because regexp_search_normalizer is ''. The other columns use NormalizerAuto because normalizer is 'NormalizerAuto'.

Here is an example that disables the normalizer only for prefix search:

CREATE TABLE memos (
  id integer,
  title text,
  content text,
  tag text
);

CREATE INDEX pgroonga_tag_index
          ON memos
       USING pgroonga (
               title pgroonga_full_text_search_ops_v2,
               content pgroonga_regexp_ops_v2,
               tag pgroonga_term_search_ops_v2
             )
        WITH (prefix_search_normalizer='',
              normalizer='NormalizerAuto');

The tag column is indexed for term search, which includes prefix search. It doesn't use a normalizer because prefix_search_normalizer is ''. The other columns use NormalizerAuto because normalizer is 'NormalizerAuto'.
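The per-search parameters can presumably be combined in one WITH clause. The sketch below assumes that, uses a hypothetical index name, and leaves prefix_search_normalizer unset so that prefix search falls back to the normalizer parameter.

CREATE TABLE memos (
  id integer,
  title text,
  content text,
  tag text
);

-- title: NormalizerNFKC100 with "unify_kana"
-- content: no normalizer
-- tag: falls back to NormalizerAuto
CREATE INDEX pgroonga_memos_index
          ON memos
       USING pgroonga (
               title pgroonga_full_text_search_ops_v2,
               content pgroonga_regexp_ops_v2,
               tag pgroonga_term_search_ops_v2
             )
        WITH (full_text_search_normalizer='NormalizerNFKC100("unify_kana", true)',
              regexp_search_normalizer='',
              normalizer='NormalizerAuto');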

How to use token filters

Since 1.2.0.

Specify token_filters='${TOKEN_FILTER_1}, ${TOKEN_FILTER_2}, ..., ${TOKEN_FILTER_N}' to use token filters.

Groonga doesn't provide any token filters by default. All token filters are provided as plugins, so you need to register the plugins that provide the token filters you use.

Here is an example that uses the TokenFilterStem token filter, which is included in the token_filters/stem plugin:

CREATE TABLE memos (
  id integer,
  content text
);

CREATE INDEX pgroonga_content_index
          ON memos
       USING pgroonga (content)
        WITH (plugins='token_filters/stem',
              token_filters='TokenFilterStem');

Note that you must specify plugins before token_filters. These CREATE INDEX options are processed in the specified order, and plugins must be registered before you use the token filters they provide.

See Token filters for other token filters.
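Following the rule that plugins comes first, here is a sketch that combines plugins and token filters with explicit tokenizer and normalizer options. TokenBigram and NormalizerAuto are chosen only to show where these options fit; the table and index names are hypothetical.

CREATE TABLE memos (
  id integer,
  content text
);

-- plugins is listed first so that TokenFilterStem is registered
-- before token_filters refers to it
CREATE INDEX pgroonga_content_index
          ON memos
       USING pgroonga (content)
        WITH (plugins='token_filters/stem',
              tokenizer='TokenBigram',
              normalizer='NormalizerAuto',
              token_filters='TokenFilterStem');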

How to change tablespace

Since 1.1.6.

Specify TABLESPACE ${TABLESPACE_NAME} to customize the tablespace. If you have fast storage, you may want to put PGroonga indexes into a tablespace on that storage.

Here is an example to change tablespace:

CREATE TABLESPACE fast LOCATION '/data/fast_disk';

CREATE TABLE memos (
  id integer,
  tag text
);

CREATE INDEX pgroonga_tag_index
          ON memos
       USING pgroonga (tag)
  TABLESPACE fast;
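To confirm where the index was created, you can query PostgreSQL's standard pg_indexes view. Its tablespace column is NULL for indexes in the database's default tablespace; for the example above, the expected value is fast.

-- List the tablespace of the PGroonga index created above
SELECT indexname, tablespace
  FROM pg_indexes
 WHERE indexname = 'pgroonga_tag_index';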

How to change lexicon type

Since 2.0.6.

Specify lexicon_type='${LEXICON_TYPE}' to change the lexicon type.

Here are available lexicon types:

Normally, you don't need to customize this because the default value is suitable for most cases.

Here is an example that uses the hash_table lexicon type to disable predictive token search:

CREATE TABLE memos (
  id integer,
  content text
);

CREATE INDEX pgroonga_content_index
          ON memos
       USING pgroonga (content)
        WITH (lexicon_type='hash_table');