How to implement auto complete feature

Auto complete is a useful feature for an easy-to-use search box. PGroonga provides features for implementing it.

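You can implement auto complete by combining the following searches: prefix search against auto complete candidate terms, prefix RK search against the readings of candidate terms (only for Japanese), and loose full text search against candidate terms.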

Sample schema and indexes

Here is the sample schema:

CREATE TABLE terms (
    term text,
    readings text[]
);

Auto complete candidate terms are stored in term. Readings of each term are stored in readings. Because the type of readings is text[], a single term can have multiple readings.

Here is the sample index definition:

CREATE INDEX pgroonga_terms_prefix_search ON terms USING pgroonga
  (term pgroonga.text_term_search_ops_v2,
   readings pgroonga.text_array_term_search_ops_v2);

CREATE INDEX pgroonga_terms_full_text_search ON terms USING pgroonga
  (term pgroonga.text_full_text_search_ops_v2)
  WITH (tokenizer = 'TokenBigramSplitSymbolAlphaDigit');

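The above indexes are required for prefix search and full text search. The first index covers prefix search on term and prefix RK search on readings. The second index covers full text search on term.

The TokenBigramSplitSymbolAlphaDigit tokenizer is suitable for loose full text search because it also splits alphabetic characters, digits and symbols into bigrams, so a query can match in the middle of a word.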

Prefix search for auto complete candidate terms

The simplest way to implement auto complete is prefix search.

PGroonga provides the &^ operator for it.

Here is the sample data for prefix search:

INSERT INTO terms (term) VALUES ('auto-complete');

Then use &^ against term for prefix search. Here is the result:

SELECT term FROM terms WHERE term &^ 'auto';
--      term      
-- ---------------
--  auto-complete
-- (1 row)

The result contains auto-complete as an auto complete candidate term.

Only for Japanese: Prefix RK search for auto complete by readings

Prefix RK search is a prefix search variant. It supports searching katakana by romaji, hiragana or katakana. It's useful for Japanese.

Here is the sample data for prefix RK search:

INSERT INTO terms (term, readings) VALUES ('牛乳', ARRAY['ギュウニュウ', 'ミルク']);

Note that you must store readings in katakana only. Prefix RK search requires this in order to find auto complete candidate terms.

Then use the &^~ operator against readings for prefix RK search. Here are some examples.

You can find "牛乳" as an auto complete candidate for "gyu" (romaji) by prefix RK search:

SELECT term FROM terms WHERE readings &^~ 'gyu';
--  term 
-- ------
--  牛乳
-- (1 row)

You can find "牛乳" as an auto complete candidate for "ぎゅう" (hiragana) by prefix RK search:

SELECT term FROM terms WHERE readings &^~ 'ぎゅう';
--  term 
-- ------
--  牛乳
-- (1 row)

You can find "牛乳" as an auto complete candidate for "ギュウ" (katakana) by prefix RK search:

SELECT term FROM terms WHERE readings &^~ 'ギュウ';
--  term 
-- ------
--  牛乳
-- (1 row)

There is also an advanced use of readings. If the reading of a synonym is stored in readings, the term can also be found through that synonym:

SELECT term FROM terms WHERE readings &^~ 'mi';
--  term 
-- ------
--  牛乳
-- (1 row)

"ミルク" is a synonym of "牛乳". You can find "牛乳" as an auto complete candidate term for "mi" because "ミルク" is stored in the readings column.

Loose full text search for auto complete candidate terms

Use the &@ operator against term for loose full text search. It finds candidate terms that contain the input anywhere, not only at the beginning. Here is the result:

SELECT term FROM terms WHERE term &@ 'mpl';
--      term      
-- ---------------
--  auto-complete
-- (1 row)

The result contains auto-complete as an auto complete candidate term, even though "mpl" appears in the middle of the term.
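
The three searches can be combined in a single query. The following is a minimal sketch, not a definitive implementation. It assumes the terms table, both indexes and the sample rows defined in this document, and it passes the same user input ('auto' here, chosen only for illustration) to every operator. Joining the conditions with OR is one possible way to combine them, not the only one:

```sql
-- Sketch: one query that tries all three searches for the same input.
-- Assumes the sample schema, indexes and rows from this document.
SELECT term
  FROM terms
 WHERE term &^ 'auto'        -- prefix search against term
    OR readings &^~ 'auto'   -- prefix RK search against readings (Japanese only)
    OR term &@ 'auto';       -- loose full text search against term
```

Because each row is evaluated once, a term that satisfies more than one condition still appears only once in the result. In an application, you would bind the text typed into the search box as a query parameter instead of embedding it as a literal.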