This is a document for PGroonga 2.X and 3.X. See PGroonga 1.x document when you're using old PGroonga.
pgroonga_condition()
functionSince 3.1.6.
pgroonga_condition()
function returns pgroonga_condition
type value.
The function and the type have same name, but they are two different things.
pgroonga_condition
type represents complicated conditional expressions, such as pgroonga_full_text_search_condition
type and pgroonga_full_text_search_condition_with_scorers
type.
pgroonga_condition()
function is a useful function to make the pgroonga_condition
type value.
It allows to make the pgroonga_condition
type value by designating the specific attribute values.
There were not this kind of useful functions for pgroonga_full_text_search_condition
type and pgroonga_full_text_search_condition_with_scorers
type, so designating all attribute values was necessary to make the value.
Therefore, you need to designate NULL
for disused attribute value as follows when pgroonga_full_text_search_condition
type and pgroonga_full_text_search_condition_with_scorers
type are used to avoid making all the values.
title &@~ ('keyword', NULL, 'index_name')::pgroonga_full_text_search_condition
title &@~ ('keyword', ARRAY[1,1,1,5,0], NULL, 'index_name')::pgroonga_full_text_search_condition_with_scorers
It was not possible for existing value creation methods to make new attribute value while keeping backward compatibility.
Thus, it was necessary to add a new type every time when a new attribute value is added, such as pgroonga_full_text_search_condition_with_XXX
type.
For example, pgroogna_full_text_search_condition_with_scorers
type was added because of the added new attribute.
The difference between pgroonga_full_text_search_condition
type and pgroonga_full_text_search_condition_with_scorers
type is whether scorers
exist or not. If scorers
is added to pgroonga_full_text_search_condition
type, every users are required to insert new NULL
to make pgroonga_full_text_search_condition
type regardless of scorers
usage.
However, installing pgroonga_condition()
function to make new pgroonga_condition
type value let a new attribute to be added while keeping backward compatibility.
It is because pgroonga_condition()
function absorbs incompatibility.
pgroonga_condition()
function lets current writing style when a new attribute value is added because the function can leave out unnecessary attribute value as following sample.
(In the following sample, weights
, scorers
, schema_name
and column_name
are left out. The details of attribute values would be noted in next "Syntax". Here, point is that possibility of leaving out unnecessary attribute values.)
title &@~ pgroonga_condition('keyword', index_name => 'index_name')
Please note that while using pgroonga_condition()
function you can leave out attribute values instead you need to describe comment like keyword argument such as index_name => 'index name'
.
In the above sample, there are mix of attribute values which is like keyword argument or not. How to separate writing is going to be explained in next "Syntax". The point here is there is need of different writing from the current.
Here is the syntax of this function:
pgroonga_condition pgroonga_condition(keyword,
weights,
scorers,
schema_name,
index_name,
column_name,
fuzzy_max_distance_ratio)
keyword
is a keyword for full text search. It's text
type.
weights
is importance factors of each value. It's int[]
type.
scorers
is score compute procedures of each value. It's text[]
type.
schema_name
is the schema name to which the index that PGroonga refers to when executing a sequential search belongs. It's text
type.
index_name
is index name which PGroonga refer to when executing sequential search. It's text
type.
column_name
is the column name within the index which PGroonga refers to when executing a sequential search. It's text
type.
fuzzy_max_distance_ratio
is the ratio of the edit distance. It's float4
type. (Since 3.2.1.)
See Groonga's fuzzy_max_distance_ratio
option for details.
All arguments of pgroonga_condition()
are optional. If you want to specify a particular argument, you can use Named Notation such as name => value
without relying on its position. For example, if you specify only index_name
argument, you can write pgroonga_condition(index_name => 'index1')
.
In general, it is enough to remember the following three cases.
pgroonga_condition('keyword', index_name => 'pgroonga_index')
pgroonga_condition('keyword', ARRAY[weight1, weight2, ...])
pgroonga_condition('keyword', ARRAY[weight1, weight2, ...], index_name => 'pgroonga_index')
Please refer to Calling Functions for information about the difference between when you need to write name => value
and when you don't.
index_name
Introducing how to search with normalizer and tokeniser options specified in the index while sequential search is executed.
Use pgroonga_condition('keyword', index_name => 'pgroonga_index')
.
Assign the name of index specified tokenizer or normalizer to index_name
.
Here are sample schema and data:
CREATE TABLE tags (
name text PRIMARY KEY
);
CREATE INDEX pgroonga_tag_name_index ON tags
USING pgroonga (name pgroonga_text_term_search_ops_v2)
WITH (normalizers='NormalizerNFKC150("remove_symbol", true)');
INSERT INTO tags VALUES ('PostgreSQL');
INSERT INTO tags VALUES ('Groonga');
INSERT INTO tags VALUES ('PGroonga');
INSERT INTO tags VALUES ('pglogical');
At the time of an index search, you can customize the behavior of the index search by using the options specified to the index.
In the above example, the options are specified in the normalizers='...'
section.
On the other hand, at the sequential search, the options specified to the index cannot be referenced. It is because there is no information which index to be referenced at the sequential search.
Because of that, there is a possibility that the search results may differ between a sequential search and an index search.
In order to avoid this issue, explicitly specify which index to be referenced at the sequential search.
The index_name => '...'
argument in pgroonga_condition()
is used for that.
The next example executes a prefix search with the keyword "_p_G
" while NormalizerNFKC150("remove_symbol", true)
is specified in the index.
Since remove_symbol
is the option to ignore the symbols, "_p_G
" is normalized to "pg
".
(The reason why the capital letter "G
" becomes a lowercase "g
" is not due to the remove_symbol
option, but rather the default behavior of NormalizerNFKC150
.)
Therefore, both "PGroonga
" and "pglogical
" should be hit while this option is active.
You can see that "PGroonga
" and "pglogical
" are hit as a result of the sequential search.
This result shows that NormalizerNFKC150("remove_symbol", true)
specified in the index is referenced even during sequential search execution.
EXPLAIN ANALYZE
SELECT *
FROM tags
WHERE name &^ pgroonga_condition('_p_G',
index_name => 'pgroonga_tag_name_index');
QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on tags (cost=0.00..1043.60 rows=1 width=32) (actual time=2.267..2.336 rows=2 loops=1)
Filter: (name &^ '(_p_G,,,,pgroonga_tag_name_index,)'::pgroonga_condition)
Rows Removed by Filter: 2
Planning Time: 0.871 ms
Execution Time: 2.352 ms
(5 rows)
SELECT *
FROM tags
WHERE name &^ pgroonga_condition('_p_G',
index_name => 'pgroonga_tag_name_index');
name
-----------
PGroonga
pglogical
(2 rows)
As you can see in next, "PGroonga
" and "pglogical
" would not be hit when index_name
is not specified. (This means that NormalizerNFKC150("remove_symbol", true)
cannot be referenced.)
EXPLAIN ANALYZE
SELECT *
FROM tags
WHERE name &^ pgroonga_condition('_p_G');
QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on tags (cost=0.00..1043.60 rows=1 width=32) (actual time=0.032..0.032 rows=0 loops=1)
Filter: (name &^ '(_p_G,,,,,)'::pgroonga_condition)
Rows Removed by Filter: 4
Planning Time: 0.910 ms
Execution Time: 0.053 ms
(5 rows)
SELECT *
FROM tags
WHERE name &^ pgroonga_condition('_p_G');
name
------
(0 rows)
In this way, by specifying index_name
, you can ensure that the search result remain the same whether executing a sequential search or an index search.
weights
Introducing how to specify different weights for each column. This allows the title to be weighted more than the main text.
Use pgroonga_condition('keyword', ARRAY[weight1, weight2, ...])
.
weight1
, weight2
, and so on specify the weights for each column.
Here are sample schema and data:
DROP TABLE IF EXISTS memos;
CREATE TABLE memos (
title text,
content text
);
CREATE INDEX pgroonga_memos_index
ON memos
USING pgroonga ((ARRAY[title, content]));
INSERT INTO memos VALUES ('PostgreSQL', 'PostgreSQL is a relational database management system.');
INSERT INTO memos VALUES ('Groonga', 'Groonga is the fast full text search engine optimized for Japanese.');
INSERT INTO memos VALUES ('PGroonga', 'PGroonga is an extension for PostgreSQL to use Groonga as the index.');
INSERT INTO memos VALUES ('command line', 'There is a groonga command.');
pgroonga_score function
can be used to search for records that match by the specified query.
SELECT *, pgroonga_score(tableoid, ctid) AS score
FROM memos
WHERE ARRAY[title, content] &@~
pgroonga_condition('Groonga OR PostgreSQL', ARRAY[5, 1])
ORDER BY score DESC;
title | content | score
--------------+----------------------------------------------------------------------+-------
Groonga | Groonga is the fast full text search engine optimized for Japanese. | 6
PostgreSQL | PostgreSQL is a relational database management system. | 6
PGroonga | PGroonga is an extension for PostgreSQL to use Groonga as the index. | 2
command line | There is a groonga command. | 1
(4 rows)
In above example, the title is 5 times weighted that the main text because ARRAY[title, content] &@~ pgroonga_condition('Groonga OR PostgreSQL', ARRAY[5, 1])
is specified.
You can see the score is higher for the records with Groonga
or PostgreSQL
in the title
column compared to records with Groonga
or PostgreSQL
in the content
column.
Introducing how to search without specific columns as the search target.
Use pgroonga_condition('keyword', ARRAY[weight1, weight2, ...])
.
You need to specify 0
to weight
of the column that would be excluded from the search target.
Here are sample schema and data:
DROP TABLE IF EXISTS memos;
CREATE TABLE memos (
title text,
content text
);
CREATE INDEX pgroonga_memos_index
ON memos
USING pgroonga ((ARRAY[title, content]));
INSERT INTO memos VALUES ('PostgreSQL', 'PostgreSQL is a relational database management system.');
INSERT INTO memos VALUES ('Groonga', 'Groonga is the fast full text search engine optimized for Japanese.');
INSERT INTO memos VALUES ('PGroonga', 'PGroonga is an extension for PostgreSQL to use Groonga as the index.');
INSERT INTO memos VALUES ('command line', 'There is a groonga command.');
In the next example, the content
column is excluded from the search target.
The record, 'PGroonga is an extension for PostgreSQL to use Groonga as the index.'
, should be hit if the content
column is included in the search target with the search keyword is "extension
", but the record is not hit.
You can see that the content
column is excluded from the search target.
SELECT *
FROM memos
WHERE ARRAY[title, content] &@~
pgroonga_condition('extension', ARRAY[1, 0]);
title | content
-------+---------
(0 rows)
As the content
column is set as the search target in the next example, the record 'PGroonga is an extension for PostgreSQL to use Groonga as an index.'
is hit.
SELECT *
FROM memos
WHERE ARRAY[title, content] &@~
pgroonga_condition('extension', ARRAY[1, 1]);
title | content
----------+----------------------------------------------------------------------
PGroonga | PGroonga is an extension for PostgreSQL to use Groonga as the index.
(1 row)