Wednesday, December 14, 2016

Anomaly - MySQL Full-text Searches and SQLAlchemy; the Present and a Proposed Future

Anomaly - MySQL Full-text Searches and SQLAlchemy; the Present and a Proposed Future
MySQL Full-text Searches and SQLAlchemy; the Present and a Proposed Future

Addendum: Since the publication of this article we discovered Meng Zhuo's MySQL full-text add-on for SQLAlchemy and have decided to side with this existing effort over rolling our own solution.

MySQL supports an array of full-text search features recommended for performing advanced textual queries on fields of CHAR, VARCHAR and TEXT types. For a variety of great reasons Python is our language of choice and there's no better Pythonic interface to a relational database than SQLAlchemy.

SQLAlchemy is feature-packed and rock-solid. Unfortunately, it does not completely support all of what MySQL has to offer in the way of full-text searches.

The following article provides a brief introduction to full-text searching in MySQL, how—and to what extent—you can use it in SQLAlchemy 1.0.13 as well as our proposal for feature-complete full-text searching in SQLAlchemy and working with the core team to merge it back into the core code base.

Working with Full-text Indexes

Full-text indexes are created on one or a group of fields. The exact same set of fields must be used in MATCH and AGAINST syntax to perform full-text queries.

Indexes can be created as part of the initial table definition:

CREATE TABLE customer (
customer_id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
name VARCHAR(200),
physical_address_street TEXT(),
FULLTEXT name_index (name)
) ENGINE=InnoDB;
Or on an existing table:

CREATE FULLTEXT INDEX `name_index` ON customer(name);

/* or across multiple fields */

CREATE FULLTEXT INDEX `fulltext_index` ON customer(name, physical_address_street);
Queries performed using MATCH and AGAINST are able to make use of the full-text index. They can be run in three different modes:

Natural Language
Natural Language with Query Expansion
Boolean Mode
Note: The exact same fields used to create the full-text index must appear when running full-text searches.

MySQL's documentation covers querying in great detail. A basic BOOLEAN mode query would look as follows:

SELECT customer_id FROM customer WHERE MATCH(name) AGAINST('anomaly+ soft*' IN BOOLEAN MODE);

/* or across multiple fields */

SELECT customer_id FROM customer WHERE MATCH(name, physical_address_street)
AGAINST('anomaly+ soft*' IN BOOLEAN MODE);
Full-text Searching via SQLAlchemy

At present SQLAlchemy (as at version 1.0.13) has implemented fairly basic support for full-text searching. The query interface is limited to single fields and queries are preset to run in BOOLEAN mode.

customers = session.query(Customer).filter(Customer.name.match("anomaly")).all()
Running queries against multiple fields or other modes is only possible by rolling your own ClauseElement. There's an extensive thread on StackOverflow where Mike Bayer, the author of SQLAlchemy, has outlined a detailed solution.

Note: Using BOOLEAN mode operators crash SQLAlchemy. This would most likely be caused by the match function failing to compile the operators.

There's also no support for creating full-text indexes via the declarative_base. The best option is to hook onto the after_create event and create the index via the DDL.

event.listen(
Customer.__table__,
"after_create",
DDL("CREATE FULLTEXT INDEX `name` ON %(table)s(name)"))
Proposed Future

While investigating full-text support in SQLAlchemy, I had a brief discussion on the mailing list. As a result we've decided to take on the job of completely implementing MySQL full-text support in SQLAlchemy. We plan to support:

Creation of full-text indexes via the declarative_base
Multiple fields across multiple indexes
Querying multiple fields in BOOLEAN, NATURAL LANGUAGE and NATURAL LANGUAGE mode with QUERY EXPANSION
You can follow the development on our forked repository on Github.

databasepython

No comments: