PostgreSQL full text search

barryp.org blog: PostgreSQL full text search with Django

PostgreSQL full text search with Django

PostgreSQL 8.3 is coming out soon with full text search integrated into the core database system. It's pretty well documented in chapter 12 of the PostgreSQL docs. The docs are a bit intimidating, but it turns out to be pretty easy to use with Django.

Let's say you're doing a stereotypical blog application, named 'blog', and have a model for entries such as:

django-fts - Generic Full Text Search engine for Django projects - Google Project Hosting

This is a generic Full Text Search engine for Django projects

Currently implements three backends: dummy, simple and pgsql.

  • dummy - just uses ILIKE to do the search (no indexes, very slow)
  • simple - implements the search using two helper tables for the indexes
  • pgsql - uses PostgreSQL 8.3 full text search engine
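To make the difference between scanning every row (dummy) and consulting an index (simple, pgsql) concrete, here is a minimal sketch of an inverted index in plain Python. It is not django-fts code: the InvertedIndex class and its method names are hypothetical, and the real simple backend keeps its index in database tables rather than in memory.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy in-memory index mapping each word to the ids of documents containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    @staticmethod
    def _tokenize(text):
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text):
        for word in self._tokenize(text):
            self._postings[word].add(doc_id)

    def search(self, query):
        """Return ids of documents that contain every word of the query."""
        words = self._tokenize(query)
        if not words:
            return set()
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return result


index = InvertedIndex()
index.add(1, "The body of the article")
index.add(2, "The body of another article in the blog")
index.add(3, "The body of yet another simple article")
```

A lookup such as index.search("simple") only touches the entry for that word, whereas an ILIKE scan has to read the text of every row.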

It should be possible to easily integrate MySQL, Sphinx and Xapian backends too.


Install

To install the latest version:

svn checkout http://django-fts.googlecode.com/svn/trunk/ django-fts
cd django-fts
python setup.py install

Note: You will need to install the Snowball Python bindings if you want to use the Snowball stemmer. If you don't, a bundled stemmer based on the Porter algorithm will be used. The bindings are also not required if you are using the PostgreSQL backend. Get the Snowball bindings package from http://snowball.tartarus.org/wrappers/PyStemmer-1.0.1.tar.gz

Usage example

Add the fts app to your settings.py file and optionally configure an FTS backend (simple by default):

INSTALLED_APPS = (
    #...
    'fts',
)
#FTS_BACKEND = 'pgsql://' # or 'dummy://' or 'simple://'

Assume that we have this model in our imaginary application:

from django.db import models

class Blog(models.Model):
    title = models.CharField(max_length=100)
    body = models.TextField()

    def __unicode__(self):
        return u"%s" % (self.title)

Suppose we want to add full text search to the Blog model. Subclass the model from fts.SearchableModel instead of django.db.models.Model. The new model may look like this:

from django.db import models
import fts

class Blog(fts.SearchableModel):
    title = models.CharField(max_length=100)
    body = models.TextField()

    # Defining a SearchManager without fields will use all CharFields and TextFields.
    # This is the default and you do not need to explicitly add the following line:
    # objects = fts.SearchManager()

    # You can pass a list of fields that should be indexed
    # objects = fts.SearchManager( fields=('title','body') )
    # The fields you pass as parameters can be foreign fields ('myfield__foreign_field')
    # or even functions (functions should receive the instance as the only parameter)
   
    # You may also specify fields as a dictionary, mapping each field to a weight for ranking purposes
    # see http://www.postgresql.org/docs/8.3/static/textsearch-features.html#TEXTSEARCH-MANIPULATE-TSVECTOR
    # objects = fts.SearchManager( fields={
    #    'title': 'A',
    #    'body': 'B',
    #} )

    def __unicode__(self):
        return u"%s" % (self.title)

In the Django shell, create some instances of the model:

python ./manage.py shell

>>> from core.models import Blog
>>> p = Blog(title='This is the title', body='The body of the article')
>>> p.save()
>>> p = Blog(title='This is the second title', body='The body of another article in the blog')
>>> p.save()
>>> p = Blog(title='This is the third title', body='The body of yet another simple article')
>>> p.save()

Now perform a search:

>>> result = Blog.objects.search('simple').all()
>>> result.count()
1
>>> result
[<Blog: This is the third title>]

Additional information

You can force an index update to all or some instances:

>>> p.update_index()
>>> Blog.objects.update_index()
>>> Blog.objects.update_index(pk=1)
>>> Blog.objects.update_index(pk=[1, 2])

You can also omit the search method and call the manager directly:

>>> result = Blog.objects('simple')
>>> result.count()
1
>>> result
[<Blog: This is the third title>]

PostgreSQL specific information

The PostgreSQL backend is heavily based on the code from http://www.djangosnippets.org/snippets/1328/ by Dan Watson.

If using the pgsql backend, don't forget to add a GIN or GiST index to your tables: http://www.postgresql.org/docs/8.3/static/textsearch-indexes.html

Example:

CREATE INDEX "tablename_search_index" ON "tablename" USING gin("search_index");

Note: You should index the search_index column, not your text or char columns.
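If you would rather create the index from Python than from psql, a small helper can generate the statement. This is only a sketch: gin_index_sql is a hypothetical function, and the table name core_blog is an assumption based on the core app used in the shell session above.

```python
import re

# Accept only plain identifiers, because the names are interpolated into SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def gin_index_sql(table, column="search_index"):
    """Build a CREATE INDEX statement for the full text search column."""
    for name in (table, column):
        if not _IDENTIFIER.match(name):
            raise ValueError("unsafe identifier: %r" % (name,))
    return 'CREATE INDEX "{0}_{1}" ON "{0}" USING gin("{1}");'.format(table, column)


statement = gin_index_sql("core_blog")

# In a Django project the statement could then be executed with:
#   from django.db import connection
#   connection.cursor().execute(statement)
```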

Inexact full-text search in PostgreSQL and Django - Stack Overflow

Your best bet is to use Django raw querysets; I use them with MySQL to perform full text matching. If the data is all in the database and Postgres provides the matching capability, then it makes sense to use it. Plus, Postgres offers some really useful things, such as stemming, with full text queries.

Basically it lets you write the actual query you want yet returns models (as long as you are querying a model table obviously).

The advantage this gives you is that you can first test the exact query you will be using in Postgres; the documentation covers full text queries pretty well.

The main gotcha with raw querysets at the moment is they don't support count. So if you will be returning lots of data and have memory constraints on your application you might need to do something clever.
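As a rough sketch of that approach, the helper below builds a parameterized PostgreSQL full text query that could be handed to a model manager's raw() method. The table name blog_blog, the body column and the build_search_query function are assumptions made for illustration; the answer itself does not show a query.

```python
def build_search_query(table, column, terms):
    """Return (sql, params) for a PostgreSQL full text match on one column.

    table and column must be trusted identifiers (they are interpolated
    into the SQL); the user-supplied search terms go through a parameter.
    """
    sql = (
        'SELECT * FROM "{table}" '
        "WHERE to_tsvector('english', \"{column}\") "
        "@@ plainto_tsquery('english', %s)"
    ).format(table=table, column=column)
    return sql, [terms]


sql, params = build_search_query("blog_blog", "body", "simple article")

# With Django and a hypothetical Blog model, this would be used as:
#   entries = Blog.objects.raw(sql, params)
#   total = len(list(entries))  # one way around the missing count()
```

Note that len(list(entries)) loads every row; for large result sets, a separate SELECT COUNT(*) query with the same WHERE clause avoids that.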

Django | QuerySet API reference | Django documentation

search

A boolean full-text search, taking advantage of full-text indexing. This is
like contains but is significantly faster due to full-text indexing.

Example:

Entry.objects.filter(headline__search="+Django -jazz Python")
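The query string uses boolean operators: in MySQL's boolean mode, a leading + marks a word that must be present, a leading - marks a word that must be absent, and an unmarked word is optional. The sketch below mimics that matching rule in plain Python to show which rows the example query would select. The matches function is hypothetical and ignores ranking, stemming and stopwords.

```python
import re


def matches(query, text):
    """Return True if text satisfies a '+required -excluded optional' query."""
    words = set(re.findall(r"\w+", text.lower()))
    required, excluded, optional = [], [], []
    for term in query.lower().split():
        if term.startswith("+"):
            required.append(term[1:])
        elif term.startswith("-"):
            excluded.append(term[1:])
        else:
            optional.append(term)
    if any(word in words for word in excluded):
        return False
    if required:
        # Optional words only influence ranking once a required word exists.
        return all(word in words for word in required)
    # With no required words, at least one optional word must be present.
    return any(word in words for word in optional)


query = "+Django -jazz Python"
headlines = [
    "Django and Python news",
    "Django Reinhardt, jazz guitarist",
    "Django release notes",
    "Python tips",
]
selected = [h for h in headlines if matches(query, h)]
```

Here selected keeps the two headlines that mention Django without jazz; a real full-text engine would additionally rank the one that also mentions Python higher.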
