Tags: sphinx
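Sphinx is a full-text search engine developed by the Russian developer Andrew Aksyonoff. It aims to give other applications fast, space-efficient full-text search with highly relevant results. A project of mine required Sphinx with Chinese word segmentation, so these are my notes on setting up the environment.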
[root@localhost mmseg-3.2.14]# yum -y install make gcc g++ gcc-c++ libtool autoconf automake imake
[root@localhost mmseg-3.2.14]# yum install libxml2-devel expat-devel
[root@localhost sphinx]# tar xvf coreseek-3.2.14.tar.gz
[root@localhost sphinx]# cd coreseek-3.2.14
[root@localhost coreseek-3.2.14]# cd mmseg-3.2.14/
[root@localhost mmseg-3.2.14]# aclocal
[root@localhost mmseg-3.2.14]# libtoolize --force
libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, `config'.
libtoolize: linking file `config/ltmain.sh'
libtoolize: Consider adding `AC_CONFIG_MACRO_DIR([m4])' to configure.in and
libtoolize: rerunning libtoolize, to keep the correct libtool macros in-tree.
libtoolize: Consider adding `-I m4' to ACLOCAL_AMFLAGS in Makefile.am.
[root@localhost mmseg-3.2.14]#
[root@localhost mmseg-3.2.14]# automake --add-missing
[root@localhost mmseg-3.2.14]# autoconf
[root@localhost mmseg-3.2.14]# autoheader
[root@localhost mmseg-3.2.14]# make clean
[root@localhost mmseg-3.2.14]# ./configure --prefix=/usr/local/mmseg3
[root@localhost mmseg-3.2.14]# make && make install
[root@localhost coreseek-3.2.14]# cd csft-3.2.14/
[root@localhost csft-3.2.14]# sh buildconf.sh
[root@localhost csft-3.2.14]# ./configure --prefix=/usr/local/coreseek --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg3/lib/ --with-mysql
[root@localhost csft-3.2.14]# make && make install
[root@localhost testpack]# cat var/test/test.xml    # displays the Chinese text
[root@localhost testpack]# /usr/local/mmseg3/bin/mmseg -d /usr/local/mmseg3/etc var/test/test.xml
</x ?/x xml/x /x version/x =/x "/x 1/x ./x 0/x "/x /x encoding/x =/x "/x utf/x -/x 8/x "/x ?/x >/x
</x sphinx/x :/x docset/x >/x /x
</x sphinx/x :/x schema/x >/x /x
</x sphinx/x :/x field/x /x name/x =/x "/x subject/x "/x //x >/x /x /x
</x sphinx/x :/x field/x /x name/x =/x "/x content/x "/x //x >/x /x
</x sphinx/x :/x attr/x /x name/x =/x "/x published/x "/x /x type/x =/x "/x timestamp/x "/x //x >/x /x
</x sphinx/x :/x attr/x /x name/x =/x "/x author/x _/x id/x "/x /x type/x =/x "/x int/x "/x /x bits/x =/x "/x 16/x "/x /x default/x =/x "/x 1/x "/x //x >/x /x
</x //x sphinx/x :/x schema/x >/x /x
</x sphinx/x :/x document/x /x id/x =/x "/x 1/x "/x >/x /x /x
</x subject/x >/x 愚人/x 节/x 最佳/x 蛊惑/x 爆/x 料/x /x 谷/x 歌/x 300/x 亿/x 美元/x 收购/x 百/x 度/x </x //x subject/x >/x /x /x
</x published/x >/x 1270131607/x </x //x published/x >/x /x /x
</x content/x >/x 据/x 国外/x 媒体/x 报道/x ,/x 谷/x 歌/x 将/x 巨资/x 收购/x 百/x 度/x ,/x 涉及/x 金额/x 高达/x 300/x 亿/x 美元/x 。/x 谷/x 歌/x 借/x 此/x 重返/x 大陆/x 市场/x 。/x /x /x 该/x 报道/x 称/x ,/x 目前/x 谷/x 歌/x 与/x 百/x 度/x 已经/x 达成/x 了/x 收购/x 协议/x ,/x 将/x 择机/x 对外/x 公布/x 。/x 百/x 度/x 的/x 管理层/x 将/x 100/x %/x 保 留/x ,/x 但/x 会/x 将/x 项目/x 缩减/x ,/x 包括/x 有/x 啊/x 商城/x ,/x 以及/x 目前/x 实施/x 不力/x 的/x 凤/x 巢/x 计划/x 。/x 正在/x 进行/x 测试/x 阶段/x 的/x 视频/x 网站/x qiyi/x ./x com/x 将/x 输入/x 更/x 多/x 的/x Youtube/x 资源/x 。/x (/x YouTube/x 在/x 大陆/x 区/x 因/x 内容/x 审查/x 暂/x 不/x 能/x 访问/x )/x 。/x
[root@localhost testpack]# /usr/local/coreseek/bin/indexer -c etc/csft.conf --all
Coreseek Fulltext 3.2 [ Sphinx 0.9.9-release (r2117)]
Copyright (c) 2007-2011,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)
using config file 'etc/csft.conf'...
indexing index 'xml'...
collected 3 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 3 docs, 7585 bytes
total 0.008 sec, 945524 bytes/sec, 373.97 docs/sec
total 2 reads, 0.000 sec, 4.2 kb/call avg, 0.0 msec/call avg
total 7 writes, 0.000 sec, 3.1 kb/call avg, 0.0 msec/call avg
[root@localhost testpack]# /usr/local/coreseek/bin/search -c etc/csft.conf 结婚的和尚未结婚的
Coreseek Fulltext 3.2 [ Sphinx 0.9.9-release (r2117)]
Copyright (c) 2007-2011,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)
using config file 'etc/csft.conf'...
index 'xml': query '结婚的和尚未结婚的 ': returned 0 matches of 0 total in 0.004 sec
words:
1. '结婚': 0 documents, 0 hits
2. '的': 3 documents, 83 hits
3. '和': 3 documents, 15 hits
4. '尚未': 0 documents, 0 hits
[root@localhost python]# /usr/local/coreseek/bin/searchd -c /opt/sphinx/coreseek-3.2.14/testpack/etc/csft_cjk.conf &
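In the mmseg output above, every token is followed by "/x", which makes the segmentation hard to read at a glance. Below is a minimal sketch of a helper that prints one token per line. The function name split_tokens is hypothetical and not part of mmseg or Coreseek, and the sketch assumes tokens are separated by "/x " exactly as in the transcript.

```shell
#!/bin/sh
# split_tokens: hypothetical helper (not shipped with mmseg).
# Reads "token/x token/x ..." on stdin and prints one token per line.
split_tokens() {
    awk '{
        n = split($0, tok, "/x ")
        for (i = 1; i <= n; i++) {
            sub(/\/x$/, "", tok[i])         # the last token on a line ends in a bare "/x"
            if (tok[i] != "") print tok[i]  # skip empty (whitespace-only) tokens
        }
    }'
}

# Sample input copied from the subject line in the transcript.
printf '%s\n' '谷/x 歌/x 300/x 亿/x 美元/x 收购/x 百/x 度/x' | split_tokens
```

With a real run, the same helper could be placed after the mmseg command in a pipeline, for example `/usr/local/mmseg3/bin/mmseg -d /usr/local/mmseg3/etc var/test/test.xml | split_tokens`.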
Original article: http://kingtigerhu.blog.51cto.com/2936525/1580075