Lucene Package Structure

Date: 2016-05-04 18:59:37

Package structure of Lucene 2.2:

[Image: Lucene 2.2 package structure]

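analysis: not covered in detail here, because in real projects the Paoding analyzer (庖丁分词), which supports Chinese, is used as the tokenizer instead.

document: an essential tool when writing an index. Raw data is first converted into individual Document objects, which are then written to the index.

index: the core package for writing indexes.

queryParser: the parser that turns a query string into a query at search time.

search: the core package for searching.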

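store: the storage layer for index files (Directory and its implementations, such as FSDirectory and RAMDirectory). Note that whether a given field's value is stored is controlled separately, through Field.Store in the document package.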

util: utility classes.

index:

The class-level comment of IndexWriter, the main class of the index package:

An <code>IndexWriter</code> creates and maintains an index.

The <code>create</code> argument to the
<a href="#IndexWriter(org.apache.lucene.store.Directory, org.apache.lucene.analysis.Analyzer, boolean)">constructor</a>
determines whether a new index is created, or whether an existing index is
opened. Note that you
can open an index with <code>create=true</code> even while readers are
using the index. The old readers will continue to search
the "point in time" snapshot they had opened, and won't
see the newly created index until they re-open. There are
also <a href="#IndexWriter(org.apache.lucene.store.Directory, org.apache.lucene.analysis.Analyzer)">constructors</a>
with no <code>create</code> argument which
will create a new index if there is not already an index at the
provided path and otherwise open the existing index.

In either case, documents are added with <a
href="#addDocument(org.apache.lucene.document.Document)">addDocument</a>
and removed with <a
href="#deleteDocuments(org.apache.lucene.index.Term)">deleteDocuments</a>.
A document can be updated with <a href="#updateDocument(org.apache.lucene.index.Term, org.apache.lucene.document.Document)">updateDocument</a>
(which just deletes and then adds the entire document).
When finished adding, deleting and updating documents, <a href="#close()">close</a> should be called.

These changes are buffered in memory and periodically
flushed to the {@link Directory} (during the above method
calls). A flush is triggered when there are enough
buffered deletes (see {@link #setMaxBufferedDeleteTerms})
or enough added documents since the last flush, whichever
is sooner. For the added documents, flushing is triggered
either by RAM usage of the documents (see {@link
#setRAMBufferSizeMB}) or the number of added documents.
The default is to flush when RAM usage hits 16 MB. For
best indexing speed you should flush by RAM usage with a
large RAM buffer. You can also force a flush by calling
{@link #flush}. When a flush occurs, both pending deletes
and added documents are flushed to the index. A flush may
also trigger one or more segment merges which by default
run with a background thread so as not to block the
addDocument calls (see <a href="#mergePolicy">below</a>
for changing the {@link MergeScheduler}).
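
The flush triggers described above can be modeled in a few lines of plain Java. This is a toy sketch, not Lucene code: the class `FlushTriggerSketch` and all of its members are hypothetical, it tracks only RAM usage and buffered delete terms (not the document-count trigger), and the delete threshold of 1000 is an arbitrary value chosen for the demo. Only the 16 MB figure comes from the comment above.

```java
public class FlushTriggerSketch {
    private final long ramBufferBytes;        // flush when buffered documents reach this size
    private final int maxBufferedDeleteTerms; // flush when this many delete terms are buffered

    private long bufferedBytes = 0;
    private int bufferedDeleteTerms = 0;
    private int flushCount = 0;

    public FlushTriggerSketch(double ramBufferSizeMB, int maxBufferedDeleteTerms) {
        this.ramBufferBytes = (long) (ramBufferSizeMB * 1024 * 1024);
        this.maxBufferedDeleteTerms = maxBufferedDeleteTerms;
    }

    // Buffer one added document of the given size, then check the triggers.
    public void addDocument(long docBytes) {
        bufferedBytes += docBytes;
        flushIfNeeded();
    }

    // Buffer one delete term, then check the triggers.
    public void deleteDocuments() {
        bufferedDeleteTerms++;
        flushIfNeeded();
    }

    // Whichever threshold is reached first triggers the flush.
    private void flushIfNeeded() {
        if (bufferedBytes >= ramBufferBytes || bufferedDeleteTerms >= maxBufferedDeleteTerms) {
            flush();
        }
    }

    // A flush always writes out pending adds and pending deletes together.
    public void flush() {
        bufferedBytes = 0;
        bufferedDeleteTerms = 0;
        flushCount++;
    }

    public int flushCount() { return flushCount; }
    public long bufferedBytes() { return bufferedBytes; }
    public int bufferedDeleteTerms() { return bufferedDeleteTerms; }

    public static void main(String[] args) {
        FlushTriggerSketch writer = new FlushTriggerSketch(16.0, 1000);
        for (int i = 0; i < 20; i++) {
            writer.addDocument(1024 * 1024); // twenty documents of 1 MB each
        }
        System.out.println("flushes: " + writer.flushCount()); // prints "flushes: 1"
    }
}
```

With a 16 MB buffer, the sixteenth 1 MB document triggers the only flush, and the remaining four documents stay buffered until the next trigger or an explicit `flush()`.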

<a name="autoCommit"></a>
The optional <code>autoCommit</code> argument to the
<a href="#IndexWriter(org.apache.lucene.store.Directory, boolean, org.apache.lucene.analysis.Analyzer)">constructors</a>
controls visibility of the changes to {@link IndexReader} instances reading the same index.
When this is <code>false</code>, changes are not
visible until {@link #close()} is called.
Note that changes will still be flushed to the
{@link org.apache.lucene.store.Directory} as new files,
but are not committed (no new <code>segments_N</code> file
is written referencing the new files) until {@link #close} is
called. If something goes terribly wrong (for example the
JVM crashes) before {@link #close()}, then
the index will reflect none of the changes made (it will
remain in its starting state).
You can also call {@link #abort()}, which closes the writer without committing any
changes, and removes any index
files that had been flushed but are now unreferenced.
This mode is useful for preventing readers from refreshing
at a bad time (for example after you've done all your
deletes but before you've done your adds).
It can also be used to implement simple single-writer
transactional semantics ("all or none").
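
The "all or none" behavior of <code>autoCommit=false</code> can be illustrated with a small stand-alone model. This is only a sketch of the semantics, not of how Lucene implements them: `AllOrNoneSketch` is a hypothetical class, the "index" is just a list of strings, and `close()` and `abort()` merely mimic the roles of the methods of the same name described above.

```java
import java.util.ArrayList;
import java.util.List;

public class AllOrNoneSketch {
    private final List<String> committed; // what readers of the "index" can see
    private final List<String> pending;   // the writer's private working copy
    private boolean open = true;

    public AllOrNoneSketch(List<String> index) {
        this.committed = index;
        this.pending = new ArrayList<>(index);
    }

    public void add(String doc) {
        ensureOpen();
        pending.add(doc);
    }

    public void delete(String doc) {
        ensureOpen();
        pending.remove(doc);
    }

    // Commit: every buffered change becomes visible at once.
    public void close() {
        ensureOpen();
        committed.clear();
        committed.addAll(pending);
        open = false;
    }

    // Abort: the index stays exactly as it was when the writer was opened.
    public void abort() {
        ensureOpen();
        pending.clear();
        open = false;
    }

    private void ensureOpen() {
        if (!open) {
            throw new IllegalStateException("writer is closed");
        }
    }

    public static void main(String[] args) {
        List<String> index = new ArrayList<>();
        index.add("old");

        AllOrNoneSketch writer = new AllOrNoneSketch(index);
        writer.delete("old");
        writer.add("new");
        System.out.println(index); // still [old]: nothing is visible before close()
        writer.close();
        System.out.println(index); // now [new]: the delete and the add appear together
    }
}
```

A reader that looks at the index between the delete and the add never observes the intermediate, empty state, which is the "bad time" the comment warns about.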

When <code>autoCommit</code> is <code>true</code> then
every flush is also a commit ({@link IndexReader}
instances will see each flush as changes to the index).
This is the default, to match the behavior before 2.2.
When running in this mode, be careful not to refresh your
readers while optimize or segment merges are taking place
as this can tie up substantial disk space.

Regardless of <code>autoCommit</code>, an {@link
IndexReader} or {@link org.apache.lucene.search.IndexSearcher} will only see the
index as of the "point in time" that it was opened. Any
changes committed to the index after the reader was opened
are not visible until the reader is re-opened.
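
The "point in time" view can be pictured as a reader holding on to an immutable snapshot taken when it was opened. The following is a conceptual sketch under that assumption, with hypothetical names (`PointInTimeSketch`, `openReader`, `numDocs` as used here are not Lucene's classes or signatures); the real implementation works on segment files rather than in-memory lists.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PointInTimeSketch {
    // Latest committed state. It is replaced wholesale on commit, never mutated in place.
    private volatile List<String> committed = Collections.emptyList();

    // Publish a new commit as a fresh immutable list.
    public void commit(List<String> docs) {
        committed = Collections.unmodifiableList(new ArrayList<>(docs));
    }

    // A reader captures whatever commit is current at the moment it is opened.
    public Reader openReader() {
        return new Reader(committed);
    }

    public static final class Reader {
        private final List<String> snapshot;

        private Reader(List<String> snapshot) {
            this.snapshot = snapshot;
        }

        public int numDocs() {
            return snapshot.size();
        }
    }

    public static void main(String[] args) {
        PointInTimeSketch index = new PointInTimeSketch();
        List<String> docs = new ArrayList<>();
        docs.add("a");
        index.commit(docs);

        Reader oldReader = index.openReader();
        docs.add("b");
        index.commit(docs);

        System.out.println(oldReader.numDocs());          // 1: the old reader keeps its snapshot
        System.out.println(index.openReader().numDocs()); // 2: a re-opened reader sees the new commit
    }
}
```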

If an index will not have more documents added for a while and optimal search
performance is desired, then the <a href="#optimize()">optimize</a>
method should be called before the index is closed.

Opening an <code>IndexWriter</code> creates a lock file for the directory in use. Trying to open
another <code>IndexWriter</code> on the same directory will lead to a
{@link LockObtainFailedException}. The {@link LockObtainFailedException}
is also thrown if an IndexReader on the same directory is used to delete documents
from the index.
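
The idea of a lock file guarding a directory can be sketched with the standard library alone. This is an illustration of the general technique (atomic file creation), not Lucene's lock implementation: `LockFileSketch` is a hypothetical class, it returns <code>false</code> where Lucene would throw {@link LockObtainFailedException}, and the file name `write.lock` is used here only as an example.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LockFileSketch {
    private final Path lockFile;
    private boolean held = false;

    public LockFileSketch(Path indexDir) {
        this.lockFile = indexDir.resolve("write.lock");
    }

    // Returns true if this call created the lock file, false if it already existed.
    public boolean obtain() throws IOException {
        try {
            Files.createFile(lockFile); // atomic: fails if the file is already there
            held = true;
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    // Removes the lock file, but only if this instance created it.
    public void release() throws IOException {
        if (held) {
            Files.deleteIfExists(lockFile);
            held = false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("index");
        LockFileSketch first = new LockFileSketch(dir);
        LockFileSketch second = new LockFileSketch(dir);

        System.out.println(first.obtain());  // true: the first writer gets the lock
        System.out.println(second.obtain()); // false: the directory is already locked
        first.release();
        System.out.println(second.obtain()); // true: the lock is free again
        second.release();
    }
}
```

One weakness of a plain lock file, visible in this sketch, is that a crashed process leaves the file behind, so the lock has to be cleared manually before a new writer can open the directory.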

<a name="deletionPolicy"></a>
Expert: <code>IndexWriter</code> allows an optional
{@link IndexDeletionPolicy} implementation to be
specified. You can use this to control when prior commits
are deleted from the index. The default policy is {@link
KeepOnlyLastCommitDeletionPolicy} which removes all prior
commits as soon as a new commit is done (this matches
behavior before 2.2). Creating your own policy can allow
you to explicitly keep previous "point in time" commits
alive in the index for some time, to allow readers to
refresh to the new commit without having the old commit
deleted out from under them. This is necessary on
filesystems like NFS that do not support "delete on last
close" semantics, which Lucene's "point in time" search
normally relies on.
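
A deletion policy that keeps more than the last commit alive might look like the following. This is a stand-alone model with hypothetical names (`KeepLastNSketch`, its `onCommit` signature and the string commit names are assumptions for illustration); it does not implement the real {@link IndexDeletionPolicy} interface. With <code>keep = 1</code> it behaves like the default policy described above, removing every prior commit as soon as a new one is made.

```java
import java.util.ArrayList;
import java.util.List;

public class KeepLastNSketch {
    private final int keep;
    private final List<String> liveCommits = new ArrayList<>(); // oldest first

    public KeepLastNSketch(int keep) {
        if (keep < 1) {
            throw new IllegalArgumentException("keep must be >= 1");
        }
        this.keep = keep;
    }

    // Registers a new commit and returns the older commits deleted as a result.
    public List<String> onCommit(String segmentsFile) {
        liveCommits.add(segmentsFile);
        List<String> deleted = new ArrayList<>();
        while (liveCommits.size() > keep) {
            deleted.add(liveCommits.remove(0)); // drop the oldest commit first
        }
        return deleted;
    }

    // Returns a copy of the commits that are still alive, oldest first.
    public List<String> liveCommits() {
        return new ArrayList<>(liveCommits);
    }

    public static void main(String[] args) {
        KeepLastNSketch policy = new KeepLastNSketch(2);
        policy.onCommit("segments_1");
        policy.onCommit("segments_2");
        System.out.println(policy.onCommit("segments_3")); // [segments_1]
        System.out.println(policy.liveCommits());          // [segments_2, segments_3]
    }
}
```

Keeping two commits means a reader that opened `segments_2` still finds its files after `segments_3` is written, which is the situation the comment describes for NFS.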

<a name="mergePolicy"></a> Expert:
<code>IndexWriter</code> allows you to separately change
the {@link MergePolicy} and the {@link MergeScheduler}.
The {@link MergePolicy} is invoked whenever there are
changes to the segments in the index. Its role is to
select which merges to do, if any, and return a {@link
MergePolicy.MergeSpecification} describing the merges. It
also selects merges to do for optimize(). (The default is
{@link LogByteSizeMergePolicy}.) Then, the {@link
MergeScheduler} is invoked with the requested merges and
it decides when and how to run the merges. The default is
{@link ConcurrentMergeScheduler}.


Original article: http://www.cnblogs.com/mggwct/p/5459080.html
