Object
  extended by ContextMarker
public class ContextMarker
Workhorse class that handles marking hits, context surrounding hits, and search terms.
Created: Dec 26, 2004
| Field Summary | |
|---|---|
| private MarkCollector | collector: Client instance which receives the resulting marks |
| private String | field: Field name (for debugging) |
| private WordIter | iter0: Iterator used for locating the start of the hit/context |
| private WordIter | iter1: Iterator used for locating the end of the hit/context |
| static int | MARK_ALL_TERMS: See MARK_NO_TERMS |
| static int | MARK_CONTEXT_TERMS: See MARK_NO_TERMS |
| static int | MARK_NO_TERMS: Term-marking mode in which terms are not marked. The available modes are MARK_NO_TERMS (terms are not marked), MARK_SPAN_TERMS (search terms are marked only within span hits), MARK_CONTEXT_TERMS (also within the context surrounding hits), and MARK_ALL_TERMS (wherever they are found). |
| static int | MARK_SPAN_TERMS: See MARK_NO_TERMS |
| private int | maxContext: Target size (in chars) of the context surrounding each hit |
| private int | prevEndWord: End of the previous context |
| private Set | stopSet: Set of stop words to avoid marking outside of hits |
| private int | termMode: Whether to mark terms inside/outside hits, context, etc. |
| private Set | terms: Set of search terms to mark |
| private int | termsMarkedPos: Word position up to which we've marked all terms |
| private MarkPos | tmpPos: Used for temporary position storage |
| Constructor Summary | |
|---|---|
| ContextMarker(int maxContext, int termMode, Set terms, Set stopSet, WordIter wordIter, MarkCollector collector, String field) | Construct a new marker |
| Method Summary | |
|---|---|
| (package private) void | emitMarks(Span posSpan, MarkPos contextStart, MarkPos contextEnd): Emit all the marks for the given hit. |
| (package private) void | findContext(Span posSpan, Span nextSpan, MarkPos contextStart, MarkPos contextEnd): Locate the start and end of context for the given hit. |
| void | mark(Span[] posOrderSpans, int maxContext): Mark a series of spans. |
| static void | markField(FieldSpans fieldSpans, String field, WordIter iter, int maxContext, int termMode, Set stopSet, MarkCollector collector): Mark context, spans, and terms within a field of data. |
| void | markField(String field, FieldSpans fieldSpans, MarkCollector collector): Mark context, spans, and terms within the given field of this document. |
| void | markField(String field, FieldSpans fieldSpans, WordIter iter, int maxContext, int termMode, Set stopSet, MarkCollector collector): Mark context, spans, and terms within the given field of this document. |
| private void | markTerms(WordIter iter, int fromPos, int toPos, boolean markStopWords): Mark terms from fromPos up to (but not including) toPos. |
| Methods inherited from class Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final int MARK_NO_TERMS
The following modes can be used for term marking:
MARK_NO_TERMS: Terms are not marked.
MARK_SPAN_TERMS: Search terms are marked only within span hits.
MARK_CONTEXT_TERMS: Search terms are marked within span hits and, if found, within the context surrounding those hits.
MARK_ALL_TERMS: Search terms are marked wherever they are found.
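To make the four modes concrete, here is a minimal Java sketch of the decision they imply for a single word. It is not ContextMarker's actual code: the integer values of the constants, the Region enum, and shouldMark are hypothetical, and the sketch assumes (following the stopSet description) that stop words are never marked outside hits.

```java
import java.util.Set;

public class TermModeSketch {
    // Hypothetical constant values, ordered from least to most marking.
    static final int MARK_NO_TERMS = 0;
    static final int MARK_SPAN_TERMS = 1;
    static final int MARK_CONTEXT_TERMS = 2;
    static final int MARK_ALL_TERMS = 3;

    /** Where a word lies relative to the hits (hypothetical). */
    enum Region { HIT, CONTEXT, OUTSIDE }

    /** Returns true if the word should be marked as a search term. */
    static boolean shouldMark(int termMode, Region region, String word,
                              Set<String> terms, Set<String> stopSet) {
        if (!terms.contains(word)) {
            return false;                          // not a search term at all
        }
        if (region != Region.HIT && stopSet.contains(word)) {
            return false;                          // stop words are marked only inside hits
        }
        switch (region) {
            case HIT:
                return termMode >= MARK_SPAN_TERMS;
            case CONTEXT:
                return termMode >= MARK_CONTEXT_TERMS;
            default:                               // OUTSIDE
                return termMode >= MARK_ALL_TERMS;
        }
    }

    public static void main(String[] args) {
        Set<String> terms = Set.of("apple", "the");
        Set<String> stops = Set.of("the");
        System.out.println(shouldMark(MARK_CONTEXT_TERMS, Region.CONTEXT, "apple", terms, stops)); // true
        System.out.println(shouldMark(MARK_SPAN_TERMS, Region.CONTEXT, "apple", terms, stops));    // false
        System.out.println(shouldMark(MARK_ALL_TERMS, Region.OUTSIDE, "the", terms, stops));       // false
    }
}
```

Ordering the hypothetical values lets each mode be expressed as "at least this much marking", which mirrors how each mode in the list above extends the previous one.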
public static final int MARK_SPAN_TERMS
See also: MARK_NO_TERMS
public static final int MARK_CONTEXT_TERMS
See also: MARK_NO_TERMS
public static final int MARK_ALL_TERMS
See also: MARK_NO_TERMS
private int maxContext
private WordIter iter0
private WordIter iter1
private MarkCollector collector
private Set terms
private Set stopSet
private int termMode
Term-marking mode: one of MARK_NO_TERMS, MARK_SPAN_TERMS, MARK_CONTEXT_TERMS, or MARK_ALL_TERMS.
private int termsMarkedPos
private MarkPos tmpPos
private int prevEndWord
private String field
| Constructor Detail |
|---|
public ContextMarker(int maxContext,
int termMode,
Set terms,
Set stopSet,
WordIter wordIter,
MarkCollector collector,
String field)
| Method Detail |
|---|
public void markField(String field,
FieldSpans fieldSpans,
MarkCollector collector)
Parameters:
field - field name to mark
fieldSpans - spans to mark with
collector - collector to receive the marks
public void markField(String field,
FieldSpans fieldSpans,
WordIter iter,
int maxContext,
int termMode,
Set stopSet,
MarkCollector collector)
Parameters:
field - field name to mark
fieldSpans - spans to mark with
iter - iterator over the words in the field
maxContext - target number of characters for context around each hit (including the text of the hit itself). 80 is often a good choice. Specify zero to turn off context marking.
termMode - where search terms are marked; see MARK_NO_TERMS.
stopSet - set of stop words to avoid marking outside hits
collector - collector to receive the marks
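As a rough illustration of the maxContext budget, the sketch below computes a character window around a hit. It assumes the leftover budget (maxContext minus the hit length) is split evenly before and after the hit and simply clamped at the text boundaries. ContextWindow and window are hypothetical names; the real class works with word positions through WordIter, so its exact distribution may differ.

```java
import java.util.Arrays;

public class ContextWindow {
    /**
     * Returns {start, end} (end exclusive) of a window of about maxContext
     * characters that contains the hit [hitStart, hitEnd).
     */
    static int[] window(int hitStart, int hitEnd, int textLength, int maxContext) {
        int hitLen = hitEnd - hitStart;
        int extra = Math.max(0, maxContext - hitLen);   // budget left after the hit's own text
        int before = extra / 2;                         // assumption: split evenly
        int after = extra - before;                     // an odd leftover char goes after the hit
        int start = Math.max(0, hitStart - before);     // clamp at start of text
        int end = Math.min(textLength, hitEnd + after); // clamp at end of text
        return new int[] { start, end };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(window(100, 110, 1000, 80))); // [65, 145]
        System.out.println(Arrays.toString(window(100, 110, 1000, 0)));  // [100, 110]
    }
}
```

With a 10-character hit at offsets 100 to 110 and maxContext = 80, the window is [65, 145): 35 characters on each side, 80 in total. With maxContext = 0 the window collapses to the hit itself. Near the start of the text this sketch just loses the clipped budget rather than shifting it to the other side.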
public static void markField(FieldSpans fieldSpans,
String field,
WordIter iter,
int maxContext,
int termMode,
Set stopSet,
MarkCollector collector)
Parameters:
fieldSpans - spans to mark with
field - field name to mark
iter - iterator over the words in the field
maxContext - target number of characters for context around each hit (including the text of the hit itself). 80 is often a good choice. Specify zero to turn off context marking.
termMode - where search terms are marked; see MARK_NO_TERMS.
stopSet - set of stop words to avoid marking outside hits
collector - collector to receive the marks
public void mark(Span[] posOrderSpans,
int maxContext)
Parameters:
posOrderSpans - Spans to mark, in ascending position order.
maxContext - Target number of characters of context around hits (0 for none).
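mark expects its spans already sorted by position, and findContext receives the following hit as nextSpan. A minimal sketch of such a driver loop, assuming a hypothetical Span record holding word positions and a hypothetical HitHandler callback in place of the findContext/emitMarks pair (records need Java 16 or later):

```java
import java.util.ArrayList;
import java.util.List;

public class MarkLoopSketch {
    /** Hypothetical hit: word positions [start, end). */
    record Span(int start, int end) {}

    /** Hypothetical per-hit callback; receives the next hit, or null for the last one. */
    interface HitHandler {
        void handle(Span hit, Span next);
    }

    static void mark(Span[] posOrderSpans, HitHandler handler) {
        for (int i = 0; i < posOrderSpans.length; i++) {
            Span hit = posOrderSpans[i];
            Span next = (i + 1 < posOrderSpans.length) ? posOrderSpans[i + 1] : null;
            // The caller promises ascending order; fail fast if it is violated.
            if (next != null && next.start() < hit.start()) {
                throw new IllegalArgumentException("spans not in ascending position order");
            }
            handler.handle(hit, next);
        }
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        Span[] spans = { new Span(3, 5), new Span(10, 12) };
        mark(spans, (hit, next) -> seen.add(hit.start() + "->" + (next == null ? "end" : next.start())));
        System.out.println(seen); // [3->10, 10->end]
    }
}
```

Passing the next hit along is what lets a context computation stop before it swallows the following hit.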
void findContext(Span posSpan,
Span nextSpan,
MarkPos contextStart,
MarkPos contextEnd)
Parameters:
posSpan - hit for which to find context
nextSpan - following hit (or null if none)
contextStart - OUT: start of context
contextEnd - OUT: end of context
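One plausible way to honor nextSpan and the prevEndWord field is to extend the context a fixed number of words on each side and then clamp it so that it neither runs back into the previous context nor forward into the next hit. The sketch assumes word-based context (a hypothetical contextWords) rather than the character target, a known field length in words, and a hypothetical Pos holder standing in for the MarkPos OUT parameters.

```java
public class FindContextSketch {
    /** Hypothetical hit: word positions [start, end). */
    record Span(int start, int end) {}

    /** Hypothetical mutable holder standing in for a MarkPos OUT parameter. */
    static final class Pos {
        int wordPos;
    }

    private final int contextWords; // assumption: fixed number of context words per side
    private final int fieldWords;   // number of words in the field
    private int prevEndWord = 0;    // end of the previous context

    FindContextSketch(int contextWords, int fieldWords) {
        this.contextWords = contextWords;
        this.fieldWords = fieldWords;
    }

    void findContext(Span posSpan, Span nextSpan, Pos contextStart, Pos contextEnd) {
        // Extend backward, but not into the previous context.
        int start = Math.max(prevEndWord, posSpan.start() - contextWords);
        start = Math.min(start, posSpan.start()); // always cover the start of the hit
        // Extend forward, but not past the field or into the next hit.
        int end = Math.min(fieldWords, posSpan.end() + contextWords);
        if (nextSpan != null) {
            end = Math.min(end, nextSpan.start());
        }
        end = Math.max(end, posSpan.end());       // always cover the end of the hit
        contextStart.wordPos = start;
        contextEnd.wordPos = end;
        prevEndWord = end;
    }

    public static void main(String[] args) {
        FindContextSketch m = new FindContextSketch(3, 100);
        Span a = new Span(5, 7), b = new Span(9, 11);
        Pos s = new Pos(), e = new Pos();
        m.findContext(a, b, s, e);
        System.out.println(s.wordPos + ".." + e.wordPos); // 2..9
        m.findContext(b, null, s, e);
        System.out.println(s.wordPos + ".." + e.wordPos); // 9..14
    }
}
```

With contextWords = 3, hits at words [5, 7) and [9, 11) yield contexts [2, 9) and [9, 14): the first context stops where the second hit starts, and the second begins where the first context ended.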
void emitMarks(Span posSpan,
MarkPos contextStart,
MarkPos contextEnd)
Parameters:
posSpan - hit for which to emit marks
contextStart - start of context (or null if context disabled)
contextEnd - end of context (or null if context disabled)
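The parameter notes imply that context marks are skipped when context is disabled. A sketch of the resulting nesting (context wrapping the hit), using a hypothetical MarkSink interface that is not the real MarkCollector API, with null context positions meaning context marking is off. Term marks inside the hit are left out here.

```java
import java.util.ArrayList;
import java.util.List;

public class EmitMarksSketch {
    /** Hypothetical hit: word positions [start, end). */
    record Span(int start, int end) {}

    /** Hypothetical receiver of marks; not the real MarkCollector API. */
    interface MarkSink {
        void beginContext(int wordPos);
        void beginHit(int wordPos);
        void endHit(int wordPos);
        void endContext(int wordPos);
    }

    /** Records each mark as a string, for inspection. */
    static final class RecordingSink implements MarkSink {
        final List<String> events = new ArrayList<>();
        public void beginContext(int p) { events.add("beginContext@" + p); }
        public void beginHit(int p)     { events.add("beginHit@" + p); }
        public void endHit(int p)       { events.add("endHit@" + p); }
        public void endContext(int p)   { events.add("endContext@" + p); }
    }

    /** Null context positions mean context marking is disabled. */
    static void emitMarks(Span posSpan, Integer contextStart, Integer contextEnd, MarkSink sink) {
        if (contextStart != null) {
            sink.beginContext(contextStart); // context opens first...
        }
        sink.beginHit(posSpan.start());
        sink.endHit(posSpan.end());
        if (contextEnd != null) {
            sink.endContext(contextEnd);     // ...and closes last, so the marks nest
        }
    }

    public static void main(String[] args) {
        RecordingSink sink = new RecordingSink();
        emitMarks(new Span(5, 7), 2, 9, sink);
        System.out.println(sink.events); // [beginContext@2, beginHit@5, endHit@7, endContext@9]
    }
}
```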
private void markTerms(WordIter iter,
int fromPos,
int toPos,
boolean markStopWords)
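markTerms has no parameter documentation, but the termsMarkedPos field ("Word position up to which we've marked all terms") suggests it skips words already covered by an earlier call. A sketch under that assumption, with a String array standing in for WordIter and marked positions collected in a list instead of being sent to a collector; MarkTermsSketch and its members are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class MarkTermsSketch {
    private final String[] words;     // stand-in for a WordIter over the field
    private final Set<String> terms;  // search terms to mark
    private final Set<String> stopSet;
    private int termsMarkedPos = 0;   // word position up to which terms have been marked
                                      // (calls are assumed to move left to right)
    final List<Integer> marked = new ArrayList<>();

    MarkTermsSketch(String[] words, Set<String> terms, Set<String> stopSet) {
        this.words = words;
        this.terms = terms;
        this.stopSet = stopSet;
    }

    /** Marks terms in [fromPos, toPos), skipping words before termsMarkedPos. */
    void markTerms(int fromPos, int toPos, boolean markStopWords) {
        int limit = Math.min(toPos, words.length);
        for (int pos = Math.max(fromPos, termsMarkedPos); pos < limit; pos++) {
            String w = words[pos];
            if (!terms.contains(w)) continue;                       // not a search term
            if (!markStopWords && stopSet.contains(w)) continue;    // stop word not allowed here
            marked.add(pos);
        }
        termsMarkedPos = Math.max(termsMarkedPos, limit);
    }

    public static void main(String[] args) {
        MarkTermsSketch m = new MarkTermsSketch(
            new String[] { "the", "cat", "sat", "on", "the", "mat" },
            Set.of("the", "cat", "mat"), Set.of("the"));
        m.markTerms(0, 3, false); // outside a hit: skip the stop word "the"
        m.markTerms(2, 6, true);  // inside a hit: stop words allowed; word 2 is not revisited
        System.out.println(m.marked); // [1, 4, 5]
    }
}
```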