
Moses decoder parameters: the option list printed when the program runs, recorded here for reference

Posted: 2014-08-25 16:56:34


Moses - A beam search decoder for phrase-based statistical machine translation models
Copyright (C) 2006 University of Edinburgh

This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.

This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public
License along with this library; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

***********************************************************************

Built on Aug 17 2014 at 00:05:32

WHO'S FAULT IS THIS GODDAM SOFTWARE:
Marcello Federico          contact: federico at itc at it   Researcher at ITC-irst, Trento, Italy   I'll answer question on: IRST language model
Christine Moran    contact: weird building at MIT
Ondrej Bojar   czech this out!
Chris Callison-Burch       contact: anytime, anywhere   international playboy
Chris Dyer         contact: can't. i'll be out driving my mustang   driving my mustang
Philipp Koehn      contact: only between 2 and 4am   I'll answer question on: Nothing fazes this dude
Richard Zens       contact: richard at aachen dot de   I'll answer question on: ambiguous source input, confusion networks, confusing source code
Evan Herbst        contact: Small college in upstate New York
Hieu Hoang         contact: http://www.hoang.co.uk/hieu/   phd student at Edinburgh Uni. Original Moses developer   I'll answer question on: general queries/ flames on Moses.
Nicola Bertoldi    contact: 911   I'll answer question on: scripts & other stuff
Brooke Cowan       contact: brooke@csail.mit.edu   if you're going to san francisco, be sure to wear a flower in your hair
Alexandra Constantin   eu sunt varza
Wade Shen          contact: via morse code   buying another laptop


Usage:
        -alignment-output-file: print output word alignments into given file
        -alternate-weight-setting (aws): alternate set of weights to use per xml specification
        -beam-threshold (b): threshold for threshold pruning
        -clean-lm-cache: clean language model caches after N translations (default N=1)
        -config (f): location of the configuration file
        -consensus-decoding (con): use consensus decoding (DeNero et al. 2009)
        -cube-pruning-diversity (cbd): How many hypotheses should be created for each coverage. (default = 0)
        -cube-pruning-lazy-scoring (cbls): Don't fully score a hypothesis until it is popped
        -cube-pruning-pop-limit (cbp): How many hypotheses should be popped for each stack. (default = 1000)
        -decoding-graph-backoff (dpb): only use subsequent decoding paths for unknown spans of given length
        -default-non-term-for-empty-range-only: Don't add [X] to all ranges, just ranges where there isn't a source non-term. Default = false (ie. add [X] everywhere)
        -description: Source language, target language, description
        -disable-discarding (dd): disable hypothesis discarding
        -distortion: configurations for each factorized/lexicalized reordering model.
        -distortion-file: source factors (0 if table independent of source), target factors, location of the factorized/lexicalized reordering tables
        -distortion-limit (dl): distortion (reordering) limit in maximum number of words (0 = monotone, -1 = unlimited)
        -dlm-model: DEPRECATED. DO NOT USE. Order, factor and vocabulary file for discriminative LM. Use * for filename to indicate unlimited vocabulary.
        -drop-unknown (du): drop unknown words instead of copying them
        -early-discarding-threshold (edt): threshold for constructing hypotheses based on estimate cost
        -early-distortion-cost (edc): include estimate of distortion cost yet to be incurred in the score [Moore & Quirk 2007]. Default is no
        -factor-delimiter (fd): specify a different factor delimiter than the default
        -feature: All the feature functions should be here
        -feature-add: Add a feature function on the command line. Used by mira to add BLEU feature
        -feature-name-overwrite: Override feature name (NOT arguments). Eg. SRILM-->KENLM, PhraseDictionaryMemory-->PhraseDictionaryScope3
        -feature-overwrite: Override arguments in a particular feature function with a particular key. Format: -feature-overwrite "FeatureName key=value"
        -generation-file: DEPRECATED. DO NOT USE. location and properties of the generation table
        -glm-feature: DEPRECATED. DO NOT USE. discriminatively trained global lexical translation feature, sparse producer
        -global-lexical-file (gl): DEPRECATED. DO NOT USE. discriminatively trained global lexical translation model file
        -include-lhs-in-search-graph (lhssg): When outputting chart search graph, include the label of the LHS of the rule (useful when using syntax)
        -include-segmentation-in-n-best: include phrasal segmentation in the n-best list. default is false
        -input-factors: list of factors in the input
        -input-file (i): location of the input file to be translated
        -input-scores: DEPRECATED. DO NOT USE. 2 numbers on 2 lines - [1] of scores on each edge of a confusion network or lattice input (default=1). [2] Number of real word scores (0 or 1. default=0)
        -inputtype: text (0), confusion network (1), word lattice (2), tree (3) (default = 0)
        -labeled-n-best-list: print out labels for each weight type in n-best list. default is true
        -lattice-hypo-set: to use lattice as hypo set during lattice MBR
        -lattice-samples: generate samples from lattice, in same format as nbest list. Uses the file and size arguments, as in n-best-list
        -link-param-count: DEPRECATED. DO NOT USE. Number of parameters on word links when using confusion networks or lattices (default = 1)
        -lmbr-map-weight: weight given to map solution when doing lattice MBR (default 0)
        -lmbr-p: unigram precision value for lattice mbr
        -lmbr-pruning-factor: average number of nodes/word wanted in pruned lattice
        -lmbr-r: ngram precision decay value for lattice mbr
        -lmbr-thetas: theta(s) for lattice mbr calculation
        -lminimum-bayes-risk (lmbr): use lattice minimum Bayes risk to determine best translation
        -lmodel-dub: DEPRECATED. DO NOT USE. dictionary upper bounds of language models
        -lmodel-file: DEPRECATED. DO NOT USE. location and properties of the language models
        -lmodel-oov-feature: add language model oov feature, one per model
        -mapping: description of decoding steps
        -mark-unknown (mu): mark unknown words in output
        -max-chart-span: maximum num. of source word chart rules can consume (default 10)
        -max-partial-trans-opt: maximum number of partial translation options per input span (during mapping steps)
        -max-phrase-length: maximum phrase length (default 20)
        -max-trans-opt-per-coverage: maximum number of translation options per input span (after applying mapping steps)
        -mbr-scale: scaling factor to convert log linear score probability in MBR decoding (default 1.0)
        -mbr-size: number of translation candidates considered in MBR decoding (default 200)
        -minimum-bayes-risk (mbr): use minimum Bayes risk to determine best translation
        -minlexr-memory: Load lexical reordering table in minlexr format into memory
        -minphr-memory: Load phrase table in minphr format into memory
        -mira: do mira training
        -monotone-at-punctuation (mp): do not reorder over punctuation
        -n-best-factor: factor to compute the maximum number of contenders (=factor*nbest-size). value 0 means infinity, i.e. no threshold. default is 0
        -n-best-list: file and size of n-best-list to be generated; specify - as the file in order to write to STDOUT
        -no-cache: Disable all phrase-table caching. Default = false (ie. enable caching)
        -non-terminals: list of non-term symbols, space separated
        -output-factors: list of factors in the output
        -output-hypo-score: Output the hypo score to stdout with the output string. For search error analysis. Default is false
        -output-search-graph (osg): Output connected hypotheses of search into specified filename
        -output-search-graph-extended (osgx): Output connected hypotheses of search into specified filename, in extended format
        -output-search-graph-hypergraph: Output connected hypotheses of search into specified directory, one file per sentence, in a hypergraph format (see Kenneth Heafield's lazy hypergraph decoder). This flag is followed by 3 values: true (gz|txt|bz) directory-name
        -output-search-graph-slf (slf): Output connected hypotheses of search into specified directory, one file per sentence, in HTK standard lattice format (SLF) - the flag should be followed by a directory name, which must exist
        -output-unknowns: Output the unknown (OOV) words to the given file, one line per sentence
        -output-word-graph (owg): Output stack info as word graph. Takes filename, 0=only hypos in stack, 1=stack + nbest hypos
        -phrase-boundary-source-feature: DEPRECATED. DO NOT USE. Source factors for phrase boundary feature
        -phrase-boundary-target-feature: DEPRECATED. DO NOT USE. Target factors for phrase boundary feature
        -phrase-drop-allowed (da): if present, allow dropping of source words
        -phrase-length-feature: DEPRECATED. DO NOT USE. Count features for source length, target length, both of each phrase
        -phrase-pair-feature: DEPRECATED. DO NOT USE. Source and target factors for phrase pair feature
        -placeholder-factor: Which source factor to use to store the original text for placeholders. The factor must not be used by a translation or gen model
        -print-alignment-info: Output word-to-word alignment to standard out, separated from translation by |||. Word-to-word alignments are taken from the phrase table if any. Default is false
        -print-alignment-info-in-n-best: Include word-to-word alignment in the n-best list. Word-to-word alignments are taken from the phrase table if any. Default is false
        -print-all-derivations: to print all derivations in search graph
        -print-id: prefix translations with id. Default is false
        -recover-input-path (r): (conf net/word lattice only) - recover input path corresponding to the best translation
        -references: Reference file(s) - used for bleu score feature
        -report-all-factors: report all factors in output, not just first
        -report-all-factors-in-n-best: Report all factors in n-best-lists. Default is false
        -report-segmentation (t): report phrase segmentation in the output
        -report-segmentation-enriched (tt): report phrase segmentation in the output with additional information
        -rule-limit: a little like table limit. But for chart decoding rules. Default is DEFAULT_MAX_TRANS_OPT_SIZE
        -search-algorithm: Which search algorithm to use. 0=normal stack, 1=cube pruning, 2=cube growing, 4=stack with batched lm requests (default = 0)
        -show-weights: print feature weights and exit
        -sort-word-alignment: Sort word alignments for more consistent display. 0=no sort (default), 1=target order
        -source-label-overlap: What happens if a span already has a label. 0=add more. 1=replace. 2=discard. Default is 0
        -source-word-deletion-feature: DEPRECATED. DO NOT USE. Count feature for each unaligned source word
        -stack (s): maximum stack size for histogram pruning. 0 = unlimited stack size
        -stack-diversity (sd): minimum number of hypothesis of each coverage in stack (default 0)
        -start-translation-id: Id of 1st input. Default = 0
        -target-word-insertion-feature: DEPRECATED. DO NOT USE. Count feature for each unaligned target word
        -text-type: DEPRECATED. DO NOT USE. should be one of dev/devtest/test, used for domain adaptation features
        -threads (th): number of threads to use in decoding (defaults to single-threaded)
        -time-out: seconds after which decoding is interrupted (-1=no time-out, default is -1)
        -translation-all-details (Tall): for all hypotheses, report translation details to the given file
        -translation-details (T): for each best hypothesis, report translation details to the given file
        -translation-option-threshold (tot): threshold for translation options relative to best for input phrase
        -tree-translation-details (Ttree): for each hypothesis, report translation details with tree fragment info to given file
        -ttable-file: DEPRECATED. DO NOT USE. location and properties of the translation tables
        -unknown-lhs: file containing target lhs of unknown words. 1 per line: LHS prob
        -unpruned-search-graph (usg): When outputting chart search graph, do not exclude dead ends. Note: stack pruning may have eliminated some hypotheses
        -verbose (v): verbosity level of the logging
        -weight: weights for ALL models, 1 per line WeightName value. Weight names can be repeated
        -weight-add: Add weight for FF if it doesn't exist, i.e. weights here are added 1st, and can be overridden by the ini file or on the command line. Used to specify initial weights for FF that was also specified on the command line
        -weight-bl (bl): DEPRECATED. DO NOT USE. weight for bleu score feature
        -weight-d (d): DEPRECATED. DO NOT USE. weight(s) for distortion (reordering components)
        -weight-dlm (dlm): DEPRECATED. DO NOT USE. weight for discriminative LM feature function (on top of sparse weights)
        -weight-e (e): DEPRECATED. DO NOT USE. weight for word deletion
        -weight-file (wf): feature weights file. Do *not* put weights for core features in here - they go in moses.ini
        -weight-generation (g): DEPRECATED. DO NOT USE. weight(s) for generation components
        -weight-glm (glm): DEPRECATED. DO NOT USE. weight for global lexical feature, sparse producer
        -weight-i (I): DEPRECATED. DO NOT USE. weight(s) for word insertion - used for parameters from confusion network and lattice input links
        -weight-l (lm): DEPRECATED. DO NOT USE. weight(s) for language models
        -weight-lex (lex): DEPRECATED. DO NOT USE. weight for global lexical model
        -weight-lr (lr): DEPRECATED. DO NOT USE. weight(s) for lexicalized reordering, if not included in weight-d
        -weight-overwrite: special parameter for mert. All on 1 line. Overrides weights specified in weights argument
        -weight-pb (pb): DEPRECATED. DO NOT USE. weight for phrase boundary feature
        -weight-pp (pp): DEPRECATED. DO NOT USE. weight for phrase pair feature
        -weight-slm (slm): DEPRECATED. DO NOT USE. weight(s) for syntactic language model
        -weight-t (tm): DEPRECATED. DO NOT USE. weights for translation model components
        -weight-u (u): DEPRECATED. DO NOT USE. weight for unknown word penalty
        -weight-w (w): DEPRECATED. DO NOT USE. weight for word penalty
        -weight-wt (wt): DEPRECATED. DO NOT USE. weight for word translation feature
        -word-translation-feature: DEPRECATED. DO NOT USE. Count feature for word translation according to word alignment
        -xml-brackets (xb): specify strings to be used as xml tags opening and closing, e.g. "{{ }}" (default "< >"). Avoid square brackets because of configuration file format. Valid only with text input mode
        -xml-input (xi): allows markup of input with desired translations and probabilities. values can be pass-through (default), inclusive, exclusive, constraint, ignore
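
To show how the options above combine into an invocation, here is a minimal Python sketch that assembles an argument list. The helper `build_moses_command`, the binary name `moses`, and the file names are illustrative assumptions and not part of Moses; only the option names come from the list above. It also assumes that boolean options such as -drop-unknown can be passed as bare switches.

```python
from typing import Dict, List, Sequence, Union

OptionValue = Union[bool, int, float, str, Sequence[Union[int, float, str]]]


def build_moses_command(config: str, options: Dict[str, OptionValue],
                        binary: str = "moses") -> List[str]:
    """Assemble an argv list from option names taken from the usage text.

    Hypothetical helper: `binary` and the calling convention are assumptions.
    """
    argv = [binary, "-config", config]
    for name, value in options.items():
        if value is False:
            continue  # switch not requested, emit nothing
        argv.append("-" + name)
        if value is True:
            continue  # bare switch (assumed valid for boolean options)
        if isinstance(value, (list, tuple)):
            # multi-valued option, e.g. -n-best-list takes a file and a size
            argv.extend(str(v) for v in value)
        else:
            argv.append(str(value))
    return argv


cmd = build_moses_command(
    "moses.ini",  # placeholder path
    {
        "input-file": "input.txt",         # placeholder path
        "search-algorithm": 1,             # 1 = cube pruning
        "cube-pruning-pop-limit": 1000,    # the documented default
        "distortion-limit": 6,             # arbitrary example value
        "n-best-list": ["nbest.txt", 100], # file and size
        "drop-unknown": True,
    },
)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd)` on a machine where the decoder is installed; building a list rather than a shell string avoids quoting problems with file names.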
Available feature functions:
BleuScoreFeature ConstrainedDecoding ControlRecombination CountNonTerms CoveredReferenceFeature Distortion ExternalFeature Generation GlobalLexicalModel HyperParameterAsWeight InputFeature KENLM LexicalReordering MaxSpanFreeNonTermSource NieceTerminal OpSequenceModel PhraseBoundaryFeature PhraseDictionaryALSuffixArray PhraseDictionaryBinary PhraseDictionaryDynSuffixArray PhraseDictionaryFuzzyMatch PhraseDictionaryMemory PhraseDictionaryMultiModel PhraseDictionaryMultiModelCounts PhraseDictionaryOnDisk PhraseDictionaryScope3 PhraseDictionaryTransliteration PhraseLengthFeature PhrasePairFeature PhrasePenalty ReferenceComparison RuleScope SetSourcePhrase SkeletonChangeInput SkeletonLM SkeletonPT SkeletonStatefulFF SkeletonStatelessFF SoftMatchingFeature SoftSourceSyntacticConstraintsFeature SourceGHKMTreeInputMatchFeature SourceWordDeletionFeature SpanLength SparseHieroReorderingFeature SyntaxRHS TargetBigramFeature TargetNgramFeature TargetWordInsertionFeature TreeStructureFeature UnknownWordPenalty WordPenalty WordTranslationFeature 
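
Since the dump above is long and many entries are marked DEPRECATED, it can help to filter it mechanically. The sketch below parses usage lines of the form `-name (alias): description` into records and flags the deprecated ones. The `Option` type, the regular expression, and `parse_usage` are my own assumptions about the layout shown above, not anything Moses provides; the sample lines are copied from the list.

```python
import re
from typing import Dict, NamedTuple, Optional


class Option(NamedTuple):
    name: str
    alias: Optional[str]
    description: str
    deprecated: bool


# Matches "-name (alias): description"; the "(alias)" part is optional.
OPTION_LINE = re.compile(r"^\s*-([\w-]+)(?:\s+\(([^)]+)\))?:\s*(.*)$")


def parse_usage(text: str) -> Dict[str, Option]:
    """Parse usage lines into Option records keyed by option name."""
    options = {}
    for line in text.splitlines():
        match = OPTION_LINE.match(line)
        if not match:
            continue  # skip headers, blank lines, the feature-function list
        name, alias, description = match.groups()
        options[name] = Option(name, alias, description,
                               description.startswith("DEPRECATED"))
    return options


sample = """\
        -beam-threshold (b): threshold for threshold pruning
        -config (f): location of the configuration file
        -ttable-file: DEPRECATED. DO NOT USE. location and properties of the translation tables
        -weight-l (lm): DEPRECATED. DO NOT USE. weight(s) for language models
"""
opts = parse_usage(sample)
for opt in opts.values():
    status = "deprecated" if opt.deprecated else "active"
    print(f"{opt.name:16} {opt.alias or '-':4} {status}")
```

Feeding the full help text to `parse_usage` and keeping only the entries with `deprecated` set to false gives the subset of options that is still meant to be used.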

 


Original post: http://www.cnblogs.com/hitnoah/p/3935203.html
