gec_metrics.metrics.gleu module
- class gec_metrics.metrics.gleu.GLEU(config: Config = None)
Bases: GREEN
GLEU implemented using the GREEN reformulation (https://aclanthology.org/2024.inlg-main.25.pdf).
- class Config(iter: int = 500, n: int = 4, unit: str = 'word')
Bases: Config
GLEU configuration.
- Parameters:
iter (int) – The number of iterations.
n (int) – The maximum n-gram order.
unit (str) – Word-level or character-level. Can be 'word' or 'char'.
- iter: int = 500
- n: int = 4
- unit: str = 'word'
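As a rough illustration of what `n` and `unit` control, the sketch below extracts n-grams up to order `n` from word or character tokens. The helper name `extract_ngrams` and the whitespace tokenization are assumptions made for illustration; they are not taken from the library's internals.

```python
from collections import Counter


def extract_ngrams(sentence: str, max_n: int = 4, unit: str = "word") -> list[Counter]:
    """Return one Counter of n-grams per order 1..max_n (hypothetical helper)."""
    if unit == "word":
        tokens = sentence.split()  # assumption: whitespace tokenization
    elif unit == "char":
        tokens = list(sentence)  # assumption: every character is one token
    else:
        raise ValueError("unit must be 'word' or 'char'")
    counters = []
    for n in range(1, max_n + 1):
        # Slide a window of width n over the tokens; empty if the sentence is too short.
        grams = (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        counters.append(Counter(grams))
    return counters


word_ngrams = extract_ngrams("the cat sat", max_n=2, unit="word")
char_ngrams = extract_ngrams("abc", max_n=2, unit="char")
```

With `unit='char'` the same sentence yields many more, shorter n-grams, which is why the choice of unit changes the scale of the counts that feed the score.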
- aggregate_score(scores: list[Score], hyp_len: int, ref_len: int) → float
Aggregate n-gram scores to an overall score by the geometric mean.
- Parameters:
scores (list[Score]) – The scores, kept separate for each n-gram order. The shape is (n, ).
hyp_len (int) – The length of the hypothesis.
ref_len (int) – The length of the reference.
- Returns:
The aggregated score.
- Return type:
float
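The aggregation can be sketched as a geometric mean of per-order precisions. The `NgramScore` dataclass below is a hypothetical stand-in for the library's `Score`, and the BLEU-style brevity penalty computed from `hyp_len` and `ref_len` is an assumption about how the two length arguments are used; the source only states that a geometric mean is taken.

```python
import math
from dataclasses import dataclass


@dataclass
class NgramScore:
    """Hypothetical stand-in for the library's Score: counts for one n-gram order."""
    tp: float
    fp: float


def aggregate(scores: list[NgramScore], hyp_len: int, ref_len: int) -> float:
    """Geometric mean of per-order precisions, scaled by an assumed brevity penalty."""
    if hyp_len <= 0 or not scores:
        return 0.0
    log_sum = 0.0
    for s in scores:
        denom = s.tp + s.fp
        if denom <= 0 or s.tp <= 0:
            return 0.0  # any zero precision makes the geometric mean zero
        log_sum += math.log(s.tp / denom)
    geo_mean = math.exp(log_sum / len(scores))
    # Assumption: BLEU-style brevity penalty for hypotheses shorter than the reference.
    bp = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    return bp * geo_mean


same_length = aggregate([NgramScore(1, 1), NgramScore(1, 3)], hyp_len=5, ref_len=5)
short_hyp = aggregate([NgramScore(1, 1), NgramScore(1, 3)], hyp_len=4, ref_len=5)
```

Summing logarithms rather than multiplying precisions directly is a common choice because it stays numerically stable when the precisions are small.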
- score_base(sources: list[str], hypotheses: list[str], references: list[list[str]]) → float
Compute True Positive and False Positive counts using GREEN's reformulation (https://aclanthology.org/2024.inlg-main.25.pdf).
The actual equation is (TI + TK - UD) / (TI + TK + OI + UD), thus we regard
- True Positive (TP) as TI + TK - UD,
- False Positive (FP) as OI + 2*UD.
Finally, precision = TP / (TP + FP) will be the GLEU score.
- Parameters:
sources (list[str]) – Source sentences.
hypotheses (list[str]) – Corrected sentences.
references (list[list[str]]) – Reference sentences. The shape is (num_references, num_sentences).
- Returns:
A tuple of three items:
- list[list[list[Score]]]: The verbose scores. The shape is (num_iterations, num_sents, max_ngram).
- list[list[int]]: The lengths of the hypotheses. The shape is (num_iterations, num_sents).
- list[list[int]]: The lengths of the references. The shape is (num_iterations, num_sents).
- Return type:
Tuple[list[list[list[Score]]], list[list[int]], list[list[int]]]
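To check the TP/FP bookkeeping used by `score_base`, the short sketch below plugs made-up counts into both forms and confirms that TP / (TP + FP) equals (TI + TK - UD) / (TI + TK + OI + UD). The function name `gleu_from_counts` and the example counts are hypothetical.

```python
def gleu_from_counts(ti: int, tk: int, oi: int, ud: int) -> tuple[int, int, float]:
    """Map GREEN-style counts to TP, FP and precision (hypothetical helper)."""
    tp = ti + tk - ud      # True Positive as defined in the docstring
    fp = oi + 2 * ud       # False Positive as defined in the docstring
    denom = tp + fp        # equals TI + TK + OI + UD
    precision = tp / denom if denom > 0 else 0.0
    return tp, fp, precision


# Made-up counts for a single n-gram order.
ti, tk, oi, ud = 2, 5, 1, 1
tp, fp, precision = gleu_from_counts(ti, tk, oi, ud)
direct = (ti + tk - ud) / (ti + tk + oi + ud)
```

The factor of 2 on UD is what makes the denominators match: TP subtracts UD once, so FP has to add it twice for TP + FP to come out as TI + TK + OI + UD.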
- score_corpus(sources: list[str], hypotheses: list[str], references: list[list[str]]) → float
Calculate a corpus-level score.
- Parameters:
sources (list[str]) – Source sentences. The shape is (num_sentences, )
hypotheses (list[str]) – Corrected sentences. The shape is (num_sentences, )
references (list[list[str]]) – Reference sentences. The shape is (num_references, num_sentences).
- Returns:
The corpus-level score.
- Return type:
float
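The shape of `references` is easy to get wrong: the outer list runs over reference sets, the inner list over sentences. The sketch below builds inputs in that layout and checks them with a hypothetical `check_shapes` helper; the sentences are invented and the helper is not part of the library.

```python
def check_shapes(
    sources: list[str],
    hypotheses: list[str],
    references: list[list[str]],
) -> tuple[int, int]:
    """Validate the documented layout and return (num_references, num_sentences)."""
    num_sents = len(sources)
    if len(hypotheses) != num_sents:
        raise ValueError("hypotheses must have one entry per source sentence")
    for i, ref_set in enumerate(references):
        if len(ref_set) != num_sents:
            raise ValueError(f"reference set {i} must have one entry per source sentence")
    return len(references), num_sents


sources = ["This are a sentence .", "He go home ."]
hypotheses = ["This is a sentence .", "He goes home ."]
references = [
    ["This is a sentence .", "He goes home ."],  # first reference set
    ["This is a sentence .", "He went home ."],  # second reference set
]
shape = check_shapes(sources, hypotheses, references)

# With the package installed, the documented call would then be along the lines of:
#   metric = GLEU(GLEU.Config(iter=500, n=4, unit='word'))
#   metric.score_corpus(sources, hypotheses, references)
```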
- score_sentence(sources: list[str], hypotheses: list[str], references: list[list[str]]) → float
Calculate sentence-level scores.
- Parameters:
sources (list[str]) – Source sentences. The shape is (num_sentences, )
hypotheses (list[str]) – Corrected sentences. The shape is (num_sentences, )
references (list[list[str]]) – Reference sentences. The shape is (num_references, num_sentences).
- Returns:
The sentence-level scores.
- Return type:
list[float]
- class gec_metrics.metrics.gleu.GLEUOfficial(config: Config = None)
Bases: GLEU
- score_base(sources: list[str], hypotheses: list[str], references: list[list[str]]) → Tuple[list[list[list[Score]]], list[list[int]], list[list[int]]]
The official implementation contains an error where the frequency of n-grams is ignored in the calculation of SR.
As a result, when an n-gram is classified into both TK and UD, it is counted entirely as TK.
- Parameters:
sources (list[str]) – Source sentences.
hypotheses (list[str]) – Corrected sentences.
references (list[list[str]]) – Reference sentences. The shape is (num_references, num_sentences).
- Returns:
A tuple of three items:
- list[list[list[Score]]]: The verbose scores. The shape is (num_iterations, num_sents, max_ngram).
- list[list[int]]: The lengths of the hypotheses. The shape is (num_iterations, num_sents).
- list[list[int]]: The lengths of the references. The shape is (num_iterations, num_sents).
- Return type:
Tuple[list[list[list[Score]]], list[list[int]], list[list[int]]]
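The effect of ignoring n-gram frequency can be illustrated with unigram counts. The sketch assumes that SR denotes the source n-grams that are absent from the reference, and that the penalty (UD) is the overlap between the hypothesis and SR; both readings are assumptions, and the function names are hypothetical.

```python
from collections import Counter


def diff_with_freq(src: Counter, ref: Counter) -> Counter:
    """Multiset difference: keeps the surplus copies that the reference lacks."""
    return src - ref


def diff_ignoring_freq(src: Counter, ref: Counter) -> Counter:
    """Set-style difference: drops an n-gram entirely if the reference has it at all."""
    return Counter({gram: count for gram, count in src.items() if gram not in ref})


# "the" appears twice in the source and hypothesis but only once in the reference,
# so one copy is kept correctly (TK) and one copy should have been deleted (UD).
src = Counter("the the cat".split())
hyp = Counter("the the cat".split())
ref = Counter("the cat".split())

ud_with_freq = sum((hyp & diff_with_freq(src, ref)).values())
ud_ignoring_freq = sum((hyp & diff_ignoring_freq(src, ref)).values())
```

With frequencies respected, the surplus "the" is penalized once; with frequencies ignored, the penalty vanishes and the n-gram is treated as if it were only kept correctly.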