gec_metrics.metrics.gleu module

class gec_metrics.metrics.gleu.GLEU(config: Config = None)[source]

Bases: GREEN

GLEU implemented using the GREEN reformulation (https://aclanthology.org/2024.inlg-main.25.pdf).

class Config(iter: int = 500, n: int = 4, unit: str = 'word')[source]

Bases: Config

GLEU configuration.

Parameters:
  • iter (int) – The number of iterations.

  • n (int) – The maximum n of n-gram.

  • unit (str) – Word-level or character-level. Can be 'word' or 'char'.

iter: int = 500
n: int = 4
unit: str = 'word'
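
To make the roles of n and unit concrete, the following is a minimal sketch of n-gram counting at word or character level. The helper extract_ngrams is hypothetical and is not part of gec_metrics; in particular, whitespace splitting for 'word' and per-character splitting for 'char' are assumptions about how the units are formed.

```python
from collections import Counter


def extract_ngrams(sentence: str, n: int = 4, unit: str = "word") -> list[Counter]:
    """Return one Counter per n-gram order 1..n (hypothetical helper)."""
    if unit == "word":
        tokens = sentence.split()  # assumption: whitespace tokenization
    elif unit == "char":
        tokens = list(sentence)  # assumption: every character is a token
    else:
        raise ValueError("unit must be 'word' or 'char'")
    return [
        Counter(tuple(tokens[i:i + order]) for i in range(len(tokens) - order + 1))
        for order in range(1, n + 1)
    ]


counts = extract_ngrams("the cat sat", n=2, unit="word")
```

Orders longer than the sentence simply yield an empty Counter in this sketch.
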
aggregate_score(scores: list[Score], hyp_len: int, ref_len: int) float[source]

Aggregate n-gram scores to an overall score by the geometric mean.

Parameters:
  • scores (list[Score]) – The scores, kept separate for each n-gram order. The shape is (n, ).

  • hyp_len (int) – The length of the hypothesis.

  • ref_len (int) – The length of the reference.

Returns:

The aggregated score.

Return type:

float
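
As an illustration of aggregating per-order scores by the geometric mean, here is a minimal sketch that works on plain precision values rather than Score objects. The function geometric_mean is hypothetical. How hyp_len and ref_len enter the result is not described here; BLEU-style metrics typically use such lengths for a brevity penalty, which this sketch omits.

```python
import math


def geometric_mean(precisions: list[float]) -> float:
    """Geometric mean of per-order n-gram precisions (illustrative only)."""
    if not precisions or any(p <= 0.0 for p in precisions):
        # A zero precision at any order drives the geometric mean to zero.
        return 0.0
    return math.exp(sum(math.log(p) for p in precisions) / len(precisions))


overall = geometric_mean([0.8, 0.6, 0.5, 0.4])
```
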

score_base(sources: list[str], hypotheses: list[str], references: list[list[str]]) float[source]
Compute True Positive and False Positive using GREEN's reformulation (https://aclanthology.org/2024.inlg-main.25.pdf).

The actual equation is (TI + TK - UD) / (TI + TK + OI + UD), thus we regard

  • True Positive (TP) as TI + TK - UD,

  • False Positive (FP) as OI + 2*UD.

Finally, precision = TP / (TP + FP) will be the GLEU score.
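
The arithmetic above can be checked with a small sketch. The function gleu_precision is hypothetical and takes the four counts (named as in the docstring) as plain integers; the real method works on Score objects per n-gram order. Note that TP + FP = (TI + TK - UD) + (OI + 2*UD) = TI + TK + OI + UD, which recovers the denominator of the original equation.

```python
def gleu_precision(ti: int, tk: int, oi: int, ud: int) -> float:
    """Precision form of the GLEU equation (illustrative only)."""
    tp = ti + tk - ud      # True Positive
    fp = oi + 2 * ud       # False Positive
    denom = tp + fp        # equals ti + tk + oi + ud
    # Guard against an empty count set; the value is not clamped otherwise.
    return tp / denom if denom > 0 else 0.0


score = gleu_precision(ti=3, tk=5, oi=1, ud=1)  # (3 + 5 - 1) / (3 + 5 + 1 + 1)
```
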

Parameters:
  • sources (list[str]) – Source sentences.

  • hypotheses (list[str]) – Corrected sentences.

  • references (list[list[str]]) – Reference sentences. The shape is (num_references, num_sentences).

Returns:

A tuple of three elements:

  • list[list[list[Score]]]: The verbose scores. The shape is (num_iterations, num_sents, max_ngram).

  • list[list[int]]: The lengths of the hypotheses. The shape is (num_iterations, num_sents).

  • list[list[int]]: The lengths of the references. The shape is (num_iterations, num_sents).

Return type:

Tuple[list[list[list[Score]]], list[list[int]], list[list[int]]]

score_corpus(sources: list[str], hypotheses: list[str], references: list[list[str]]) float[source]

Calculate a corpus-level score.

Parameters:
  • sources (list[str]) – Source sentences. The shape is (num_sentences, )

  • hypotheses (list[str]) – Corrected sentences. The shape is (num_sentences, )

  • references (list[list[str]]) – Reference sentences. The shape is (num_references, num_sentences).

Returns:

The corpus-level score.

Return type:

float
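
Because references is indexed by reference first and by sentence second, data stored per sentence needs to be transposed before it is passed in. A minimal sketch with made-up sentences follows; it assumes every sentence has the same number of references, since zip stops at the shortest list.

```python
# Hypothetical data: two sentences, each with two references.
per_sentence_refs = [
    ["He goes to school .", "He goes to the school ."],
    ["She is happy .", "She feels happy ."],
]

# Transpose (num_sentences, num_references) -> (num_references, num_sentences).
references = [list(refs) for refs in zip(*per_sentence_refs)]
```

After the transpose, references[0] holds the first reference of every sentence, in sentence order.
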

score_sentence(sources: list[str], hypotheses: list[str], references: list[list[str]]) float[source]

Calculate sentence-level scores.

Parameters:
  • sources (list[str]) – Source sentences. The shape is (num_sentences, )

  • hypotheses (list[str]) – Corrected sentences. The shape is (num_sentences, )

  • references (list[list[str]]) – Reference sentences. The shape is (num_references, num_sentences).

Returns:

The sentence-level scores.

Return type:

list[float]

class gec_metrics.metrics.gleu.GLEUOfficial(config: Config = None)[source]

Bases: GLEU

score_base(sources: list[str], hypotheses: list[str], references: list[list[str]]) Tuple[list[list[list[Score]]], list[list[int]], list[list[int]]][source]
The official implementation contains an error where the frequency of n-grams is ignored in the calculation of SR. As a result, when an n-gram is classified into both TK and UD, it is entirely counted as TK.
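
The effect of ignoring frequencies can be illustrated with a small sketch. It assumes that SR refers to the n-grams found in the source but not in the reference, which is an interpretation and is not stated here. Both helper functions are hypothetical and are not taken from gec_metrics or from the official GLEU script.

```python
from collections import Counter


def diff_with_counts(source: Counter, reference: Counter) -> Counter:
    """Multiset difference: keeps surplus occurrences found in the source."""
    return source - reference


def diff_ignoring_counts(source: Counter, reference: Counter) -> Counter:
    """Drops any n-gram present in the reference at all, whatever its count."""
    return Counter({g: c for g, c in source.items() if g not in reference})


source = Counter({("the",): 2, ("cat",): 1})
reference = Counter({("the",): 1})

frequency_aware = diff_with_counts(source, reference)
frequency_blind = diff_ignoring_counts(source, reference)
```

In this example the frequency-aware difference still contains one surplus ("the",), while the frequency-blind variant drops it completely, which matches the kind of behavior the docstring describes.
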

Parameters:
  • sources (list[str]) – Source sentences.

  • hypotheses (list[str]) – Corrected sentences.

  • references (list[list[str]]) – Reference sentences. The shape is (num_references, num_sentences).

Returns:

A tuple of three elements:

  • list[list[list[Score]]]: The verbose scores. The shape is (num_iterations, num_sents, max_ngram).

  • list[list[int]]: The lengths of the hypotheses. The shape is (num_iterations, num_sents).

  • list[list[int]]: The lengths of the references. The shape is (num_iterations, num_sents).

Return type:

Tuple[list[list[list[Score]]], list[list[int]], list[list[int]]]