gec_metrics.metrics.scribendi module

class gec_metrics.metrics.scribendi.Scribendi(config: Config = None)[source]

Bases: MetricBaseForReferenceFree

class Config(model: str = 'gpt2', threshold: float = 0.8, no_cuda: bool = False, batch_size: int = 32)[source]

Bases: Config

Scribendi configuration.

  • model (str): Model id of a language model.

  • threshold (float): Threshold applied to the maximum of the token sort ratio and the Levenshtein distance ratio.

  • no_cuda (bool): If True, run on the CPU.

  • batch_size (int): Batch size for inference.

batch_size: int = 32
model: str = 'gpt2'
no_cuda: bool = False
threshold: float = 0.8
levenshtein_distance_ratio(src: str, pred: str) → float[source]

The word-level Levenshtein distance ratio.

Parameters:
  • src (str) – The source sentence.

  • pred (str) – The corrected sentence.

Returns:

The Levenshtein distance ratio.

Return type:

float
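
As a rough illustration of what a word-level Levenshtein distance ratio can look like, the sketch below computes the edit distance over whitespace-separated tokens and normalizes it by the longer sentence length. The function name `word_levenshtein_ratio`, the whitespace tokenization, and the normalization are assumptions made for this example; the library's own implementation may normalize differently.

```python
def word_levenshtein_ratio(src: str, pred: str) -> float:
    """Hypothetical sketch: 1 - (word-level edit distance / longer length)."""
    a, b = src.split(), pred.split()
    if not a and not b:
        return 1.0  # two empty sentences are treated as identical
    # prev[j] holds the edit distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    # Assumed normalization: divide by the longer token sequence.
    return 1.0 - prev[-1] / max(len(a), len(b))
```

Under this normalization, identical sentences score 1.0 and sentences with no words in common score 0.0.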

ppl(sents: list[str]) → list[float][source]

Compute perplexity using a language model.

Parameters:

sents (list[str]) – The sentences whose perplexity is computed.

Returns:

The perplexity of each sentence.

Return type:

list[float]
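
For reference, perplexity is conventionally the exponential of the negative mean token log-probability. The sketch below shows only that arithmetic, assuming the per-token natural-log probabilities have already been obtained from a language model; the helper name `perplexity_from_log_probs` is hypothetical and is not part of this module.

```python
import math


def perplexity_from_log_probs(token_log_probs: list[float]) -> float:
    """Hypothetical helper: exp of the negative mean log-probability."""
    if not token_log_probs:
        raise ValueError("at least one token log-probability is required")
    mean_log_prob = sum(token_log_probs) / len(token_log_probs)
    return math.exp(-mean_log_prob)
```

A sentence whose tokens each receive probability 0.5 has perplexity 2 under this definition; lower values mean the model finds the sentence more predictable.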

score_corpus(sources: list[str], hypotheses: list[str]) → float[source]

Calculate a corpus-level score.

Parameters:
  • sources (list[str]) – Source sentences. The shape is (num_sentences,).

  • hypotheses (list[str]) – Corrected sentences. The shape is (num_sentences,).

Returns:

The corpus-level score.

Return type:

float

score_sentence(sources: list[str], hypotheses: list[str]) → list[float][source]

Calculate sentence-level scores.

Parameters:
  • sources (list[str]) – Source sentences. The shape is (num_sentences,).

  • hypotheses (list[str]) – Corrected sentences. The shape is (num_sentences,).

Returns:

The sentence-level scores.

Return type:

list[float]
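
The way the perplexities, the two ratios, and the threshold are combined is not spelled out here. The sketch below follows the commonly described Scribendi decision rule: 0 for an unchanged sentence, 1 when perplexity improves and the larger of the two ratios reaches the threshold, and -1 otherwise. It is an assumption that this class follows exactly this rule, and the function name and its arguments are hypothetical.

```python
def scribendi_decision(
    src: str,
    pred: str,
    ppl_src: float,
    ppl_pred: float,
    token_sort: float,
    lev_ratio: float,
    threshold: float = 0.8,
) -> int:
    """Hypothetical sketch of a per-sentence Scribendi-style decision."""
    if src == pred:
        return 0  # nothing was changed
    improved = ppl_pred < ppl_src  # the corrected sentence is more fluent
    close_enough = max(token_sort, lev_ratio) >= threshold
    return 1 if improved and close_enough else -1
```

The ratio check guards against corrections that lower perplexity only by rewriting the sentence into something unrelated to the source.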

token_sort_ratio(src: str, pred: str) → float[source]

The token sort ratio.

Parameters:
  • src (str) – The source sentence.

  • pred (str) – The corrected sentence.

Returns:

The token sort ratio.

Return type:

float
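
A token sort ratio is generally obtained by sorting the tokens of each sentence before comparing the resulting strings, so that word order does not affect the similarity. The sketch below illustrates the idea with the standard library's `difflib.SequenceMatcher`; the function name `sketch_token_sort_ratio`, the whitespace tokenization, and the 0 to 1 scale are assumptions, and the method documented above may rely on a different string matcher or preprocessing.

```python
from difflib import SequenceMatcher


def sketch_token_sort_ratio(src: str, pred: str) -> float:
    """Hypothetical sketch: similarity of the two sentences after sorting tokens."""
    sorted_src = " ".join(sorted(src.split()))
    sorted_pred = " ".join(sorted(pred.split()))
    # SequenceMatcher.ratio() returns a similarity in [0, 1].
    return SequenceMatcher(None, sorted_src, sorted_pred).ratio()
```

With this sketch, "b a" and "a b" score 1.0 because both sort to the same string.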