gec_metrics.metrics.scribendi module

class gec_metrics.metrics.scribendi.Scribendi(config: Config = None)

class Config(model: str = 'gpt2', threshold: float = 0.8, no_cuda: bool = False, batch_size: int = 32)

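Scribendi configuration.

- model (str): Model ID of the language model used to compute perplexity (default: 'gpt2').
- threshold (float): Threshold on the maximum of the token sort ratio and the Levenshtein distance ratio.
- no_cuda (bool): If True, do not use CUDA.
- batch_size (int): Batch size used for language-model inference.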

levenshtein_distance_ratio(src: str, pred: str) → float

Compute the word-level Levenshtein distance ratio between src and pred.

Parameters:

src (str) – Source sentence.
pred (str) – Corrected sentence.

Returns:

The Levenshtein distance ratio.

Return type:

float
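The exact normalization used by levenshtein_distance_ratio is not stated above. The sketch below is a minimal, self-contained illustration that assumes the convention of the python-Levenshtein `ratio` function applied to whitespace-separated words: a substitution counts as a deletion plus an insertion (cost 2), and the ratio is (total_tokens - distance) / total_tokens. The function name `word_levenshtein_ratio` is hypothetical.

```python
def word_levenshtein_ratio(src: str, pred: str) -> float:
    """Word-level Levenshtein similarity ratio in [0, 1] (assumed convention)."""
    a, b = src.split(), pred.split()
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # two empty sentences are treated as identical
    # Dynamic-programming edit distance over tokens, keeping one row at a time.
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, 1):
        curr = [i] + [0] * len(b)
        for j, tok_b in enumerate(b, 1):
            # Assumption: substitution costs 2 (= delete + insert), as in Levenshtein.ratio.
            sub_cost = 0 if tok_a == tok_b else 2
            curr[j] = min(prev[j] + 1,             # delete tok_a
                          curr[j - 1] + 1,         # insert tok_b
                          prev[j - 1] + sub_cost)  # keep or substitute
        prev = curr
    distance = prev[-1]
    return (total - distance) / total
```

For example, "a b c" versus "a x c" has one substituted word, so the distance is 2 over 6 tokens and the ratio is about 0.67.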

ppl(sents: list[str]) → list[float]

Compute perplexity using a language model.

Parameters:

sents (list[str]) – The sentences whose perplexity is computed.

Returns:

A list of perplexities, one per sentence.

Return type:

list[float]
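Sentence perplexity is conventionally the exponential of the mean negative log-likelihood of its tokens. The sketch below shows only that formula; it assumes per-token natural-log probabilities have already been obtained from a language model (such as the one named by Config.model). The scorer `logprob_fn` and the names `perplexity_from_logprobs` and `ppl_sketch` are hypothetical stand-ins, and how the module handles the first token or special tokens is not specified here.

```python
import math
from typing import Callable


def perplexity_from_logprobs(token_logprobs: list[float]) -> float:
    """exp(mean negative log-likelihood) over one sentence's tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def ppl_sketch(sents: list[str],
               logprob_fn: Callable[[str], list[float]]) -> list[float]:
    # logprob_fn is hypothetical: it stands in for running the LM on one sentence
    # and returning the natural-log probability of each token.
    return [perplexity_from_logprobs(logprob_fn(s)) for s in sents]


# Usage with a toy scorer that assigns probability 0.5 to every word.
toy_scorer = lambda s: [math.log(0.5)] * len(s.split())
print(ppl_sketch(["a b", "c d e"], toy_scorer))  # both values are approximately 2.0
```

Lower perplexity means the language model finds the sentence more fluent.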

score_corpus(sources: list[str], hypotheses: list[str]) → float

Calculate a corpus-level score.

Parameters:

sources (list[str]) – Source sentences. The shape is (num_sentences,).
hypotheses (list[str]) – Corrected sentences. The shape is (num_sentences,).

Returns:

The corpus-level score.

Return type:

float
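How the corpus-level score is aggregated is not stated above. In the original Scribendi Score proposal, the system score is the sum of per-sentence scores in {-1, 0, 1}; the sketch below assumes score_corpus follows that definition. The name `scribendi_corpus_score` is hypothetical.

```python
def scribendi_corpus_score(sentence_scores: list[int]) -> float:
    # Assumed aggregation: a plain, unnormalized sum of per-sentence scores,
    # where each score is -1 (penalized), 0 (unchanged), or 1 (rewarded).
    return float(sum(sentence_scores))


print(scribendi_corpus_score([1, -1, 0, 1]))  # 1.0
```

Under this assumption the score is not normalized, so it grows with corpus size; systems are only comparable on the same set of source sentences.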

score_sentence(sources: list[str], hypotheses: list[str]) → list[float]

Calculate sentence-level scores.

Parameters:

sources (list[str]) – Source sentences. The shape is (num_sentences,).
hypotheses (list[str]) – Corrected sentences. The shape is (num_sentences,).

Returns:

The sentence-level scores.

Return type:

list[float]
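The per-sentence decision combines the quantities documented on this page: perplexity (ppl), the token sort ratio, the Levenshtein distance ratio, and Config.threshold. The sketch below assumes the rule from the original Scribendi Score proposal and takes precomputed values as inputs so it runs without a language model. The name `scribendi_sentence_score` is hypothetical, and the module may use a strict `>` rather than `>=` for the threshold comparison.

```python
def scribendi_sentence_score(
    src: str,
    hyp: str,
    ppl_src: float,
    ppl_hyp: float,
    tsr: float,
    ldr: float,
    threshold: float = 0.8,
) -> int:
    """Assumed Scribendi rule: 0 if unchanged, 1 if rewarded, -1 otherwise."""
    # An unchanged sentence is neither rewarded nor penalized.
    if src == hyp:
        return 0
    # Reward the edit only if the LM finds it more fluent (lower perplexity)
    # AND it stays close to the source (max of the two similarity ratios).
    if ppl_hyp < ppl_src and max(tsr, ldr) >= threshold:
        return 1
    return -1


print(scribendi_sentence_score("He go home .", "He goes home .", 120.0, 60.0, 0.93, 0.75))  # 1
```

Requiring high similarity guards against rewriting a sentence into an unrelated but more fluent one, which would otherwise lower perplexity.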

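token_sort_ratio(src: str, pred: str) → float

Compute the token sort ratio between src and pred.

Parameters:

src (str) – Source sentence.
pred (str) – Corrected sentence.

Returns: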

The token sort ratio.

Return type:

float