gec_metrics.metrics.scribendi module
- class gec_metrics.metrics.scribendi.Scribendi(config: Config = None)
Bases: MetricBaseForReferenceFree
- class Config(model: str = 'gpt2', threshold: float = 0.8, no_cuda: bool = False, batch_size: int = 32)
Bases: Config
Scribendi configuration.
- model (str): Model ID of the language model.
- threshold (float): Threshold for the maximum of the token sort ratio and the Levenshtein distance ratio.
- no_cuda (bool): If True, run on CPU.
- batch_size (int): Batch size for inference.
- batch_size: int = 32
- model: str = 'gpt2'
- no_cuda: bool = False
- threshold: float = 0.8
- levenshtein_distance_ratio(src: str, pred: str) → float
The word-level Levenshtein distance ratio.
- Parameters:
src (str) – The source sentence.
pred (str) – The corrected sentence.
- Returns:
The Levenshtein distance ratio.
- Return type:
float
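As an illustration of what a word-level Levenshtein distance ratio can look like, the sketch below splits both sentences on whitespace and normalizes the edit distance by the combined length. The function name `word_levenshtein_ratio`, the whitespace tokenization, the substitution cost of 2, and the normalization are all assumptions made for this example; the library's own implementation may differ.

```python
def word_levenshtein_ratio(src: str, pred: str) -> float:
    """Hypothetical helper: similarity in [0, 1] based on word-level edits."""
    a, b = src.split(), pred.split()
    total = len(a) + len(b)
    if total == 0:
        # Two empty sentences are treated as identical.
        return 1.0
    # prev[j] holds the distance between the first i-1 words of a
    # and the first j words of b.
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            # Assumption: a substitution counts as one deletion plus one insertion.
            sub_cost = 0 if word_a == word_b else 2
            curr.append(min(prev[j] + 1,              # delete word_a
                            curr[j - 1] + 1,          # insert word_b
                            prev[j - 1] + sub_cost))  # keep or substitute
        prev = curr
    return (total - prev[-1]) / total


print(word_levenshtein_ratio("He go to school .", "He goes to school ."))
```

Under this definition, identical sentences give 1.0 and sentences with no words in common give 0.0.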
- ppl(sents: list[str]) → list[float]
Compute perplexity using a language model.
- Parameters:
sents (list[str]) – The sentences whose perplexity is computed.
- Returns:
The perplexity of each sentence.
- Return type:
list[float]
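For reference, perplexity is conventionally the exponential of the mean negative log-likelihood per token. The sketch below shows only that arithmetic on precomputed token log-probabilities (natural log); the helper name `perplexity_from_logprobs` is hypothetical, and how the library obtains log-probabilities from the configured model is not shown here.

```python
import math


def perplexity_from_logprobs(token_logprobs: list[float]) -> float:
    """Hypothetical helper: exp of the mean negative log-probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# A sentence whose four tokens each have probability 0.5.
print(perplexity_from_logprobs([math.log(0.5)] * 4))
```

A lower value means the language model finds the sentence more likely.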
- score_corpus(sources: list[str], hypotheses: list[str]) → float
Calculate a corpus-level score.
- Parameters:
sources (list[str]) – Source sentences. The shape is (num_sentences, )
hypotheses (list[str]) – Corrected sentences. The shape is (num_sentences, )
- Returns:
The corpus-level score.
- Return type:
float
- score_sentence(sources: list[str], hypotheses: list[str]) → list[float]
Calculate sentence-level scores.
- Parameters:
sources (list[str]) – Source sentences. The shape is (num_sentences, )
hypotheses (list[str]) – Corrected sentences. The shape is (num_sentences, )
- Returns:
The sentence-level scores.
- Return type:
list[float]
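To show how the perplexity and the `threshold` setting could combine into a sentence-level score, here is a minimal sketch of the decision rule as the Scribendi Score is commonly described: 0 for an unchanged sentence, 1 when perplexity drops and the similarity reaches the threshold, and -1 otherwise. The function `scribendi_sentence_score` and its precomputed inputs are hypothetical, and this page does not confirm that the library follows this exact rule.

```python
def scribendi_sentence_score(
    src: str,
    pred: str,
    ppl_src: float,
    ppl_pred: float,
    similarity: float,
    threshold: float = 0.8,
) -> int:
    """Hypothetical decision rule over precomputed quantities.

    similarity is assumed to be the maximum of the token sort ratio and
    the Levenshtein distance ratio, both scaled to [0, 1].
    """
    if src == pred:
        # No edit was made.
        return 0
    if ppl_pred >= ppl_src:
        # The correction did not make the sentence more likely.
        return -1
    # Perplexity improved: accept only if the output stays close to the source.
    return 1 if similarity >= threshold else -1


print(scribendi_sentence_score("He go home .", "He goes home .", 120.0, 45.0, 0.9))
```

Passing the perplexities and similarity in as arguments keeps the sketch independent of any particular language model or string-matching library.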