gec_metrics.metrics.errant module
- class gec_metrics.metrics.errant.ERRANT(config: Config = None)[source]
Bases:
MetricBaseForReferenceBased
- class Config(beta: float = 0.5, language: str = 'en')[source]
Bases:
Config
ERRANT configuration.
- beta (float): The beta for F-beta score.
- language (str): The language for spacy.
- beta: float = 0.5
- language: str = 'en'
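The role of `beta` can be illustrated with a small standalone sketch of the F-beta formula. This is not the library's code: the function name `f_beta` and the handling of zero denominators are assumptions made for illustration.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from edit counts (hypothetical helper, not part of gec_metrics)."""
    # Assumption: precision/recall default to 1.0 when their denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 1.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    # beta < 1 weights precision more heavily than recall.
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision 0.5, recall 0.25: F0.5 sits closer to precision than F1 does.
f_half = f_beta(tp=1, fp=1, fn=3, beta=0.5)
f_one = f_beta(tp=1, fp=1, fn=3, beta=1.0)
```

With the default `beta = 0.5`, precision counts for more than recall, which is the usual choice in grammatical error correction evaluation.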
- aggregate_to_overall(scores: dict[str, Score]) Score[source]
Convert error type-wise scores into an overall score.
- Parameters:
scores (dict[str, "Score"]) – Error type-wise scores.
- Returns:
The aggregated score.
- Return type:
Score
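One plausible way to combine error type-wise scores is to sum their TP, FP, and FN counts. The sketch below assumes exactly that; the `Counts` dataclass and the `aggregate` function are hypothetical stand-ins, not the library's `Score` class or its actual aggregation logic.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical stand-in for a score object holding raw edit counts."""
    tp: int = 0
    fp: int = 0
    fn: int = 0


def aggregate(scores: dict[str, Counts]) -> Counts:
    """Sum TP/FP/FN over all error types (assumed aggregation rule)."""
    total = Counts()
    for counts in scores.values():
        total.tp += counts.tp
        total.fp += counts.fp
        total.fn += counts.fn
    return total


by_type = {"NOUN": Counts(tp=1, fp=0, fn=2), "VERB": Counts(tp=2, fp=1, fn=0)}
overall = aggregate(by_type)
```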
- cached_parse(sent: str) Doc[source]
Efficient parse() by caching.
- Parameters:
sent (str) – The sentence to be parsed.
- Returns:
The parse results.
- Return type:
spacy.tokens.doc.Doc
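The caching idea can be sketched with `functools.lru_cache`. The example below is only an illustration: it uses whitespace tokenization as a stand-in for a spacy parse, and the names `slow_parse` and `cached_tokenize` are hypothetical. How the library implements its cache is not stated here.

```python
from functools import lru_cache

call_count = {"n": 0}  # counts how often the expensive step actually runs


def slow_parse(sent: str) -> tuple[str, ...]:
    """Stand-in for an expensive parser (here: whitespace tokenization)."""
    call_count["n"] += 1
    return tuple(sent.split())


@lru_cache(maxsize=None)
def cached_tokenize(sent: str) -> tuple[str, ...]:
    """Return the parse for `sent`, computing it only on the first request."""
    return slow_parse(sent)


first = cached_tokenize("This is a test .")
second = cached_tokenize("This is a test .")  # served from the cache
```

Caching pays off because the same source sentence is typically parsed once per hypothesis and once per reference.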
- edit_extraction(src: str, trg: str) list[Edit][source]
Extract edits given a source and a corrected sentence.
- Parameters:
src (str) – The source sentence.
trg (str) – The corrected sentence.
- Returns:
Extracted edits.
- Return type:
list[errant.edit.Edit]
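To give a feel for what an extracted edit looks like, here is a rough sketch that derives token-level edits with the standard library's `difflib`. This is a swapped-in technique: ERRANT itself uses a linguistically informed alignment and returns `errant.edit.Edit` objects with error types, whereas the hypothetical `extract_token_edits` below only returns `(start, end, correction)` tuples.

```python
from difflib import SequenceMatcher


def extract_token_edits(src: str, trg: str) -> list[tuple[int, int, str]]:
    """Return (src_start, src_end, correction) for each differing token span."""
    src_toks, trg_toks = src.split(), trg.split()
    matcher = SequenceMatcher(a=src_toks, b=trg_toks, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # An insertion has i1 == i2; a deletion has an empty correction.
            edits.append((i1, i2, " ".join(trg_toks[j1:j2])))
    return edits


replacement = extract_token_edits("He go to school yesterday .",
                                  "He went to school yesterday .")
insertion = extract_token_edits("I like apple .", "I like the apple .")
```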
- score_base(sources: list[str], hypotheses: list[str], references: list[list[str]]) list[list[dict[str, Score]]][source]
Calculate scores while retaining sentence and reference boundaries.
The results can be aggregated as needed,
e.g., at the sentence level or the corpus level.
- Parameters:
sources (list[str]) – Source sentences.
hypotheses (list[str]) – Corrected sentences.
references (list[list[str]]) – Reference sentences. The shape is (num_references, num_sentences).
- Returns:
The verbose scores.
The list shape is (num_sents, num_refs).
Each dict contains error type-wise scores.
- Return type:
list[list[dict[str, Score]]]
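Note that the input and output shapes are transposed relative to each other: `references` is indexed as (reference, sentence), while the result is indexed as (sentence, reference). The following sketch shows the reordering on plain lists; the example sentences and the variable `per_sentence_refs` are illustrative assumptions, not library code.

```python
sources = ["He go to school .", "I like apple ."]

# Shape (num_references, num_sentences): one inner list per reference set.
references = [
    ["He goes to school .", "I like apples ."],     # reference set 0
    ["He went to school .", "I like the apple ."],  # reference set 1
]

# Regroup to (num_sentences, num_references), the order used by the output.
per_sentence_refs = [list(refs) for refs in zip(*references)]

for src, refs in zip(sources, per_sentence_refs):
    print(src, "->", refs)
```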
- score_corpus(sources: list[str], hypotheses: list[str], references: list[list[str]]) float[source]
Calculate a corpus-level score. This accumulates the edit counts for TP, FP, and FN
and calculates the F-beta score.
- Parameters:
sources (list[str]) – Source sentences. The shape is (num_sentences, )
hypotheses (list[str]) – Corrected sentences. The shape is (num_sentences, )
references (list[list[str]]) – Reference sentences. The shape is (num_references, num_sentences).
- Returns:
The corpus-level score.
- Return type:
float
- score_corpus_etype(sources: list[str], hypotheses: list[str], references: list[list[str]], cat: int = 2)[source]
Calculate error-type-level scores.
- Parameters:
sources (list[str]) – Source sentences. The shape is (num_sentences, )
hypotheses (list[str]) – Corrected sentences. The shape is (num_sentences, )
references (list[list[str]]) – Reference sentences. The shape is (num_references, num_sentences).
cat (int) –
Error type category, following the original ERRANT:
cat=1: Operation, e.g. M, R, U
cat=2: Main types, e.g. NOUN, VERB
cat=3: All types, e.g. M:NOUN, R:VERB
- Returns:
The error-type-level scores.
Each key is an error type, and each value is a Score instance.
- Return type:
dict[str, Score]
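The three `cat` levels can be read as projections of a full ERRANT error type such as `M:NOUN`. The sketch below shows one way to compute the key for each level; `project_error_type` is a hypothetical helper, and its treatment of types without a colon (for example `UNK`) is an assumption rather than documented behavior.

```python
def project_error_type(etype: str, cat: int = 2) -> str:
    """Map a full error type like 'R:VERB:TENSE' to its key at level `cat`."""
    operation, _, main_type = etype.partition(":")
    if cat == 1:
        return operation           # e.g. 'M', 'R', 'U'
    if cat == 2:
        return main_type or etype  # e.g. 'NOUN', 'VERB:TENSE'
    return etype                   # cat=3 keeps the full type, e.g. 'M:NOUN'


keys = [project_error_type("R:VERB:TENSE", cat) for cat in (1, 2, 3)]
```

Edits whose types project to the same key are then scored together, so a lower `cat` gives fewer, coarser groups.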
- score_corpus_verbose(sources: list[str], hypotheses: list[str], references: list[list[str]]) Score[source]
Calculate a corpus-level score by aggregating verbose scores.
- Parameters:
sources (list[str]) – Source sentences.
hypotheses (list[str]) – Corrected sentences.
references (list[list[str]]) – Reference sentences. The shape is (num_references, num_sentences).
- Returns:
The corpus-level score. It contains TP, FP, FN, Precision, Recall, and F-beta.
- Return type:
Score
- score_sentence(sources: list[str], hypotheses: list[str], references: list[list[str]]) list[float][source]
Calculate sentence-level scores.
- Parameters:
sources (list[str]) – Source sentences. The shape is (num_sentences, )
hypotheses (list[str]) – Corrected sentences. The shape is (num_sentences, )
references (list[list[str]]) – Reference sentences. The shape is (num_references, num_sentences).
- Returns:
The sentence-level scores.
- Return type:
list[float]
- score_sentence_verbose(sources: list[str], hypotheses: list[str], references: list[list[str]]) list[Score][source]
Calculate sentence-level scores by aggregating verbose scores. “Verbose” means that TP, FP, FN, Precision, Recall, and F-beta are available.
- Parameters:
sources (list[str]) – Source sentences. The shape is (num_sentences, )
hypotheses (list[str]) – Corrected sentences. The shape is (num_sentences, )
references (list[list[str]]) – Reference sentences. The shape is (num_references, num_sentences).
- Returns:
The sentence-level scores.
- Return type:
list[Score]