gec_metrics.metrics.llm_kobayashi24 module
- class gec_metrics.metrics.llm_kobayashi24.LLMKobayashi24(config: Config = None)[source]
Bases:
MetricBaseForReferenceFree
- class Config(model: str = None, cache: str = None, seed: int = 777, verbose: bool = False, criteria: str = None, instruction_template: str = 'The goal of this task is to rank the presented targets based on the quality of the sentences.\nAfter reading the source sentence and target sentences, please assign a score from a minimum of 1 point to a maximum of 5 points to each target based on the quality of the sentence (note that you can assign the same score multiple times).\n\n# source\n[SOURCE]\n\n# targets\n[CORRECTION]\n\n# output format\nThe output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":\n```json\n{\n"target1_score": int // assigned score for target 1\n...\n"targetN_score": int // assigned score for target N\n}\n```\n')[source]
Bases:
Config
LLMKobayashi24 configuration.
- Parameters:
model (str) – The name of the LLM. This is typically the ID of a Hugging Face model or an OpenAI model.
cache (str) – Filename for caching model inputs and outputs.
seed (int) – Seed value.
criteria (str) – Specifies the evaluation aspect. Can be one of None, ‘grammaticality’, ‘fluency’, or ‘meaning’.
instruction_template (str) – Template for the instruction. Must include [SOURCE] and [CORRECTION] placeholders.
- cache: str = None
- criteria: str = None
- instruction_template: str = 'The goal of this task is to rank the presented targets based on the quality of the sentences.\nAfter reading the source sentence and target sentences, please assign a score from a minimum of 1 point to a maximum of 5 points to each target based on the quality of the sentence (note that you can assign the same score multiple times).\n\n# source\n[SOURCE]\n\n# targets\n[CORRECTION]\n\n# output format\nThe output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":\n```json\n{\n"target1_score": int // assigned score for target 1\n...\n"targetN_score": int // assigned score for target N\n}\n```\n'
- model: str = None
- seed: int = 777
- verbose: bool = False
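The placeholder requirement on instruction_template can be illustrated with a small sketch. The helper name `fill_template` and the "target{i}: ..." layout of the targets are assumptions made for illustration only; the library's actual prompt construction may differ.

```python
# Sketch only: `fill_template` and the numbered-target layout are hypothetical,
# not the library's confirmed behaviour.
TEMPLATE = "# source\n[SOURCE]\n\n# targets\n[CORRECTION]\n"


def fill_template(template: str, source: str, targets: list[str]) -> str:
    """Substitute the two required placeholders in an instruction template."""
    for placeholder in ("[SOURCE]", "[CORRECTION]"):
        if placeholder not in template:
            raise ValueError(f"template is missing {placeholder}")
    # Assumed layout: one numbered line per target sentence.
    numbered = "\n".join(f"target{i}: {t}" for i, t in enumerate(targets, start=1))
    return template.replace("[SOURCE]", source).replace("[CORRECTION]", numbered)


prompt = fill_template(
    TEMPLATE,
    "He go to school.",
    ["He goes to school.", "He go to school."],
)
print(prompt)
```

Checking for both placeholders up front makes a malformed custom template fail early rather than silently producing a prompt without the source or the targets.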
- class LLMSentOutputFormat1(*, target_score1: int)[source]
Bases:
BaseModel
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- target_score1: int
- class LLMSentOutputFormat2(*, target_score1: int, target_score2: int)[source]
Bases:
BaseModel
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- target_score1: int
- target_score2: int
- class LLMSentOutputFormat3(*, target_score1: int, target_score2: int, target_score3: int)[source]
Bases:
BaseModel
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- target_score1: int
- target_score2: int
- target_score3: int
- class LLMSentOutputFormat4(*, target_score1: int, target_score2: int, target_score3: int, target_score4: int)[source]
Bases:
BaseModel
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- target_score1: int
- target_score2: int
- target_score3: int
- target_score4: int
- class LLMSentOutputFormat5(*, target_score1: int, target_score2: int, target_score3: int, target_score4: int, target_score5: int)[source]
Bases:
BaseModel
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- target_score1: int
- target_score2: int
- target_score3: int
- target_score4: int
- target_score5: int
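The output format classes describe one integer score per target, and the default prompt asks the model to reply with a fenced JSON object. The following is a minimal standard-library sketch of extracting such scores from a raw reply. The function name `parse_scores` is hypothetical. Note that the default instruction_template asks for keys of the form target1_score while the classes above name their fields target_score1, so the sketch accepts either spelling; how the library itself reconciles the two is not stated here.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built indirectly to keep this example readable


def parse_scores(reply: str, num_targets: int) -> list[int]:
    """Extract one integer score (1 to 5) per target from a fenced JSON reply."""
    pattern = re.escape(FENCE) + r"json\s*(\{.*?\})\s*" + re.escape(FENCE)
    match = re.search(pattern, reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("no fenced JSON object found in the reply")
    data = json.loads(match.group(1))
    scores = []
    for i in range(1, num_targets + 1):
        # Assumption: only these two key spellings occur.
        key = f"target_score{i}" if f"target_score{i}" in data else f"target{i}_score"
        score = int(data[key])
        if not 1 <= score <= 5:
            raise ValueError(f"score for target {i} is out of range: {score}")
        scores.append(score)
    return scores


reply = (
    "Here are the scores:\n"
    + FENCE + "json\n"
    + '{"target1_score": 4, "target2_score": 2}\n'
    + FENCE
)
print(parse_scores(reply, 2))
```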
- abstractmethod call_client(instruction: str, output_format: BaseModel)[source]
Implement the forward pass that sends the given instruction to the model. The output format is available for reference, e.g. to request structured output.
- hyp_form(src: str, hyp: str) str[source]
Convert the hypothesis into another format, e.g. an edit representation.
- Parameters:
src (str) – Source sentence.
hyp (str) – Hypothesis sentence.
- Returns:
Another representation of the hypothesis.
- Return type:
str
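As an illustration of what an edit representation could look like, the sketch below marks token-level differences in the bracket notation used by the LLM-E instruction template ([→the] for insertion, [the→] for deletion, [the→a] for replacement). It uses difflib from the standard library on whitespace tokens; this is an assumption for illustration, and the library may extract edits with a different tool. The function name `to_edit_representation` is hypothetical.

```python
import difflib


def to_edit_representation(src: str, hyp: str) -> str:
    """Mark token-level edits of `hyp` relative to `src` as [old→new]."""
    src_tokens, hyp_tokens = src.split(), hyp.split()
    matcher = difflib.SequenceMatcher(a=src_tokens, b=hyp_tokens, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged tokens are copied through as-is.
            out.extend(src_tokens[i1:i2])
        else:
            # "replace", "delete" and "insert" all fit the [old→new] pattern;
            # one side is simply empty for deletions and insertions.
            old = " ".join(src_tokens[i1:i2])
            new = " ".join(hyp_tokens[j1:j2])
            out.append(f"[{old}→{new}]")
    return " ".join(out)


print(to_edit_representation("He go to school .", "He goes to the school ."))
```

A hypothesis identical to the source comes back unchanged, which matches the template's special handling of "targets without any edits".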
- index_multiple(elems: list, target_e) list[int][source]
A variant of list.index() that returns the indices of all matching elements instead of only the first.
- Parameters:
elems (list[any]) – The list containing any values.
target_e (any) – Value for which you want to know the index.
- Returns:
Indices of target_e in elems.
- Return type:
list[int]
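A behaviour matching this description can be sketched in one line. This is an equivalent sketch based on the documented contract, not necessarily the library's own code.

```python
def index_multiple(elems: list, target_e) -> list[int]:
    """Return every index at which `target_e` occurs in `elems`."""
    return [i for i, e in enumerate(elems) if e == target_e]


print(index_multiple(["a", "b", "a"], "a"))  # [0, 2]
print(index_multiple(["a", "b", "a"], "z"))  # []
```

Unlike list.index(), this returns an empty list rather than raising ValueError when the value is absent.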
- abstractmethod load_client()[source]
Load the LLM client, e.g. OpenAI() for OpenAI models or .from_pretrained() for Hugging Face models.
- output_formats = [<class 'gec_metrics.metrics.llm_kobayashi24.LLMKobayashi24.LLMSentOutputFormat1'>, <class 'gec_metrics.metrics.llm_kobayashi24.LLMKobayashi24.LLMSentOutputFormat2'>, <class 'gec_metrics.metrics.llm_kobayashi24.LLMKobayashi24.LLMSentOutputFormat3'>, <class 'gec_metrics.metrics.llm_kobayashi24.LLMKobayashi24.LLMSentOutputFormat4'>, <class 'gec_metrics.metrics.llm_kobayashi24.LLMKobayashi24.LLMSentOutputFormat5'>]
- sample_sentences(hypotheses: list[str], max_n: int = 5) list[str][source]
Sample up to max_n sentences from hypotheses. LLMKobayashi24 metrics accept at most five hypotheses, so if there are more than five distinct hypotheses, five of them must be sampled. This implementation uses a simple strategy: choose the five most frequent hypotheses.
- Parameters:
hypotheses (list[str]) – Hypotheses of all systems.
max_n (int) – The number of sentences to be sampled.
- Returns:
The sampled sentences.
- Return type:
list[str]
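The frequency-based strategy described above can be sketched with collections.Counter. Tie-breaking is an assumption here: Counter.most_common keeps first-encountered order among equally frequent sentences, and the library may break ties differently.

```python
from collections import Counter


def sample_sentences(hypotheses: list[str], max_n: int = 5) -> list[str]:
    """Keep the `max_n` most frequent distinct hypotheses."""
    counts = Counter(hypotheses)
    # most_common returns (sentence, count) pairs sorted by descending count.
    return [sent for sent, _ in counts.most_common(max_n)]


hyps = ["a", "b", "a", "c", "b", "a"]
print(sample_sentences(hyps, max_n=2))  # ['a', 'b']
```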
- score_pairwise(sources: list[str], hypotheses: list[list[str]])[source]
Calculate pairwise scores for all combinations of hypotheses. By default, it simply compares the sentence-level scores.
- Parameters:
sources (list[str]) – Source sentences. The shape is (num_sentences, ).
hypotheses (list[list[str]]) – Corrected sentences. The shape is (num_systems, num_sentences).
references (list[list[str]]) – Reference sentences. The shape is (num_references, num_sentences). Not part of the signature shown above, which takes only sources and hypotheses.
- Returns:
- Pairwise comparison results.
The shape is (num_sentences, num_systems, num_systems). Each element is -1, 0, or 1:
0: tie
1: sys_id1 wins against sys_id2
-1: sys_id1 loses to sys_id2
- Return type:
list[list[list]]
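The default comparison can be sketched as follows, starting from sentence-level scores that are already computed. The function name `pairwise_from_scores` is hypothetical, and the sketch assumes that a higher score is better.

```python
def pairwise_from_scores(sent_scores: list[list[float]]) -> list[list[list[int]]]:
    """Turn scores of shape (num_systems, num_sentences) into win/tie/loss
    matrices of shape (num_sentences, num_systems, num_systems)."""
    num_systems = len(sent_scores)
    num_sentences = len(sent_scores[0]) if sent_scores else 0
    results = []
    for s in range(num_sentences):
        matrix = []
        for i in range(num_systems):
            row = []
            for j in range(num_systems):
                a, b = sent_scores[i][s], sent_scores[j][s]
                # 1 if system i wins, -1 if it loses, 0 on a tie.
                row.append((a > b) - (a < b))
            matrix.append(row)
        results.append(matrix)
    return results


scores = [[5, 3], [4, 3]]  # two systems, two sentences
print(pairwise_from_scores(scores))
```

Each per-sentence matrix is antisymmetric with zeros on the diagonal, since every system ties with itself.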
- score_sentence(sources: list[str], hypotheses: list[str])[source]
Calculate sentence-level scores.
- Parameters:
sources (list[str]) – Source sentences. The shape is (num_sentences, ).
hypotheses (list[str]) – Corrected sentences. The shape is (num_sentences, ).
- Returns:
The sentence-level scores.
- Return type:
list[float]
- class gec_metrics.metrics.llm_kobayashi24.LLMKobayashi24HFEdit(config: Config = None)[source]
Bases:
LLMKobayashi24HFSent
LLM-E with Hugging Face models.
- class Config(model: str = 'meta-llama/Llama-2-13b-chat-hf', cache: str = None, seed: int = 777, verbose: bool = False, criteria: str = None, instruction_template: str = 'The goal of this task is to rank the presented targets based on the quality of the sentences.\nAfter reading the source sentence and target sentences, please assign a score from a minimum of 1 point to a maximum of 5 points to each target based on the quality of the sentence (note that you can assign the same score multiple times).\nFor targets without any edits, if the sentence is correct, they will be awarded 5 points; if there is an error, they will receive 1 point.\nThe edits in each target are indicated as follows:\nInsert "the": [→the]\nDelete "the": [the→]\nReplace "the" with "a": [the→a]\n\n# context\n[SOURCE]\n\n# targets\n[CORRECTION]\n\n# output format\nThe output should be a markdown code snippet formatted in the\nfollowing schema, including the leading and trailing "```json" and "```":\n```json\n{\n"target1_score": int // assigned score for target 1\n...\n"targetN_score": int // assigned score for target N\n}\n```\n', quantization: str = '8bit', dtype: str = 'bfloat16')[source]
Bases:
Config
- instruction_template: str = 'The goal of this task is to rank the presented targets based on the quality of the sentences.\nAfter reading the source sentence and target sentences, please assign a score from a minimum of 1 point to a maximum of 5 points to each target based on the quality of the sentence (note that you can assign the same score multiple times).\nFor targets without any edits, if the sentence is correct, they will be awarded 5 points; if there is an error, they will receive 1 point.\nThe edits in each target are indicated as follows:\nInsert "the": [→the]\nDelete "the": [the→]\nReplace "the" with "a": [the→a]\n\n# context\n[SOURCE]\n\n# targets\n[CORRECTION]\n\n# output format\nThe output should be a markdown code snippet formatted in the\nfollowing schema, including the leading and trailing "```json" and "```":\n```json\n{\n"target1_score": int // assigned score for target 1\n...\n"targetN_score": int // assigned score for target N\n}\n```\n'
- class gec_metrics.metrics.llm_kobayashi24.LLMKobayashi24HFSent(config=None)[source]
Bases:
LLMKobayashi24
LLM-S with Hugging Face models.
- class Config(model: str = 'meta-llama/Llama-2-13b-chat-hf', cache: str = None, seed: int = 777, verbose: bool = False, criteria: str = None, instruction_template: str = 'The goal of this task is to rank the presented targets based on the quality of the sentences.\nAfter reading the source sentence and target sentences, please assign a score from a minimum of 1 point to a maximum of 5 points to each target based on the quality of the sentence (note that you can assign the same score multiple times).\n\n# source\n[SOURCE]\n\n# targets\n[CORRECTION]\n\n# output format\nThe output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":\n```json\n{\n"target1_score": int // assigned score for target 1\n...\n"targetN_score": int // assigned score for target N\n}\n```\n', quantization: str = '8bit', dtype: str = 'bfloat16')[source]
Bases:
Config
Hugging Face configuration.
- Parameters:
model (str) – Hugging Face model name.
quantization (str) – Quantization setting. One of None, ‘4bit’, or ‘8bit’.
dtype (str)
- dtype: str = 'bfloat16'
- model: str = 'meta-llama/Llama-2-13b-chat-hf'
- quantization: str = '8bit'
- class gec_metrics.metrics.llm_kobayashi24.LLMKobayashi24OpenAIEdit(config: Config = None)[source]
Bases:
LLMKobayashi24OpenAISent
LLM-E with OpenAI models.
- class Config(model: str = 'gpt-4o-mini-2024-07-18', cache: str = None, seed: int = 777, verbose: bool = False, criteria: str = None, instruction_template: str = 'The goal of this task is to rank the presented targets based on the quality of the sentences.\nAfter reading the source sentence and target sentences, please assign a score from a minimum of 1 point to a maximum of 5 points to each target based on the quality of the sentence (note that you can assign the same score multiple times).\nFor targets without any edits, if the sentence is correct, they will be awarded 5 points; if there is an error, they will receive 1 point.\nThe edits in each target are indicated as follows:\nInsert "the": [→the]\nDelete "the": [the→]\nReplace "the" with "a": [the→a]\n\n# context\n[SOURCE]\n\n# targets\n[CORRECTION]\n\n# output format\nThe output should be a markdown code snippet formatted in the\nfollowing schema, including the leading and trailing "```json" and "```":\n```json\n{\n"target1_score": int // assigned score for target 1\n...\n"targetN_score": int // assigned score for target N\n}\n```\n', organization: str = None, api_key: str = None, base_url: str = None)[source]
Bases:
Config
- instruction_template: str = 'The goal of this task is to rank the presented targets based on the quality of the sentences.\nAfter reading the source sentence and target sentences, please assign a score from a minimum of 1 point to a maximum of 5 points to each target based on the quality of the sentence (note that you can assign the same score multiple times).\nFor targets without any edits, if the sentence is correct, they will be awarded 5 points; if there is an error, they will receive 1 point.\nThe edits in each target are indicated as follows:\nInsert "the": [→the]\nDelete "the": [the→]\nReplace "the" with "a": [the→a]\n\n# context\n[SOURCE]\n\n# targets\n[CORRECTION]\n\n# output format\nThe output should be a markdown code snippet formatted in the\nfollowing schema, including the leading and trailing "```json" and "```":\n```json\n{\n"target1_score": int // assigned score for target 1\n...\n"targetN_score": int // assigned score for target N\n}\n```\n'
- class gec_metrics.metrics.llm_kobayashi24.LLMKobayashi24OpenAISent(config: Config = None)[source]
Bases:
LLMKobayashi24
LLM-S with OpenAI models.
- class Config(model: str = 'gpt-4o-mini-2024-07-18', cache: str = None, seed: int = 777, verbose: bool = False, criteria: str = None, instruction_template: str = 'The goal of this task is to rank the presented targets based on the quality of the sentences.\nAfter reading the source sentence and target sentences, please assign a score from a minimum of 1 point to a maximum of 5 points to each target based on the quality of the sentence (note that you can assign the same score multiple times).\n\n# source\n[SOURCE]\n\n# targets\n[CORRECTION]\n\n# output format\nThe output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":\n```json\n{\n"target1_score": int // assigned score for target 1\n...\n"targetN_score": int // assigned score for target N\n}\n```\n', organization: str = None, api_key: str = None, base_url: str = None)[source]
Bases:
Config
OpenAI configuration.
- Parameters:
model (str) – Model name.
organization (str) – Your organization key.
api_key (str) – Your API key.
base_url (str) – Base URL of the API endpoint. When using Gemini models, specify an appropriate URL.
- api_key: str = None
- base_url: str = None
- model: str = 'gpt-4o-mini-2024-07-18'
- organization: str = None