gec_metrics.metrics.impara module

class gec_metrics.metrics.impara.IMPARA(config: Config = None)

Bases: MetricBaseForReferenceFree

class Config(model_qe: str = 'gotutiyan/IMPARA-QE', model_se: str = 'google-bert/bert-base-cased', pooling: str = 'cls', max_length: int = 128, threshold: float = 0.9, no_cuda: bool = False, batch_size: int = 32)

Bases: Config

IMPARA configuration.

Parameters:
  • model_qe (str) – Quality estimation model.

  • model_se (str) – Similarity estimation model.

  • pooling (str) – Pooling method. ‘cls’ or ‘mean’.

  • max_length (int) – Maximum length of inputs.

  • threshold (float) – Threshold for the similarity score.

  • no_cuda (bool) – If True, run on the CPU.

  • batch_size (int) – Batch size for inference.

batch_size: int = 32
max_length: int = 128
model_qe: str = 'gotutiyan/IMPARA-QE'
model_se: str = 'google-bert/bert-base-cased'
no_cuda: bool = False
pooling: str = 'cls'
threshold: float = 0.9
score_sentence(sources: list[str], hypotheses: list[str]) → list[float]

Calculate sentence-level scores.

Parameters:
  • sources (list[str]) – Source sentences. The shape is (num_sentences, )

  • hypotheses (list[str]) – Corrected sentences. The shape is (num_sentences, )

Returns:

The sentence-level scores.

Return type:

list[float]
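The sentence-level score combines the quality and similarity scores documented below. The following is a minimal sketch, assuming the rule described in the original IMPARA paper: a hypothesis keeps its quality score only when its similarity to the source exceeds `threshold`, and otherwise scores 0. The helper `combine_scores` is hypothetical and not part of `gec_metrics`, and the strict `>` comparison at the boundary is an assumption.

```python
def combine_scores(qe_scores: list[float], se_scores: list[float],
                   threshold: float = 0.9) -> list[float]:
    """Hypothetical helper: gate each quality score by its similarity score.

    Assumes the IMPARA rule: score = QE(hyp) if SE(src, hyp) > threshold else 0.
    """
    if len(qe_scores) != len(se_scores):
        raise ValueError("qe_scores and se_scores must have the same length")
    return [qe if se > threshold else 0.0 for qe, se in zip(qe_scores, se_scores)]


# Usage: the second hypothesis drifts too far from its source and is zeroed out.
print(combine_scores([0.8, 0.7], [0.95, 0.5]))  # [0.8, 0.0]
```

With the default `threshold` of 0.9, only corrections that stay semantically close to the source are rewarded for fluency.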

score_sentence_qe(hypotheses: list[str]) → list[float]

Compute quality-estimation scores for the corrected sentences.

Parameters:

hypotheses (list[str]) – Corrected sentences. The shape is (num_sentences, )

Returns:

The quality scores.

Return type:

list[float]
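The method returns one scalar per hypothesis, and the `batch_size` configuration value suggests inference runs in chunks. A minimal sketch of that batching pattern follows; `score_batch` is a hypothetical stand-in for the quality-estimation model's forward pass (here a toy word-count scorer), not the real IMPARA-QE model.

```python
from typing import Callable, Iterator


def batched(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def score_in_batches(hypotheses: list[str],
                     score_batch: Callable[[list[str]], list[float]],
                     batch_size: int = 32) -> list[float]:
    """Run a batch scorer over all hypotheses, preserving input order."""
    scores: list[float] = []
    for batch in batched(hypotheses, batch_size):
        scores.extend(score_batch(batch))
    return scores


def toy_score_batch(batch: list[str]) -> list[float]:
    # Hypothetical placeholder for the QE model: scores by word count.
    return [float(len(sentence.split())) for sentence in batch]


hyps = ["This is fine .", "Hello .", "A longer corrected sentence here ."]
print(score_in_batches(hyps, toy_score_batch, batch_size=2))  # [4.0, 2.0, 6.0]
```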

score_sentence_se(sources: list[str], hypotheses: list[str]) → list[float]

Compute similarity scores between each source sentence and its corrected sentence.

Parameters:
  • sources (list[str]) – Source sentences. The shape is (num_sentences, )

  • hypotheses (list[str]) – Corrected sentences. The shape is (num_sentences, )

Returns:

The similarity scores.

Return type:

list[float]

class gec_metrics.metrics.impara.SimilarityEstimator(config: Config = None)

Bases: MetricBaseForReferenceFree

class Config(model: str = 'google-bert/bert-base-cased', batch_size: int = 32, max_length: int = 128, no_cuda: bool = False)

Bases: Config

Similarity Estimator configuration.

Parameters:
  • model (str) – Name of the model used to compute similarity.

  • batch_size (int) – Batch size during inference.

  • max_length (int) – Maximum number of tokens after tokenization; longer inputs are truncated.

  • no_cuda (bool) – If True, run on the CPU.

batch_size: int = 32
max_length: int = 128
model: str = 'google-bert/bert-base-cased'
no_cuda: bool = False
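The `max_length` option bounds the tokenized input. A minimal pure-Python sketch of what truncation plus padding produce: `input_ids` cut to `max_length` and right-padded to the longest sequence in the batch (one common strategy, assumed here), together with an `attention_mask` holding 1 for real tokens and 0 for padding. The token ids and `pad_id=0` are illustrative assumptions; a real BERT tokenizer also inserts special tokens such as `[CLS]` and `[SEP]`, which this sketch omits.

```python
def pad_and_truncate(batch_ids: list[list[int]], max_length: int,
                     pad_id: int = 0) -> tuple[list[list[int]], list[list[int]]]:
    """Truncate each sequence to max_length, then right-pad to a common length.

    Returns (input_ids, attention_mask); the mask is 1 for real tokens, 0 for padding.
    """
    truncated = [ids[:max_length] for ids in batch_ids]
    width = max((len(ids) for ids in truncated), default=0)
    input_ids = [ids + [pad_id] * (width - len(ids)) for ids in truncated]
    attention_mask = [[1] * len(ids) + [0] * (width - len(ids)) for ids in truncated]
    return input_ids, attention_mask


ids, mask = pad_and_truncate([[5, 6, 7, 8, 9], [3, 4]], max_length=4)
print(ids)   # [[5, 6, 7, 8], [3, 4, 0, 0]]
print(mask)  # [[1, 1, 1, 1], [1, 1, 0, 0]]
```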
property device
forward(src_input_ids: Tensor, src_attention_mask: Tensor, pred_input_ids: Tensor, pred_attention_mask: Tensor) → Tensor

Compute the cosine similarity between the source sentences and the corrected sentences.

Parameters:
  • src_input_ids (torch.Tensor) – Tokenized source sentences. The shape is (num_batch, sequence_length)

  • src_attention_mask (torch.Tensor) – The attention mask to handle padding. The shape is (num_batch, sequence_length)

  • pred_input_ids (torch.Tensor) – Tokenized corrected sentences. The shape is (num_batch, sequence_length)

  • pred_attention_mask (torch.Tensor) – The attention mask to handle padding. The shape is (num_batch, sequence_length)

Returns:

The cosine similarity.

The shape is (num_batch, )

Return type:

torch.Tensor
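`forward` presumably reduces each sentence to a single vector (for example with `mean_pooling`) and then compares source and correction pairwise. A minimal pure-Python sketch of that comparison step, cos(u, v) = (u · v) / (|u| |v|), applied row by row over a batch of already pooled vectors; the library works on `torch.Tensor`s instead, and the `eps` guard against zero-length vectors is an assumption of this sketch.

```python
import math


def cosine_similarity(u: list[float], v: list[float], eps: float = 1e-8) -> float:
    """Cosine similarity of two equal-length vectors; eps avoids division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)


def batch_cosine(src_vecs: list[list[float]],
                 pred_vecs: list[list[float]]) -> list[float]:
    """One similarity per (source, prediction) pair, i.e. shape (num_batch,)."""
    return [cosine_similarity(s, p) for s, p in zip(src_vecs, pred_vecs)]


src = [[3.0, 4.0], [0.0, 2.0], [1.0, 0.0]]
pred = [[3.0, 4.0], [0.0, -3.0], [0.0, 1.0]]
print(batch_cosine(src, pred))  # [1.0, -1.0, 0.0]
```

Identical directions score 1.0, opposite directions -1.0, and orthogonal vectors 0.0, regardless of vector length.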

mean_pooling(states: Tensor, mask: Tensor) → Tensor

Compute mean pooling. Only representations at positions where mask == 1 are used.

Parameters:
  • states (torch.Tensor) – The token-level representation. The shape is (num_batch, sequence_length, hidden_size)

  • mask (torch.Tensor) – The padding mask: 1 for real tokens, 0 for padding. The shape is (num_batch, sequence_length)

Returns:

The mean pooled representation.

The shape is (num_batch, hidden_size)

Return type:

torch.Tensor
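The masked mean divides the sum of the unmasked token vectors by the number of unmasked positions, so padding never dilutes the average. A minimal pure-Python sketch for one batch, using nested lists in place of `torch.Tensor`. The `cls_pooling` helper is included for contrast with the `pooling='cls'` option of `IMPARA.Config`; it assumes that option takes the first token's vector, which is the usual meaning for BERT-style models but is not confirmed by this page.

```python
def mean_pooling(states: list[list[list[float]]],
                 mask: list[list[int]]) -> list[list[float]]:
    """Average token vectors where mask == 1.

    states: (num_batch, sequence_length, hidden_size); mask: (num_batch, sequence_length).
    Returns (num_batch, hidden_size).
    """
    pooled = []
    for seq, seq_mask in zip(states, mask):
        hidden_size = len(seq[0])
        total = [0.0] * hidden_size
        count = 0
        for vec, m in zip(seq, seq_mask):
            if m == 1:
                total = [t + x for t, x in zip(total, vec)]
                count += 1
        # Guard against an all-padding row (count == 0).
        pooled.append([t / max(count, 1) for t in total])
    return pooled


def cls_pooling(states: list[list[list[float]]]) -> list[list[float]]:
    """Take the first token's vector (the [CLS] position in BERT-style models)."""
    return [seq[0] for seq in states]


states = [[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]]  # last position is padding
mask = [[1, 1, 0]]
print(mean_pooling(states, mask))  # [[2.0, 3.0]]
print(cls_pooling(states))         # [[1.0, 2.0]]
```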

score_sentence(sources: list[str], hypotheses: list[str]) → list[float]

Compute similarity scores between each source sentence and its corrected sentence.

Parameters:
  • sources (list[str]) – Source sentences. The shape is (num_sentences, )

  • hypotheses (list[str]) – Corrected sentences. The shape is (num_sentences, )

Returns:

The similarity scores.

Return type:

list[float]