
pixel-linguist's Issues

Reproduction question

My STS-B Spearman score is about 0.7371; I got consistent results from both the mteb library and your eval_sts.py script.

Building models for Pixel-Linguist/Pixel-Linguist-v0
model type: pixel
Some weights of the model checkpoint at Pixel-Linguist/Pixel-Linguist-v0 were not used when initializing PIXELForRepresentation: ['pooler.linear.bias', 'pooler.ln.weight', 'pooler.linear.weight', 'pooler.ln.bias']
- This IS expected if you are initializing PIXELForRepresentation from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing PIXELForRepresentation from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
100% 87/87 [00:12<00:00,  7.08it/s]
spearman all languages: [0.7371328443627424]
anisotropy all languages: [0.207]

That is about 4 points lower than the number reported in your paper.

[screenshot of the relevant results table from the paper]
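For context, the score in question is mteb's `cos_sim` Spearman: rank correlation between cosine similarities of the sentence-pair embeddings and the gold similarity scores. A minimal numpy-only sketch of that metric (the embeddings and gold scores below are hypothetical toy data, not from the actual run; ties are not average-ranked, which is fine for continuous cosine similarities):

```python
import numpy as np

def cosine_similarities(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two batches of embeddings."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a * b).sum(axis=1)

def spearman(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman rank correlation = Pearson correlation of the ranks."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])

# Toy stand-ins for the two encoded sentence batches and the gold labels.
rng = np.random.default_rng(0)
emb1 = rng.normal(size=(8, 4))
emb2 = rng.normal(size=(8, 4))
gold = rng.uniform(0, 5, size=8)

score = spearman(cosine_similarities(emb1, emb2), gold)
print(f"spearman: {score:.4f}")
```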

INFO:pixel.data.rendering.rendering_utils:loading text renderer configuration file ./Pixel-Linguist-v0/text_renderer_config.json from cache at /content/Pixel-Linguist/Pixel-Linguist-v0/text_renderer_config.json
./Pixel-Linguist-v0/renderer.renderer
INFO:pixel.data.rendering.rendering_utils:loading font file ./Pixel-Linguist-v0/renderer.renderer from cache at /content/Pixel-Linguist/data/fallback_fonts/GoNotoCurrent.ttf
INFO:pixel.data.rendering.pangocairo_renderer:Loading font from /content/Pixel-Linguist/data/fallback_fonts/GoNotoCurrent.ttf

Running task:  SICK-R
INFO:mteb.evaluation.MTEB:

## Evaluating 1 tasks:
───────────────────────────────────────── Selected tasks  ──────────────────────────────────────────
STS
    - SICK-R, s2s


INFO:mteb.evaluation.MTEB:

********************** Evaluating SICK-R **********************
INFO:mteb.evaluation.MTEB:Loading dataset for SICK-R
INFO:mteb.abstasks.AbsTaskSTS:
Task: SICK-R, split: test. Running...
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 9927 sentences1...
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:962: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 9927 sentences2...
INFO:mteb.evaluation.evaluators.STSEvaluator:Evaluating...
INFO:mteb.evaluation.MTEB:Evaluation for SICK-R on test took 82.60 seconds
INFO:mteb.evaluation.MTEB:Scores: {'cos_sim': {'pearson': 0.7781435725561205, 'spearman': 0.693026233861837}, 'manhattan': {'pearson': 0.7458007547441067, 'spearman': 0.6917351042169897}, 'euclidean': {'pearson': 0.74647644146939, 'spearman': 0.6930262498681216}, 'evaluation_time': 82.6}
Running task:  STS12
INFO:mteb.evaluation.MTEB:

## Evaluating 1 tasks:
───────────────────────────────────────── Selected tasks  ──────────────────────────────────────────
STS
    - STS12, s2s


INFO:mteb.evaluation.MTEB:

********************** Evaluating STS12 **********************
INFO:mteb.evaluation.MTEB:Loading dataset for STS12
INFO:mteb.abstasks.AbsTaskSTS:
Task: STS12, split: test. Running...
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 3108 sentences1...
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:962: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 3108 sentences2...
INFO:mteb.evaluation.evaluators.STSEvaluator:Evaluating...
INFO:mteb.evaluation.MTEB:Evaluation for STS12 on test took 26.10 seconds
INFO:mteb.evaluation.MTEB:Scores: {'cos_sim': {'pearson': 0.8148581693135399, 'spearman': 0.7350674849787033}, 'manhattan': {'pearson': 0.7817916408847375, 'spearman': 0.7352018056406955}, 'euclidean': {'pearson': 0.7811356917848975, 'spearman': 0.7350674571026258}, 'evaluation_time': 26.1}
Running task:  STS13
INFO:mteb.evaluation.MTEB:

## Evaluating 1 tasks:
───────────────────────────────────────── Selected tasks  ──────────────────────────────────────────
STS
    - STS13, s2s


INFO:mteb.evaluation.MTEB:

********************** Evaluating STS13 **********************
INFO:mteb.evaluation.MTEB:Loading dataset for STS13
INFO:mteb.abstasks.AbsTaskSTS:
Task: STS13, split: test. Running...
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 1500 sentences1...
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:962: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 1500 sentences2...
INFO:mteb.evaluation.evaluators.STSEvaluator:Evaluating...
INFO:mteb.evaluation.MTEB:Evaluation for STS13 on test took 12.46 seconds
INFO:mteb.evaluation.MTEB:Scores: {'cos_sim': {'pearson': 0.6425808819903709, 'spearman': 0.655392178430937}, 'manhattan': {'pearson': 0.6577538125092147, 'spearman': 0.6567118768579832}, 'euclidean': {'pearson': 0.6561594766016265, 'spearman': 0.655392178430937}, 'evaluation_time': 12.46}
Running task:  STS14
INFO:mteb.evaluation.MTEB:

## Evaluating 1 tasks:
───────────────────────────────────────── Selected tasks  ──────────────────────────────────────────
STS
    - STS14, s2s


INFO:mteb.evaluation.MTEB:

********************** Evaluating STS14 **********************
INFO:mteb.evaluation.MTEB:Loading dataset for STS14
INFO:mteb.abstasks.AbsTaskSTS:
Task: STS14, split: test. Running...
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 3750 sentences1...
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:962: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 3750 sentences2...
INFO:mteb.evaluation.evaluators.STSEvaluator:Evaluating...
INFO:mteb.evaluation.MTEB:Evaluation for STS14 on test took 31.00 seconds
INFO:mteb.evaluation.MTEB:Scores: {'cos_sim': {'pearson': 0.6901987553139182, 'spearman': 0.6729398845171846}, 'manhattan': {'pearson': 0.6867942298865358, 'spearman': 0.6731367422383607}, 'euclidean': {'pearson': 0.6863547804637292, 'spearman': 0.6729397051395782}, 'evaluation_time': 31.0}
Running task:  STS15
INFO:mteb.evaluation.MTEB:

## Evaluating 1 tasks:
───────────────────────────────────────── Selected tasks  ──────────────────────────────────────────
STS
    - STS15, s2s


INFO:mteb.evaluation.MTEB:

********************** Evaluating STS15 **********************
INFO:mteb.evaluation.MTEB:Loading dataset for STS15
INFO:mteb.abstasks.AbsTaskSTS:
Task: STS15, split: test. Running...
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 3000 sentences1...
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:962: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 3000 sentences2...
INFO:mteb.evaluation.evaluators.STSEvaluator:Evaluating...
INFO:mteb.evaluation.MTEB:Evaluation for STS15 on test took 24.45 seconds
INFO:mteb.evaluation.MTEB:Scores: {'cos_sim': {'pearson': 0.7741248388397876, 'spearman': 0.7920999963495237}, 'manhattan': {'pearson': 0.779554210832401, 'spearman': 0.7913264432791515}, 'euclidean': {'pearson': 0.7804673632290055, 'spearman': 0.7920999963495237}, 'evaluation_time': 24.45}
Running task:  STS16
INFO:mteb.evaluation.MTEB:

## Evaluating 1 tasks:
───────────────────────────────────────── Selected tasks  ──────────────────────────────────────────
STS
    - STS16, s2s


INFO:mteb.evaluation.MTEB:

********************** Evaluating STS16 **********************
INFO:mteb.evaluation.MTEB:Loading dataset for STS16
INFO:mteb.abstasks.AbsTaskSTS:
Task: STS16, split: test. Running...
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 1186 sentences1...
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:962: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 1186 sentences2...
INFO:mteb.evaluation.evaluators.STSEvaluator:Evaluating...
INFO:mteb.evaluation.MTEB:Evaluation for STS16 on test took 9.79 seconds
INFO:mteb.evaluation.MTEB:Scores: {'cos_sim': {'pearson': 0.6909690739150925, 'spearman': 0.6965494553846369}, 'manhattan': {'pearson': 0.6959238122190351, 'spearman': 0.697274648835741}, 'euclidean': {'pearson': 0.6951573797300552, 'spearman': 0.6965494553846369}, 'evaluation_time': 9.79}
Running task:  STSBenchmark
INFO:mteb.evaluation.MTEB:

## Evaluating 1 tasks:
───────────────────────────────────────── Selected tasks  ──────────────────────────────────────────
STS
    - STSBenchmark, s2s


INFO:mteb.evaluation.MTEB:

********************** Evaluating STSBenchmark **********************
INFO:mteb.evaluation.MTEB:Loading dataset for STSBenchmark
INFO:mteb.abstasks.AbsTaskSTS:
Task: STSBenchmark, split: test. Running...
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 1379 sentences1...
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:962: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 1379 sentences2...
INFO:mteb.evaluation.evaluators.STSEvaluator:Evaluating...
INFO:mteb.evaluation.MTEB:Evaluation for STSBenchmark on test took 11.37 seconds
INFO:mteb.evaluation.MTEB:Scores: {'cos_sim': {'pearson': 0.7368840970678752, 'spearman': 0.7371328443627424}, 'manhattan': {'pearson': 0.7446787325062066, 'spearman': 0.7349454241308658}, 'euclidean': {'pearson': 0.7465475020493239, 'spearman': 0.7371330032653948}, 'evaluation_time': 11.37}
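As a quick sanity check against the paper's reported average, the mean of the seven `cos_sim` Spearman scores from the run above (values copied from the logs, rounded to four places) works out to roughly 0.7117:

```python
# cos_sim Spearman scores from the MTEB log above.
scores = {
    "SICK-R": 0.6930,
    "STS12": 0.7351,
    "STS13": 0.6554,
    "STS14": 0.6729,
    "STS15": 0.7921,
    "STS16": 0.6965,
    "STSBenchmark": 0.7371,
}

avg = sum(scores.values()) / len(scores)
print(f"average Spearman: {avg:.4f}")  # → average Spearman: 0.7117
```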
