
pixel-linguist's Issues

Reproduction question

My STS-B Spearman score is about 0.7371; I got consistent results from both the mteb library and your eval_sts.py script.

Building models for Pixel-Linguist/Pixel-Linguist-v0
model type: pixel
Some weights of the model checkpoint at Pixel-Linguist/Pixel-Linguist-v0 were not used when initializing PIXELForRepresentation: ['pooler.linear.bias', 'pooler.ln.weight', 'pooler.linear.weight', 'pooler.ln.bias']
- This IS expected if you are initializing PIXELForRepresentation from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing PIXELForRepresentation from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
100% 87/87 [00:12<00:00,  7.08it/s]
spearman all languages: [0.7371328443627424]
anisotropy all languages: [0.207]

That is about 4 points lower than the number reported in your paper.

[screenshot of the relevant results table from the paper]
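For context, the score in question is mteb's `cos_sim` Spearman: rank correlation between cosine similarities of the sentence-pair embeddings and the gold similarity scores. A minimal numpy-only sketch of that metric (the embeddings and gold scores below are hypothetical toy data, not from the actual run; ties are not average-ranked, which is fine for continuous cosine similarities):

```python
import numpy as np

def cosine_similarities(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two batches of embeddings."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a * b).sum(axis=1)

def spearman(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman rank correlation = Pearson correlation of the ranks."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])

# Toy stand-ins for the two encoded sentence batches and the gold labels.
rng = np.random.default_rng(0)
emb1 = rng.normal(size=(8, 4))
emb2 = rng.normal(size=(8, 4))
gold = rng.uniform(0, 5, size=8)

score = spearman(cosine_similarities(emb1, emb2), gold)
print(f"spearman: {score:.4f}")
```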

INFO:pixel.data.rendering.rendering_utils:loading text renderer configuration file ./Pixel-Linguist-v0/text_renderer_config.json from cache at /content/Pixel-Linguist/Pixel-Linguist-v0/text_renderer_config.json
./Pixel-Linguist-v0/renderer.renderer
INFO:pixel.data.rendering.rendering_utils:loading font file ./Pixel-Linguist-v0/renderer.renderer from cache at /content/Pixel-Linguist/data/fallback_fonts/GoNotoCurrent.ttf
INFO:pixel.data.rendering.pangocairo_renderer:Loading font from /content/Pixel-Linguist/data/fallback_fonts/GoNotoCurrent.ttf

Running task:  SICK-R
INFO:mteb.evaluation.MTEB:

## Evaluating 1 tasks:
───────────────────────────────────────── Selected tasks  ──────────────────────────────────────────
STS
    - SICK-R, s2s


INFO:mteb.evaluation.MTEB:

********************** Evaluating SICK-R **********************
INFO:mteb.evaluation.MTEB:Loading dataset for SICK-R
INFO:mteb.abstasks.AbsTaskSTS:
Task: SICK-R, split: test. Running...
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 9927 sentences1...
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:962: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 9927 sentences2...
INFO:mteb.evaluation.evaluators.STSEvaluator:Evaluating...
INFO:mteb.evaluation.MTEB:Evaluation for SICK-R on test took 82.60 seconds
INFO:mteb.evaluation.MTEB:Scores: {'cos_sim': {'pearson': 0.7781435725561205, 'spearman': 0.693026233861837}, 'manhattan': {'pearson': 0.7458007547441067, 'spearman': 0.6917351042169897}, 'euclidean': {'pearson': 0.74647644146939, 'spearman': 0.6930262498681216}, 'evaluation_time': 82.6}
Running task:  STS12
INFO:mteb.evaluation.MTEB:

## Evaluating 1 tasks:
───────────────────────────────────────── Selected tasks  ──────────────────────────────────────────
STS
    - STS12, s2s


INFO:mteb.evaluation.MTEB:

********************** Evaluating STS12 **********************
INFO:mteb.evaluation.MTEB:Loading dataset for STS12
INFO:mteb.abstasks.AbsTaskSTS:
Task: STS12, split: test. Running...
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 3108 sentences1...
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:962: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 3108 sentences2...
INFO:mteb.evaluation.evaluators.STSEvaluator:Evaluating...
INFO:mteb.evaluation.MTEB:Evaluation for STS12 on test took 26.10 seconds
INFO:mteb.evaluation.MTEB:Scores: {'cos_sim': {'pearson': 0.8148581693135399, 'spearman': 0.7350674849787033}, 'manhattan': {'pearson': 0.7817916408847375, 'spearman': 0.7352018056406955}, 'euclidean': {'pearson': 0.7811356917848975, 'spearman': 0.7350674571026258}, 'evaluation_time': 26.1}
Running task:  STS13
INFO:mteb.evaluation.MTEB:

## Evaluating 1 tasks:
───────────────────────────────────────── Selected tasks  ──────────────────────────────────────────
STS
    - STS13, s2s


INFO:mteb.evaluation.MTEB:

********************** Evaluating STS13 **********************
INFO:mteb.evaluation.MTEB:Loading dataset for STS13
INFO:mteb.abstasks.AbsTaskSTS:
Task: STS13, split: test. Running...
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 1500 sentences1...
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:962: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 1500 sentences2...
INFO:mteb.evaluation.evaluators.STSEvaluator:Evaluating...
INFO:mteb.evaluation.MTEB:Evaluation for STS13 on test took 12.46 seconds
INFO:mteb.evaluation.MTEB:Scores: {'cos_sim': {'pearson': 0.6425808819903709, 'spearman': 0.655392178430937}, 'manhattan': {'pearson': 0.6577538125092147, 'spearman': 0.6567118768579832}, 'euclidean': {'pearson': 0.6561594766016265, 'spearman': 0.655392178430937}, 'evaluation_time': 12.46}
Running task:  STS14
INFO:mteb.evaluation.MTEB:

## Evaluating 1 tasks:
───────────────────────────────────────── Selected tasks  ──────────────────────────────────────────
STS
    - STS14, s2s


INFO:mteb.evaluation.MTEB:

********************** Evaluating STS14 **********************
INFO:mteb.evaluation.MTEB:Loading dataset for STS14
INFO:mteb.abstasks.AbsTaskSTS:
Task: STS14, split: test. Running...
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 3750 sentences1...
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:962: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 3750 sentences2...
INFO:mteb.evaluation.evaluators.STSEvaluator:Evaluating...
INFO:mteb.evaluation.MTEB:Evaluation for STS14 on test took 31.00 seconds
INFO:mteb.evaluation.MTEB:Scores: {'cos_sim': {'pearson': 0.6901987553139182, 'spearman': 0.6729398845171846}, 'manhattan': {'pearson': 0.6867942298865358, 'spearman': 0.6731367422383607}, 'euclidean': {'pearson': 0.6863547804637292, 'spearman': 0.6729397051395782}, 'evaluation_time': 31.0}
Running task:  STS15
INFO:mteb.evaluation.MTEB:

## Evaluating 1 tasks:
───────────────────────────────────────── Selected tasks  ──────────────────────────────────────────
STS
    - STS15, s2s


INFO:mteb.evaluation.MTEB:

********************** Evaluating STS15 **********************
INFO:mteb.evaluation.MTEB:Loading dataset for STS15
INFO:mteb.abstasks.AbsTaskSTS:
Task: STS15, split: test. Running...
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 3000 sentences1...
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:962: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 3000 sentences2...
INFO:mteb.evaluation.evaluators.STSEvaluator:Evaluating...
INFO:mteb.evaluation.MTEB:Evaluation for STS15 on test took 24.45 seconds
INFO:mteb.evaluation.MTEB:Scores: {'cos_sim': {'pearson': 0.7741248388397876, 'spearman': 0.7920999963495237}, 'manhattan': {'pearson': 0.779554210832401, 'spearman': 0.7913264432791515}, 'euclidean': {'pearson': 0.7804673632290055, 'spearman': 0.7920999963495237}, 'evaluation_time': 24.45}
Running task:  STS16
INFO:mteb.evaluation.MTEB:

## Evaluating 1 tasks:
───────────────────────────────────────── Selected tasks  ──────────────────────────────────────────
STS
    - STS16, s2s


INFO:mteb.evaluation.MTEB:

********************** Evaluating STS16 **********************
INFO:mteb.evaluation.MTEB:Loading dataset for STS16
INFO:mteb.abstasks.AbsTaskSTS:
Task: STS16, split: test. Running...
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 1186 sentences1...
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:962: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 1186 sentences2...
INFO:mteb.evaluation.evaluators.STSEvaluator:Evaluating...
INFO:mteb.evaluation.MTEB:Evaluation for STS16 on test took 9.79 seconds
INFO:mteb.evaluation.MTEB:Scores: {'cos_sim': {'pearson': 0.6909690739150925, 'spearman': 0.6965494553846369}, 'manhattan': {'pearson': 0.6959238122190351, 'spearman': 0.697274648835741}, 'euclidean': {'pearson': 0.6951573797300552, 'spearman': 0.6965494553846369}, 'evaluation_time': 9.79}
Running task:  STSBenchmark
INFO:mteb.evaluation.MTEB:

## Evaluating 1 tasks:
───────────────────────────────────────── Selected tasks  ──────────────────────────────────────────
STS
    - STSBenchmark, s2s


INFO:mteb.evaluation.MTEB:

********************** Evaluating STSBenchmark **********************
INFO:mteb.evaluation.MTEB:Loading dataset for STSBenchmark
INFO:mteb.abstasks.AbsTaskSTS:
Task: STSBenchmark, split: test. Running...
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 1379 sentences1...
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:962: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 1379 sentences2...
INFO:mteb.evaluation.evaluators.STSEvaluator:Evaluating...
INFO:mteb.evaluation.MTEB:Evaluation for STSBenchmark on test took 11.37 seconds
INFO:mteb.evaluation.MTEB:Scores: {'cos_sim': {'pearson': 0.7368840970678752, 'spearman': 0.7371328443627424}, 'manhattan': {'pearson': 0.7446787325062066, 'spearman': 0.7349454241308658}, 'euclidean': {'pearson': 0.7465475020493239, 'spearman': 0.7371330032653948}, 'evaluation_time': 11.37}
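As a quick sanity check against the paper's reported average, the mean of the seven `cos_sim` Spearman scores from the run above (values copied from the logs, rounded to four places) works out to roughly 0.7117:

```python
# cos_sim Spearman scores from the MTEB log above.
scores = {
    "SICK-R": 0.6930,
    "STS12": 0.7351,
    "STS13": 0.6554,
    "STS14": 0.6729,
    "STS15": 0.7921,
    "STS16": 0.6965,
    "STSBenchmark": 0.7371,
}

avg = sum(scores.values()) / len(scores)
print(f"average Spearman: {avg:.4f}")  # → average Spearman: 0.7117
```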
