The STS-B score I get is about 0.7371. I ran it with both the mteb library and your eval_sts.py.


Reproduction question about pixel-linguist HOT 9 CLOSED

WuNein commented on June 23, 2024

Reproduction question

from pixel-linguist.

Comments (9)

WuNein commented on June 23, 2024

@gowitheflow-1998
Reproduced with the following library versions:

cairo (1, 26, 0)
manimpango 0.5.0
gi 3.46.0
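To compare environments, the versions of the rendering stack can be collected in one place. The following is a minimal sketch, not part of the repository; the distribution names (`pycairo`, `manimpango`, `PyGObject`) are assumptions about how the `cairo`, `manimpango`, and `gi` modules were installed, and the helper name is hypothetical.

```python
from importlib import metadata

# Assumed distribution names for the modules `cairo`, `manimpango`, and `gi`.
RENDER_STACK = ("pycairo", "manimpango", "PyGObject")


def render_stack_versions(packages=RENDER_STACK):
    """Return {distribution name: installed version or 'not installed'}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in render_stack_versions().items():
        print(f"{name}: {version}")
```

Note that this reports the Python bindings only. The underlying C library (for example a system cairo installed via apt) can differ between machines even when the binding versions match; with pycairo, `cairo.cairo_version_string()` reports the C library version.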

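The library used to render the font, and therefore the exact input pixels, may be important.

Rendering from my old cairo library, installed via apt:

Rendering from your new one:

The two differ; compare the letter 'o'. (One looks like it has ClearType-style smoothing, the other does not.) Even a slight difference like this changes the score.
I may look into it further later; this may not be the final solution.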
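A rendering difference of this kind can be quantified rather than judged by eye. The sketch below is illustrative only: it compares two grayscale images given as plain nested lists, and the 3x3 patches are made-up toy data, not output from either renderer. The function name and the `tolerance` parameter are hypothetical.

```python
def compare_renders(a, b, tolerance=0):
    """Compare two equally sized grayscale images (rows of 0-255 ints).

    Returns (number of pixels differing by more than `tolerance`,
             largest absolute per-pixel difference).
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    diffs = [abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    changed = sum(1 for d in diffs if d > tolerance)
    return changed, max(diffs, default=0)


# Toy patches: the same shape drawn with hard edges and with grey
# corner pixels standing in for anti-aliasing.
aliased = [[255, 0, 255],
           [0, 255, 0],
           [255, 0, 255]]
smoothed = [[200, 0, 200],
            [0, 255, 0],
            [200, 0, 200]]

print(compare_renders(aliased, smoothed))
```

Running the same check on the actual rendered sentence images from two environments would show whether they feed the model identical pixels.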

Thank you for your solution!


gowitheflow-1998 commented on June 23, 2024

@WuNein
Glad that it worked out! Thanks for sharing these insights. We will aim to run a complete behavior test of the renderers in the next iteration, and eventually to train a model that is agnostic to fonts. Thanks again for your input at this stage.


gowitheflow-1998 commented on June 23, 2024

@WuNein

Thank you for trying out our model! We had a similar issue before. The most likely cause is the font (the model is not yet very robust across different fonts). If you git clone our repo to get the fonts, it should work fine. We can now reproduce the same results exactly on different machines. If the problem remains, feel free to drop us an email!


gowitheflow-1998 commented on June 23, 2024

Adding to my reply above: note that when the fonts are correct, the English STS-B result of Pixel-Linguist-v0 (our multilingual model; see the last row of Table 5 in the paper) should be 78.79. Looking forward to your reproduction!
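The thread quotes scores on two scales (0.7371 and 73.71 or 78.79). Assuming the usual STS-B convention, which is an assumption here and not confirmed by the thread, the score is the Spearman rank correlation between gold similarity labels and predicted similarities, reported multiplied by 100. A minimal pure-Python sketch of that metric, with made-up toy numbers:

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data, not taken from STS-B.
gold = [5.0, 3.2, 1.0, 4.1]
predicted = [0.9, 0.5, 0.2, 0.4]
print(f"{100 * spearman(gold, predicted):.2f}")
```

Under that convention, 0.7371 and 73.71 are the same result on different scales.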


WuNein commented on June 23, 2024

@gowitheflow-1998

My result on Colab is the same, about 73. I also suspect the problem is related to the font. It would help if you could show us the font you are using, for example via print(processor).

PangoCairoTextRenderer {
  "background_color": "white",
  "dpi": 120,
  "font_color": "black",
  "font_file": "GoNotoCurrent.ttf",

I suggest you export the font file.

Assigning the font path directly at

Pixel-Linguist/src/pixel/data/rendering/pangocairo_renderer.py

Line 79 in 2d48a48

self.font_file = font_file

may help with the investigation.

However, text_renderer_config.json defines "font_file": "renderer.renderer".
The font should be baked into renderer.renderer, but I cannot reproduce the result without modifying the code.
Your config also differs from the original one: https://huggingface.co/Team-PIXEL/pixel-base/blob/main/text_renderer_config.json
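One way to check which font file a config actually points to is to read `font_file` and test whether it exists on disk. This is a sketch under an explicit assumption: that a relative `font_file` is resolved against the directory containing the config. The repository's loader may resolve it differently, and `resolve_font_file` is a hypothetical helper, not part of the project.

```python
import json
import tempfile
from pathlib import Path


def resolve_font_file(config_path):
    """Return (resolved font path, whether that file exists).

    Assumption: a relative `font_file` lives next to the config file.
    """
    config_path = Path(config_path)
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    font_path = Path(config["font_file"])
    if not font_path.is_absolute():
        font_path = config_path.parent / font_path
    return font_path, font_path.is_file()


if __name__ == "__main__":
    # Demo on a throwaway config so the snippet runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / "text_renderer_config.json"
        cfg.write_text(json.dumps({"font_file": "renderer.renderer"}),
                       encoding="utf-8")
        path, exists = resolve_font_file(cfg)
        print(path.name, "exists" if exists else "missing")
```

If the entry resolves to a missing file, the renderer is presumably falling back to some other font, which would be worth ruling out before comparing scores.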

I tested the following fonts; none of them reproduces the reported score:

renderer.renderer (no changes): ~73.71
NotoSans-Regular.ttf: ~73
GoNotoCurrent.ttf: ~73
Ubuntu-Regular.ttf: poor
DejaVuSans.ttf: even worse
DejaVuSansMono.ttf: even worse


gowitheflow-1998 commented on June 23, 2024

@WuNein

Thank you for bringing the issue to our attention. We haven't figured out the exact cause yet, but here are two temporary solutions and our hypothesis about the potential cause:

As you mentioned Colab, I have put together a Colab notebook that reproduces the result. Check out: https://colab.research.google.com/drive/19NeGSnuO4a4N8KoTz89LD4tsQZf3d_I4#scrollTo=zwz-XnLHVTEb
On your local machine, please create a new conda environment instead of reusing the old environment you used to run the vanilla PIXEL repo. We have tried this on multiple servers and it works.

Potential cause:
There might be font conflicts caused by running the vanilla PIXEL repo. When we run our model in an environment previously used to test the vanilla PIXEL model, we see a performance degradation similar to yours. We haven't thought of an elegant fix yet, but it is on our list now, and we welcome contributions in the meantime. For now, we recommend the solutions above for reproducing the results and for further work. The key is a new, CLEAN environment. Thanks again for bringing up the issue.
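To test the font-conflict hypothesis, it may help to confirm that the old and the clean environment load byte-identical font files. The sketch below hashes every `.ttf` or `.otf` file in a directory so the two listings can be diffed. The helper name is hypothetical, and the directory to scan is whatever fonts folder each environment uses, which the thread does not specify.

```python
import hashlib
import tempfile
from pathlib import Path

FONT_SUFFIXES = {".ttf", ".otf"}


def font_fingerprints(font_dir):
    """Map each font file name in `font_dir` to its SHA-256 hex digest."""
    fingerprints = {}
    for path in sorted(Path(font_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in FONT_SUFFIXES:
            fingerprints[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return fingerprints


if __name__ == "__main__":
    # Demo on a throwaway directory with placeholder bytes, not a real font.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "Example.ttf").write_bytes(b"placeholder")
        for name, digest in font_fingerprints(tmp).items():
            print(name, digest[:12])
```

Matching digests would rule out the font files themselves and point toward the rendering libraries instead.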


gowitheflow-1998 commented on June 23, 2024

@WuNein Please let me know if the above solutions work. Would love to help further.


WuNein commented on June 23, 2024

@gowitheflow-1998
A model agnostic to fonts may be difficult for pixel (MAE) models; more auxiliary approaches may be needed.
I've been doing research on sentence embeddings with data augmentation for some time, and I hope to share my LREC-COLING 2024 paper soon.


gowitheflow-1998 commented on June 23, 2024

@WuNein
Yes, I guess a font-agnostic model will require a lot of trivial work on augmentation. Glad that we share the same passion for sentence representation. Looking forward to seeing your COLING work. Feel free to contact me personally anytime.
