Comments (5)

jon-tow commented on August 30, 2024 11

We're well aware of this (I was one of the core devs of lm-eval - we perform downstream benchmarking the same way 😄). We believe a few things are contributing to this, and hopefully we can pin them down in our upcoming write-up.
For the time being, you should find that modifying the contexts into dialog prompt format (e.g. Question: -> User: ) should improve scores.
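
A minimal sketch of that kind of context rewrite, for illustration only. Only the `Question:` -> `User:` mapping comes from this thread; the `Answer:` -> `Assistant:` mapping and the `to_dialog_format` helper name are assumptions, and this is not the lm-eval API.

```python
import re

# Only "Question:" -> "User:" is taken from the thread.
# "Answer:" -> "Assistant:" is an assumed counterpart for illustration.
ROLE_MAP = {"Question:": "User:", "Answer:": "Assistant:"}

# Match the QA-style labels only when they begin a line.
LABEL_PATTERN = re.compile(r"^(Question:|Answer:)", flags=re.MULTILINE)


def to_dialog_format(context: str) -> str:
    """Rewrite QA-style labels at line starts into dialog-style labels."""
    return LABEL_PATTERN.sub(lambda m: ROLE_MAP[m.group(1)], context)


example = "Question: What is the capital of France?\nAnswer:"
print(to_dialog_format(example))
```

Labels that appear mid-line are left untouched, so the body of a question is not altered.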

from stablelm.

MarkSchmidty commented on August 30, 2024 8

Okay, I made the issue title less alarming since you've chimed in.

Open communication about the issue and what is being done to address it would be appreciated by many. This thread/issue may be a good place to reach more technical users/devs who are keeping tabs.

lhl commented on August 30, 2024 4

I dropped a line to the lm@stability address mentioned in the announcement to ask whether I was doing anything wrong w/ the benchmarks. I was curious why evals weren't included w/ the model card, even as an alpha release (or at least a note that low benchmark scores were a known issue), but will be following w/ interest.

Curious: as a foundational model, what's going on w/ dialog prompt formatting? I grepped through the tasks, and question is used by the QA tasks, so it would impact piqa, but how about hellaswag (completions) or winogrande (its own format)?

MohamedAliRashad commented on August 30, 2024 2

Any updates on this?

mallorbc commented on August 30, 2024 2

@jon-tow Will using that prompt format help for the base model? Or perhaps you are talking about the tuned model?
