bigscience-workshop / evaluation
Code and Data for Evaluation WG
License: Other
Talk to Stas in engineering if you need help with this
(per the question raised about slide 6 at the evaluation meeting on 9/1).
As the repository gets larger, we will eventually need to decide on a code convention. Though preferences may vary, it's probably safe to stick to the setup in transformers, which is black + isort.
We could create a simplified Makefile and do something like
.PHONY: style
style:
	black .
	isort --profile=black .
Alternatively, create a .pre-commit-config.yaml.
repos:
  - repo: https://github.com/psf/black
    rev: 21.7b0
    hooks:
      - id: black
  - repo: https://github.com/pycqa/isort
    rev: 5.9.3
    hooks:
      - id: isort
        args: ["--profile", "black"]
Use to specify the API that all others will follow
This might sound like a bit of restructuring, but for the sake of future compatibility, I propose the following:
huggingface trainer: This will help the repo automatically adapt to deepspeed and all the exclusive features of the transformers library.
data_loader
DataCollator
compute_metrics
predictions
(if needed) finetune our full model, we don't have to change a lot at the surface level.
I would love to take some responsibility if needed. Let me know. @jaketae @tianjianjiang @wilsonyhlee
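To make the compute_metrics and predictions items above concrete, here is a minimal sketch of the kind of function the huggingface trainer accepts through its compute_metrics argument. It assumes a classification-style task where predictions are per-class scores and the metric is accuracy. Both are assumptions made for illustration only, since the proposal does not fix a task or metric. The trainer passes an object that unpacks into (predictions, label_ids), so a plain tuple stands in for it here.

```python
def compute_metrics(eval_pred):
    """Turn raw predictions and labels into a dict of named metrics.

    Assumption: `eval_pred` unpacks into (predictions, labels), where
    predictions holds one row of per-class scores per example.
    """
    predictions, labels = eval_pred
    # Index of the highest score in each row (a plain-Python argmax).
    predicted = [max(range(len(row)), key=row.__getitem__) for row in predictions]
    correct = sum(int(p) == int(y) for p, y in zip(predicted, labels))
    return {"accuracy": correct / len(predicted)}


# Stand-in inputs: three examples, two classes.
scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
gold = [1, 0, 0]
metrics = compute_metrics((scores, gold))
print(metrics)
```

Because the function only depends on the (predictions, labels) pair, the same code could be unit-tested on CPU without loading a model.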
Response generation in Schema-Guided Dialog (including shuffle challenge set)
#56 set up a basic unit test, but we still have to decide what kinds of tests we want to run. This is especially important given that GitHub workflows do not have any GPU support, so even a basic benchmark run will take a non-trivial amount of time to complete. The proposal is to come up with ways to make tests modular and reasonably fast.
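One possible direction, sketched under assumptions: keep the evaluation loop independent of the model, cap the number of examples, and swap in a CPU-only stub predictor so the whole code path runs in CI within seconds. The names evaluate, stub_predict, and max_examples are hypothetical and not taken from the repository.

```python
import itertools
import unittest


def evaluate(predict_fn, examples, max_examples=None):
    """Return the accuracy of predict_fn over (text, label) pairs.

    max_examples caps how many examples are consumed, so a CI run can
    exercise the full code path on a handful of inputs.
    """
    subset = list(itertools.islice(examples, max_examples))
    if not subset:
        raise ValueError("no examples to evaluate")
    correct = sum(predict_fn(text) == label for text, label in subset)
    return correct / len(subset)


def stub_predict(text):
    # Hypothetical stand-in for a real model: a trivial CPU-only rule.
    return "positive" if "good" in text else "negative"


class SmokeTest(unittest.TestCase):
    def test_pipeline_runs_on_tiny_subset(self):
        examples = [
            ("good movie", "positive"),
            ("bad movie", "negative"),
            ("good food", "negative"),
            ("never reached", "negative"),
        ]
        # Only the first three examples are scored; two of them match.
        score = evaluate(stub_predict, iter(examples), max_examples=3)
        self.assertAlmostEqual(score, 2 / 3)
```

Running `python -m unittest` on a file containing this sketch would execute the smoke test. Full benchmark runs with a real model could then be kept out of the default workflow and triggered separately.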
use to test generalization to unseen domain; maybe use FLEX?
all 18 languages
E2E NLG (+shuffle challenge set)
use to test generalization to unseen task; maybe use FLEX?
use to test generalization to unseen labels; maybe use FLEX?
use to test generalization to unseen task; maybe use FLEX?
including COVID+bfp02 challenge sets
Russian/English (+shuffle/numbers challenge sets)
use to test generalization to unseen domain; maybe use FLEX?
Spanish/German (including COVID challenge sets)
Coordinate with whoever is working on SuperGLUE; we only need to include MNLI once. However, NLI will be held out from model training (whereas the other SuperGLUE tasks will not), so MNLI results must be interpreted differently from those of the other SuperGLUE tasks.
use to test generalization to unseen task; maybe use FLEX?
use to test generalization to unseen language; maybe use FLEX?
with TURK/ASSET test sets (including bfp02+backtranslation challenge sets)
Coordinate with Meg Mitchell about this