Comments (3)
Thanks so much, really appreciate the detailed response. I have been playing with FAISS a lot, inspired by your great package.
from linktransformer.
Thanks, Peter! It appears that we were never notified of this issue via GitHub notifications.
- We use Wikidata (https://www.wikidata.org/wiki/Q487907) to train these models. We are currently working on an updated draft of the paper and will make the data available in a couple of weeks, if not before.
- You can absolutely train a larger model on these! We already have code to do so in the repo (under benchmarking/). Once the data is made available, you should be able to train it with any backbone. We chose smaller models only to demonstrate trainability on free Colab instances.
- We haven't formally tested it, but there are references in the literature discussing the problems BERT-based models face with short text, such as names. You can absolutely use OpenAI embeddings and see what you get! We have examples using OpenAI embeddings in the notebook. Re: the prefix, it is definitely worth trying out. I'd probably make it "The name of the company is XXXX" instead of the colon format. If the motivation here is to add more language, that would be better achieved with a fuller sentence. OpenAI has released two new embedding models as well; we will report their performance on our test split once we update the paper. We are aiming for a newer version of everything by late February or early March.
Sorry for the delay! Feel free to ask anything else.
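The sentence-style prefix suggested above can be sketched in a few lines. This is only an illustration, not part of the linktransformer API: the helper `add_sentence_prefix`, its `template` parameter, and the sample company names are all hypothetical. It wraps each short name in a fuller sentence before the strings are handed to whichever embedding model you are comparing.

```python
def add_sentence_prefix(names, template="The name of the company is {name}"):
    """Wrap each short name in a fuller sentence before embedding.

    Hypothetical helper for illustration only; not a linktransformer function.
    """
    # Strip stray whitespace so the template is the only added context.
    return [template.format(name=str(n).strip()) for n in names]


# Made-up example names.
companies = ["Acme Corp", " Globex "]

# Sentence format suggested in the reply above.
sentence_style = add_sentence_prefix(companies)

# Colon format, kept so the two variants can be compared side by side.
colon_style = add_sentence_prefix(companies, template="company: {name}")
```

Embedding both `sentence_style` and `colon_style` with the same model and comparing match quality on a labeled sample would show whether the fuller sentence actually helps for your data.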
Hi @lamberpj, a new and vastly improved company model is now up. In fact, all of our models have been updated. We are aiming to complete a new version of the paper and release the data within the next week or so.
Closing this for now.