Comments (8)
Could you send some reproducible code?
from classy-classification.
nlp.to_disk("/") raises a ValueError because of a $ symbol in the text.
Do I need to preprocess the text to remove the $ symbol?
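If someone wants to try the preprocessing route asked about here, a minimal sketch could look like the following. It assumes the training data is a plain dict mapping each label to a list of strings, as in the JSON files used in this thread. `strip_dollar_signs` and the sample data are hypothetical names made up for illustration and are not part of classy-classification. Whether this actually avoids the ValueError on saving has not been verified here.

```python
from typing import Dict, List


def strip_dollar_signs(
    data: Dict[str, List[str]], replacement: str = ""
) -> Dict[str, List[str]]:
    """Return a copy of the label -> texts mapping with every "$" replaced."""
    return {
        label: [text.replace("$", replacement) for text in texts]
        for label, texts in data.items()
    }


# Hypothetical sample in the same shape as the JSON training files.
sample = {
    "invoice": ["Total due: $120.50", "Paid in full"],
    "Legal": ["Fee of $75"],
}
cleaned = strip_dollar_signs(sample)
print(cleaned)
```

The cleaned dict would then be passed as the "data" entry of the config instead of the raw one. The original dict is left untouched, so both variants can be compared.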
@saitej123 could you send me the script.py that you are running, which results in the error, and a pip list?
import spacy
import classy_classification
import pandas as pd
import warnings
warnings.simplefilter('ignore')
import json
with open('data.json') as f:
    data = json.load(f)
import en_core_web_sm
nlp = en_core_web_sm.load()
nlp.add_pipe("text_categorizer", config={"data": data, "model": "spacy"})
data["invoice"][3]
h = open('test.json')
d1 = json.load(h)
h.close()
print(nlp(d1["invoice"][5])._.cats)
csv_data = df = pd.read_csv('AllTrain-limited.csv')
json_data = csv_data.groupby('DocType')["Text"].apply(list).to_json()
json_data = json.loads(json_data)
with open('d1.json', 'w', encoding='utf-8') as f:
    json.dump(json_data, f, ensure_ascii=False, indent=4)
n1 = en_core_web_sm.load()
n1.add_pipe("text_categorizer", config={"data": json_data, "model": "spacy"})
print(n1(json_data["Correspondence"][5])._.cats)
print(n1(json_data["EOR"][6])._.cats)
print(n1(json_data["Legal"][11])._.cats)
h = open('test_new.json')
test_new = json.load(h)
h.close()
print(n1(test_new["Bills"][4])._.cats)
print(n1(test_new["Medical"][6])._.cats)
### Train with All data
import json
h = open('Full_data.json')
d1 = json.load(h)
h.close()
import en_core_web_sm
nlp = en_core_web_sm.load()
my_component = nlp.add_pipe("text_categorizer", config={"data": d1, "model": "spacy"})
print(nlp(d1["Bills"][5])._.cats)
h = open('test_new.json')
test_new = json.load(h)
h.close()
print(nlp(test_new["Bills"][4])._.cats)
# nlp_tf and nlp_hf are created further down in this snippet
print(nlp_tf(test_new["Bills"][4])._.cats)
print(nlp(test_new["Medical"][6])._.cats)
print(nlp_tf(test_new["Medical"][6])._.cats)
print(nlp(test_new["Legal"][7])._.cats)
print(nlp_tf(test_new["Legal"][7])._.cats)
print(nlp_hf(test_new["Legal"][7])._.cats)
# create_json and INPUT_DIR are not defined anywhere in the posted snippet
text_data = create_json(INPUT_DIR)
text_data['Page_1']
nlp_tf = spacy.blank("en")
nlp_tf.add_pipe(
    "text_categorizer",
    config={
        "data": d1,
        "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
    }
)
import classy_classification
nlp_hf = spacy.blank("en")
nlp_hf.add_pipe(
    "text_categorizer",
    config={
        "data": d1,
        "model": "facebook/bart-large-mnli",
        "cat_type": "zero"
    }
)
nlp.to_bytes()
Tested in Jupyter. I used the same code provided in the README file, nothing fancy!
Hi @saitej123, were you able to save and load the trained model? If yes, could you please share a reference?
@saitej123 @mohitsharma294 see issue #4