Comments (8)

davidberenstein1957 commented on May 28, 2024

Could you send some reproducible code?

from classy-classification.

saitej123 commented on May 28, 2024

Calling nlp.to_disk("/") raises a ValueError because of the $ symbol in the text.

saitej123 commented on May 28, 2024

Do I need to preprocess the text to remove the $ symbol?
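
One way to test whether the $ symbol is really the trigger is to strip it from the training data before building the pipeline and then retry the save. This is only a minimal sketch: it assumes the data is a dict mapping each label to a list of strings (the shape used in the code later in this thread), and the helper name strip_symbol is hypothetical, not part of classy-classification.

```python
def strip_symbol(data, symbol="$"):
    """Return a copy of a {label: [texts]} dict with `symbol` removed from every text.

    Hypothetical helper for illustration; the input dict is left unchanged.
    """
    return {
        label: [text.replace(symbol, "") for text in texts]
        for label, texts in data.items()
    }


# Made-up sample data, only to show the shape of the input and output.
sample = {
    "invoice": ["Total due: $120.50", "Paid $30 in cash"],
    "legal": ["No symbols here"],
}
cleaned = strip_symbol(sample)
print(cleaned["invoice"])
```

If saving succeeds with the cleaned data and fails with the original, that narrows the problem down to the symbol; if it fails either way, the cause lies elsewhere.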

davidberenstein1957 commented on May 28, 2024

@saitej123 could you send me the script.py you are running that produces the error, along with the output of pip list?
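
When a full pip list is too long to paste, the versions of the relevant packages can also be collected from Python. The sketch below uses only the standard library (importlib.metadata, Python 3.8+); the function name package_versions and the choice of packages are assumptions made for illustration.

```python
import sys
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, or None if it is not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Packages that appear in this thread; extend the list as needed.
report = package_versions(["spacy", "classy-classification", "pandas"])
print("python", sys.version.split()[0])
for name, version in report.items():
    print(name, version or "not installed")
```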

saitej123 commented on May 28, 2024

import spacy
import classy_classification
import pandas as pd
import warnings
warnings.simplefilter('ignore')

import json
# Use a context manager so the file is closed after loading.
with open('data.json') as f:
    data = json.load(f)

import en_core_web_sm
nlp = en_core_web_sm.load()
nlp.add_pipe("text_categorizer" , config={"data":data, "model":"spacy"})

data["invoice"][3]

h = open('test.json')
d1 = json.load(h)
h.close()

print(nlp(d1["invoice"][5])._.cats)

csv_data = df = pd.read_csv('AllTrain-limited.csv')

json_data = csv_data.groupby('DocType')["Text"].apply(list).to_json()
json_data=json.loads(json_data)

with open('d1.json', 'w', encoding='utf-8') as f:
    json.dump(json_data, f, ensure_ascii=False, indent=4)

n1 = en_core_web_sm.load()

n1.add_pipe("text_categorizer" , config={"data":json_data, "model":"spacy"})

print(n1(json_data["Correspondence"][5])._.cats)

print(n1(json_data["EOR"][6])._.cats)

print(n1(json_data["Legal"][11])._.cats)

h = open('test_new.json')
test_new = json.load(h)
h.close()

print(n1(test_new["Bills"][4])._.cats)

print(n1(test_new["Medical"][6])._.cats)

# Train with all data

import json
h = open('Full_data.json')
d1 = json.load(h)
h.close()

import en_core_web_sm
nlp = en_core_web_sm.load()

my_component = nlp.add_pipe("text_categorizer", config={"data": d1, "model": "spacy"})

print(nlp(d1["Bills"][5])._.cats)

h = open('test_new.json')
test_new = json.load(h)
h.close()

print(nlp(test_new["Bills"][4])._.cats)

print(nlp_tf(test_new["Bills"][4])._.cats)

print(nlp(test_new["Medical"][6])._.cats)

print(nlp_tf(test_new["Medical"][6])._.cats)

print(nlp(test_new["Legal"][7])._.cats)

print(nlp_tf(test_new["Legal"][7])._.cats)

print(nlp_hf(test_new["Legal"][7])._.cats)

text_data=create_json(INPUT_DIR)

text_data['Page_1']

# nlp_tf is used in the calls above, so this cell has to run before them.
nlp_tf = spacy.blank("en")
nlp_tf.add_pipe(
    "text_categorizer",
    config={
        "data": d1,
        "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
    }
)

import classy_classification
nlp_hf = spacy.blank("en")
nlp_hf.add_pipe(
    "text_categorizer",
    config={
        "data": d1,
        "model": "facebook/bart-large-mnli",
        "cat_type": "zero"
    }
)

nlp.to_bytes()

saitej123 commented on May 28, 2024

Tested in Jupyter, using the same code provided in the README file, nothing fancy!

mohitsharma294 commented on May 28, 2024

Hi @saitej123, were you able to save and load the trained model? If so, could you please point me to a reference?

davidberenstein1957 commented on May 28, 2024

@saitej123 @mohitsharma294 see issue #4
