
grasp's Introduction

Grasp

Grasp.py – Explainable AI

Grasp is a lightweight AI toolkit for Python, with tools for data mining, natural language processing (NLP), machine learning (ML) and network analysis. It has 300+ fast and essential algorithms, with ~25 lines of code per function, self-explanatory function names, no dependencies, bundled into one well-documented file: grasp.py (250KB). Or install with pip, including language models (25MB):

$ pip install git+https://github.com/textgain/grasp

Tools for Data Mining

Download stuff with download(url) (or dl), with built-in caching and logging:

src = dl('https://www.textgain.com', cached=True)

Parse HTML with dom(html) into an Element tree and search it with CSS Selectors:

for e in dom(src)('a[href^="http"]'): # external links
    print(e.href)

Strip HTML with plain(Element) to get a plain text string:

for word, count in wc(plain(dom(src))).items():
    print(word, count)

Find articles with wikipedia(str), in HTML:

for e in dom(wikipedia('cat', language='en'))('p'):
    print(plain(e))

Find opinions with twitter.search(str):

for tweet in first(10, twitter.search('from:textgain')): # latest 10
    print(tweet.id, tweet.text, tweet.date)

Deploy APIs with App. Works with WSGI and Nginx:

app = App()
@app.route('/')
def index(*path, **query):
    return 'Hi! %s %s' % (path, query)
app.run('127.0.0.1', 8080, debug=True)

Once the app is running, visit http://127.0.0.1:8080/app?q=cat.
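
The same check can be scripted from a second process with the standard library (a minimal sketch; the expected response body is inferred from index() above, not documented):

from urllib.request import urlopen

# Query the running app; the URL path and query string land in *path and **query.
print(urlopen('http://127.0.0.1:8080/app?q=cat').read().decode('utf-8'))
# Expected: something like "Hi! ('app',) {'q': 'cat'}"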

Tools for Natural Language Processing

Get language with lang(str) for 40+ languages and ~92.5% accuracy:

print(lang('The cat sat on the mat.')) # {'en': 0.99}

Get locations with loc(str) for 25K+ EU cities:

print(loc('The cat lives in Catena.')) # {('Catena', 'IT', 43.8, 11.0): 1}

Get words & sentences with tok(str) (tokenize) at ~125K words/sec:

print(tok("Mr. etc. aren't sentence breaks! ;) This is:.", language='en'))

Get word polarity with pov(str) (point-of-view). Is it a positive or negative opinion?

print(pov(tok('Nice!', language='en'))) # +0.6
print(pov(tok('Dumb.', language='en'))) # -0.4
  • For de, en, es, fr, nl, with ~75% accuracy.
  • You'll need the language models in grasp/lm.
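
Since pov() returns a signed score, a label can be derived by thresholding it (a minimal sketch; the zero cutoff is our assumption, not part of the API):

def polarity(s):
    # Positive score = positive opinion; 0.0 is an arbitrary cutoff.
    return '+' if pov(tok(s, language='en')) >= 0 else '-'

print(polarity('Nice!')) # '+'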

Tag word types with tag(str) in 10+ languages using robust ML models from UD:

for word, pos in tag(tok('The cat sat on the mat.'), language='en'):
    print(word, pos)
  • Parts-of-speech include NOUN, VERB, ADJ, ADV, DET, PRON, PREP, ...
  • For ar, da, de, en, es, fr, it, nl, no, pl, pt, ru, sv, tr, with ~95% accuracy.
  • You'll need the language models in grasp/lm.
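
Because tag() yields (word, pos) pairs, filtering by part-of-speech is a one-line list comprehension (a sketch using only the calls shown above):

words = tag(tok('The cat sat on the mat.'), language='en')
nouns = [w for w, pos in words if pos == 'NOUN']
print(nouns) # ['cat', 'mat']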

Tag keywords with trie, a compiled dict that scans ~250K words/sec:

t = trie({'cat*': 1, 'mat': 2})
for i, j, k, v in t.search('Cats love catnip.', etc='*'):
    print(i, j, k, v)
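
Presumably i and j are the start and end offsets of the matched string k, with v its value in the dict (an assumption from the example above, not documented). If so, matches can be sliced straight out of the text:

s = 'Cats love catnip.'
for i, j, k, v in t.search(s, etc='*'):
    print(s[i:j], '->', v) # e.g., 'Cats' -> 1, 'catnip' -> 1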

Get answers with gpt(). You'll need an OpenAI API key.

print(gpt("Why do cats sit on mats? (you're a psychologist)", key='...'))

Tools for Machine Learning

Machine Learning (ML) algorithms learn by example. If you show them 10K spam and 10K real emails (i.e., train a model), they can predict whether other emails are also spam or not.

Each training example is a {feature: weight} dict with a label. For text, the features could be words, the weights could be word count, and the label might be real or spam.
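
For instance, a short spam message could be encoded by hand like this (an illustration of the format, not grasp output):

# One training example: word features, count weights, 'spam' label.
v = {'win': 1, 'cash': 1, 'now': 2}
example = (v, 'spam')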

Quantify text with vec(str) (vectorize) into a {feature: weight} dict:

v1 = vec('I love cats! 😀', features=('c3', 'w1'))
v2 = vec('I hate cats! 😡', features=('c3', 'w1'))
  • c1, c2, c3 count consecutive characters. For c2, cats → 1x ca, 1x at, 1x ts.
  • w1, w2, w3 count consecutive words.
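
The character n-gram idea behind c2 and c3 fits in a few lines of plain Python (an illustration of the concept, not grasp's internals):

from collections import Counter

def char_ngrams(s, n=2):
    # 'cats', n=2 -> {'ca': 1, 'at': 1, 'ts': 1}
    return Counter(s[i:i+n] for i in range(len(s) - n + 1))

print(char_ngrams('cats'))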

Train models with fit(examples), save as JSON, predict labels:

m = fit([(v1, '+'), (v2, '-')], model=Perceptron) # DecisionTree, KNN, ...
m.save('opinion.json')
m = fit(open('opinion.json'))
print(m.predict(vec('She hates dogs.'))) # {'+': 0.4, '-': 0.6}

Once trained, Model.predict(vector) returns a dict with label probabilities (0.0โ€“1.0).
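
Because that return value is an ordinary dict, the most likely label is simply its argmax:

p = m.predict(vec('She hates dogs.'))
print(max(p, key=p.get)) # '-'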

Tools for Network Analysis

Map networks with Graph, a {node1: {node2: weight}} dict subclass:

g = Graph(directed=True)
g.add('a', 'b') # a → b
g.add('b', 'c') # b → c
g.add('b', 'd') # b → d
g.add('c', 'd') # c → d
print(g.sp('a', 'd')) # shortest path: a → b → d
print(top(pagerank(g))) # strongest node: d, 0.8
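
Since Graph subclasses dict, edges should also be readable directly (a sketch assuming a default edge weight of 1, which is not documented above):

print(g['b']) # e.g., {'c': 1.0, 'd': 1.0}
print('d' in g['b']) # True: the edge b → d exists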

See networks with viz(graph):

with open('g.html', 'w') as f:
    f.write(viz(g, src='graph.js'))

You'll need to set src to the grasp/graph.js lib.

Tools for Comfort

Easy date handling with date(v), where v is an int, a str, or another date:

print(date('Mon Jan 31 10:00:00 +0000 2000', format='%Y-%m-%d'))
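
If int values are Unix timestamps (an assumption based on the signature above, not a documented guarantee), the same call would format them too:

print(date(0, format='%Y-%m-%d')) # presumably 1970-01-01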

Easy path handling with cd(...), which always points to the script's folder:

print(cd('kb', 'en-loc.csv'))

Easy CSV handling with csv([path]), a list of lists of values:

for code, country, _, _, _, _, _ in csv(cd('kb', 'en-loc.csv')):
    print(code, country)
data = csv()
data.append(('cat', 'Kitty'))
data.append(('cat', 'Simba'))
data.save(cd('cats.csv'))
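
Reading the saved file back uses the same csv(path) call as above:

for species, name in csv(cd('cats.csv')):
    print(species, name) # cat Kitty, cat Simba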

Tools for Good

A challenge in AI is bias introduced by human trainers. Remember the Model trained earlier? Grasp has tools to explain how & why it makes decisions:

print(explain(vec('She hates dogs.'), m)) # why so negative?

In the returned dict, the model's explanation is: "you wrote hat + ate (hate)".
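
Since the explanation apparently comes back as a {feature: contribution} dict, the strongest cues can be ranked (a sketch, assuming that return type):

e = explain(vec('She hates dogs.'), m)
for f, w in sorted(e.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f, w) # character trigrams like 'hat' and 'ate' should rank high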


grasp's Issues

Installation bug in setup.py?

Hi,
executing !python setup.py install in the directory where I cloned your https://github.com/vicru/grasp returns:

warning: install_lib: 'build\lib' does not exist -- no Python modules to install zip_safe flag not set; analyzing archive contents...

Even though the above-mentioned execution also returns:

running install
running bdist_egg
running egg_info
writing Grasp.egg-info\PKG-INFO
writing dependency_links to Grasp.egg-info\dependency_links.txt
writing requirements to Grasp.egg-info\requires.txt
writing top-level names to Grasp.egg-info\top_level.txt
reading manifest file 'Grasp.egg-info\SOURCES.txt'
writing manifest file 'Grasp.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
creating build\bdist.win-amd64\egg
creating build\bdist.win-amd64\egg\EGG-INFO
copying Grasp.egg-info\PKG-INFO -> build\bdist.win-amd64\egg\EGG-INFO
copying Grasp.egg-info\SOURCES.txt -> build\bdist.win-amd64\egg\EGG-INFO
copying Grasp.egg-info\dependency_links.txt -> build\bdist.win-amd64\egg\EGG-INFO
copying Grasp.egg-info\requires.txt -> build\bdist.win-amd64\egg\EGG-INFO
copying Grasp.egg-info\top_level.txt -> build\bdist.win-amd64\egg\EGG-INFO
creating 'dist\Grasp-2.0-py3.8.egg' and adding 'build\bdist.win-amd64\egg' to it
removing 'build\bdist.win-amd64\egg' (and everything under it)
Processing Grasp-2.0-py3.8.egg
Removing c:\users\drcrac\anaconda3\lib\site-packages\Grasp-2.0-py3.8.egg
Copying Grasp-2.0-py3.8.egg to c:\users\drcrac\anaconda3\lib\site-packages
Grasp 2.0 is already the active version in easy-install.pth
Installed c:\users\drcrac\anaconda3\lib\site-packages\grasp-2.0-py3.8.egg
Processing dependencies for Grasp==2.0
Finished processing dependencies for Grasp==2.0

When I execute from grasp import download, the following error shows up:

ImportError: cannot import name 'download' from 'grasp' (C:\Users\drcrac\anaconda3\lib\site-packages\grasp\__init__.py)

Which makes me believe that I am doing something wrong with my installation method. I would highly appreciate some feedback in this respect. I couldn't find your library on pip, which is why I tried the above-mentioned installation. By the way, my goal is to execute your proof of concept: https://gist.github.com/tom-de-smedt/9c9d9b9168ba703e0c336ee0128ebae5

VERB being tagged as NOUN

In running a local test with some boilerplate text, I'm getting a result that isn't tagging the verb properly:

>>> parsed = list(parse("The quick brown fox jumps over the lazy dog."))
>>> parsed[0]
Sentence(u'The/DET quick/ADJ brown/ADJ fox/NOUN jumps/NOUN over/PREP the/DET lazy/ADJ dog/NOUN ./PUNC')

Note that if I change the verb to "jumped" it tags it correctly:

>>> parsed = list(parse("The quick brown fox jumped over the lazy dog."))
>>> parsed[0]
Sentence(u'The/DET quick/ADJ brown/ADJ fox/NOUN jumped/VERB over/PREP the/DET lazy/ADJ dog/NOUN ./PUNC')

Cloned the latest master, running on the following interpreter:

Python 2.7.12 (default, Oct 11 2016, 05:24:00)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)] on darwin

Let me know if you need anything else!

-R

Error fid = open(os_fspath(file), "rb") on Python 3.5.0

Hello, thank you for visiting my issue.
I'm studying TensorFlow by following "pygta5" on YouTube.
I tried to run the balance_data.py file (https://github.com/Sentdex/pygta5/tree/master/Tutorial%20Codes/Part%208-13%20code) in both the command prompt and IDLE.

Error on command prompt :

Traceback (most recent call last):
File "C:\Windows\System32\pygta5-master\Tutorial Codes\Part 8-13 code\balance_data.py", line 8, in
train_data = np.load('training_data.npy')
File "C:\Users\decax64\AppData\Local\Programs\Python\Python35\lib\site-packages\numpy\lib\npyio.py", line 415, in load
fid = open(os_fspath(file), "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'training_data.npy'

Error on IDLE (only the last 3 lines) :

File "C:\Users~\AppData\Local\Programs\Python\Python35\lib\shutil.py", line 1062, in get_terminal_size
size = os.get_terminal_size(sys.stdout.fileno())
AttributeError: 'NoneType' object has no attribute 'fileno'

I guess it's a directory error, but it doesn't work even when I copy the training_data.npy file to the /lib/ folder.

Windows 10
Python 3.5.0
CPU: Xeon W3530 (similar to Core i7 900), with neither AVX nor AVX2

The troublemaker file is balance_data.py.
My current step is:
https://www.youtube.com/watch?v=wIxUp-37jVY&list=PLQVvvaa0QuDeETZEOy4VdocT7TOjfSA8a&index=10

Regards...

URL wrapped in parentheses is not being tokenized properly

Hi there! In some use case testing, I discovered the following behavior:

>>> tokenize("(http://google.com)")
'( http://google.com)'

It looks like the tokens() rule that fires for closing parens occurs after the URL token rule, so the closing paren doesn't get split out as a separate token:

        if w.startswith('('):                                        # (http://
            return '( ' + tokens(w[1:])
        if re.search(r'^https?://', w):                              # http://
            return w
        if re.search(r'[^:;-][),:]$', w):                            # U.S.,
            return tokens(w[:-1]) + ' ' + w[-1]
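
One possible fix (a sketch, not a committed patch) is to let the URL rule peel a trailing paren off before returning the token whole:

        if re.search(r'^https?://', w):                          # http://
            if w.endswith(')'):                                  # ...com)
                return tokens(w[:-1]) + ' ' + w[-1]              # split ')' off
            return w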
