levenshtein's Introduction

Robert Jacobson, Ph.D.

I am an R&D-oriented computer scientist, mathematician, and software engineer with broad experience. I have particular interests in compilers, programming languages, and virtual machines; computer vision and machine learning; and algorithm design and mathematical programming.

Looking to 𝒉𝒊𝒓𝒆 a mathematician computer scientist? I'm looking to be hired! Get in touch.

The Facts

Resume Jacobson_cv.pdf
LinkedIn Profile www.linkedin.com/in/robertljacobson/
GitHub github.com/rljacobson (Right here!)
Blog www.robertjacobson.dev
Email [email protected]
Phone (401) 996-2940

What I’ve Been Up To

State-of-the-art expression pattern matching using technology from Maude in the Rust library Mod.
Keeping enterprise networks secure with AI at Vectra.
Loris: A term rewriting system and computer algebra system based on sophisticated pattern matching algorithms.
Fighting to eradicate ransomware with artificial intelligence at Halcyon Tech, Inc.
A lot of COVID-19 stuff.

I laser cut a few hundred face shields for local healthcare workers.

I did a little math about sample pooling one afternoon. After seeing it, my marine biologist friend and collaborator convinced me to write an online demo so nonscientists can see the benefits for themselves, and then an article on a mathematical mistake with serious consequences that many public health officials are making: “Bayes’ Theorem and the Deathly Hallows.” We coauthored a less technical version for nonscientists titled “COVID-19: Population Testing vs. Thoughts and Prayers?” Nobel laureate Paul Romer had this to say about it: [image: Romer's comment]
Fighting the international trade of illegal wildlife with the Nature Intelligence System I helped create.
A Prolog implementation as part of a tutorial series about writing automated theorem provers and the language⟷mathematics correspondence. Work in progress.
Elsix: An implementation of L6, Bell Laboratories' Low-Level Linked List Language, originally designed and implemented by Kenneth C. Knowlton in 1965 for the IBM 7094 computer at Bell Labs. This language has fallen into obscurity and to my knowledge has no extant implementation. But you have to check out this original state-of-the-art (for the time) demo: The L6 Programming Language, Rendered in Stunning Early Computer Graphics
Reflex: A rewrite of RE-flex in Rust. RE-flex is a source-compatible, modern C++ replacement for the venerable flex lexer generator written by Robert van Engelen. The original flex program by Vern Paxson is itself a rewrite of lex, a Unix program written by computing pioneer Mike Lesk and an obscure intern named Eric Schmidt.

Ask me about what I’ve been learning recently!

Last year I started keeping a log of "interesting things" I read, learn, or discover. It's a peek into the storm inside my brain.

Also, things that I work on but not in the last few months:

FoxySheep: A parser for Wolfram Language (Mathematica) ❂ Levenshtein: A blazingly fast Levenshtein-Damerau edit distance function for MySQL ❂ Wolfram Language Specification: An independent attempt at describing the entire language

The Philosophical

Passion. My favorite experiences are of solving problems nobody has ever solved before, finding the best known solutions to really hard problems, and learning new areas of math or CS that I didn't know about before—which I try to make a daily habit despite its effect on my technical book hoarding issue.

Mentoring. My experiences mentoring, and being mentored, have been among the most rewarding of my career. I will always find a place for these experiences in some form or another, whether mentoring junior devs, sitting at the feet of other experts, or something I have not yet imagined.

Experience. My skills cut across the traditional boundaries between job categories, so the single narrow job description of a typical listing rarely captures them. My graduate training is in pure mathematics, and I have an undergraduate degree in computer science. As a professor at an undergraduate teaching institution, I studied machine learning and computer vision with undergraduate research students. I simultaneously studied compilers and the theory of programming languages in my own research time.

I have acquired a particular skill set that allows easy entry into the problem spaces of society's most pressing challenges. That’s what I intend to do. I moved from academia to industry specifically to make a bigger impact with my skills and interests. Most recently, I have been working on eradicating ransomware and on a variety of projects related to the pandemic and the international wildlife trade. (But ask me about my recreational projects, too.)

My Next Role. My vocation continues to be to take a challenging, often novel problem, typically within the domain of applications of mathematics and computer science, learn the relevant academic subfields to near exhaustion, and produce a solution that is novel or improves on the state of the art.

I am also interested in mentorship, both as a mentor and a mentee. It is very important to me that I work in an environment where saying, "I don't know," is the norm rather than an admission of guilt. A mutually supportive and constructive environment produces better science, superior products, and happier employees.

Fun Stats

This is what Sourcerer.io says about my development directory as of mid August 2020. It is not entirely accurate. I am certain I have more C++ code than reported, but I have never used DirectXTK.

[Images: Sourcerer overview, languages, and fun facts]

Recreational

Project Smurd: My adventures learning electronics and reverse engineering the TARDIS keyboard.

ONE of the following is a lie:

  1. I once got a thank-you letter from the United Nations.
  2. I redesigned the TRS-80 using modern SMD components.
  3. I almost got into a fight with Stephen Wolfram, but Steven Pinker intervened. We all remain good friends to this day.
  4. I went out on a dinner date with Lawrence Lessig's graduate research assistant.
  5. I went out on a dinner date with Lawrence Lessig.
  6. I've worked with a lifelong friend of Prince William, the second in line to the British throne (but I won’t tell you who).

levenshtein's People

Contributors

arhynerwu, rljacobson

Forkers

grzryc arhynerwu

levenshtein's Issues

Benchmark against alternative implementations

Claims of "blazingly fast" should be backed up with publicly reproducible data. My private benchmarks have Frederik Hertzum's Iosovitch 2X faster than the next fastest implementation, and my addition of some standard mathematical optimizations improved it by another 2X.

A new branch should be created to contain reproducible benchmark code for a variety of common implementations, including the implementation found in python-Levenshtein and the Levenshtein automata implementation in fst (a Rust implementation).
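As a rough illustration of what "reproducible" could mean here, the sketch below times candidate functions on a corpus generated from a fixed seed, so anyone rerunning it measures the same inputs. Everything in it is an assumption made for illustration: the names `make_corpus`, `levenshtein_baseline`, and `benchmark` are hypothetical, and the only candidate shown is a plain Python baseline rather than any of the implementations named above.

```python
import random
import string
import timeit


def make_corpus(n_pairs, length, seed=0):
    """Build a deterministic list of random lowercase string pairs (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed, so every run sees identical inputs

    def word():
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

    return [(word(), word()) for _ in range(n_pairs)]


def levenshtein_baseline(a, b):
    """Plain two-row Wagner-Fischer, used only as a reference point."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def benchmark(candidates, corpus, repeat=3):
    """Return {name: best wall-clock seconds} for one pass over the corpus."""
    results = {}
    for name, fn in candidates.items():
        timings = timeit.repeat(
            lambda: [fn(a, b) for a, b in corpus], number=1, repeat=repeat
        )
        results[name] = min(timings)  # the minimum is the least noisy estimate
    return results
```

Other implementations would be added to the `candidates` dictionary under their own names, for example `benchmark({"baseline": levenshtein_baseline}, make_corpus(1000, 12))`.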

Implement DAMLEVBEST

A typical use case is to query a database for the best match, that is, for the match with minimum edit distance. In such a case, the maximum edit distance k used for early bailout can decrease to the minimum edit distance seen so far as the query is running. If the provided maximum distance is k=7 and the minimum edit distance seen so far is 2, there is no point in computing distances up to k=7 for the next 98,000 rows.
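The idea can be sketched outside of MySQL as follows. This is a minimal illustration, not the library's code: `bounded_levenshtein` and `best_match` are hypothetical names, plain Levenshtein distance stands in for the Damerau variant, and the early bailout shown (stop once every cell in a row exceeds the bound) is one common way to do it.

```python
def bounded_levenshtein(a, b, k):
    """Return the Levenshtein distance between a and b, or k + 1 if it exceeds k."""
    if abs(len(a) - len(b)) > k:
        return k + 1  # the length difference alone is a lower bound
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        if min(curr) > k:
            return k + 1  # every alignment passes through this row: bail out early
        prev = curr
    return min(prev[-1], k + 1)


def best_match(query, rows, k):
    """Return (row, distance) for the closest row within distance k, else (None, None)."""
    best_row, bound = None, k
    for row in rows:
        # The bound shrinks to the best distance seen so far, so later rows
        # are abandoned as soon as they cannot beat (or tie) it.
        d = bounded_levenshtein(query, row, bound)
        if d <= bound and (best_row is None or d < bound):
            best_row, bound = row, d
    return best_row, (bound if best_row is not None else None)
```

For example, `best_match("smitten", ["kitten", "sitting", "mitten"], 7)` starts with a bound of 7, tightens it to 2 after the first row, and so evaluates the remaining rows with a bound of 2 rather than 7. How such a running bound would be carried from row to row inside a MySQL function is not addressed by this sketch.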

Investigate if SIMD provides any advantage worth doing.

The Wagner-Fischer algorithm uses a matrix that is easily small enough to fit into an AVX2 register for many applications. An algorithm using AVX2 instructions would be interesting. However, it's not clear that we would see significant performance gains over the current optimized serial algorithm. I speculate that, at this point, there is more time spent in overhead than there is spent computing the edit distance.
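One way to see where vector instructions could apply: in the Wagner-Fischer matrix, each cell depends only on its left, upper, and upper-left neighbors, so all cells on the same anti-diagonal are independent of one another. The sketch below walks the matrix in that order. It is plain Python meant only to show the dependency structure (the inner loop is the part a SIMD version would process in lanes); it is an assumption about how a vectorized version might be organized, not a description of the current code, and it says nothing about whether the speedup would be worthwhile.

```python
def levenshtein_antidiagonal(a, b):
    """Levenshtein distance computed one anti-diagonal (i + j == s) at a time."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # deleting i characters from a
    for j in range(n + 1):
        d[0][j] = j  # inserting j characters of b

    for s in range(2, m + n + 1):
        lo = max(1, s - n)  # keeps j = s - i at most n
        hi = min(m, s - 1)  # keeps j = s - i at least 1
        # Cells on this anti-diagonal read only diagonals s - 1 and s - 2,
        # so the iterations of this loop do not depend on each other.
        for i in range(lo, hi + 1):
            j = s - i
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[m][n]
```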

Function names aren't being exported on Windows

The function names weren't being exported, so I had to add __declspec(dllexport) and __cdecl to each function declaration in the extern "C" {} groups, in each class. Then I compiled under Visual Studio, and everything worked.

Originally posted by @ebaldino in #8 (comment)

I don't have access to this platform to implement this fix, but it should be pretty simple! I welcome pull requests. Also see issue #2.

Can't open shared library 'libdamlev.so'

I'm having an issue on CentOS 7:

"Can't open shared library 'libdamlev.so' (errno: 11, /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /usr/lib64/mysql/plugin/libdamlev.so))"

Support CMake builds on Windows, cleanup windows support

The Windows build process can be described as, "Just do whatever to get the .dll." I couldn't get Visual Studio 2019 EAP to build the library via CMake, so I created a Visual Studio project in which the MySQL include and plugin directories are hardcoded to my own machine. The CMakeLists.txt file includes a half-hearted attempt at facilitating a build on Windows, but it's probably better to remove that code and wait for someone to do it right than to leave it in there.

ERROR 1127 (HY000): Can't find symbol 'DAMLEV' in library

I'm getting this error when trying to install the functions:

ERROR 1127 (HY000): Can't find symbol 'DAMLEV' in library

I'm surprised nobody posted this issue until now. I tried with two different Ubuntu server installations, both meeting the requirements, and the error is still the same.

When I run SHOW VARIABLES LIKE '%plugin%' in mysql, the plugin directory is /usr/lib/mysql/plugin, but when I run mysql_config --plugindir, the directory is /usr/lib/x86_64-linux-gnu/mariadb18/plugin. Anyway, I installed the libdamlev.so file in both directories just in case.

What am I doing wrong? MySQL version is 8.0.17

Wrong calculation (not Damerau)

I just spent some hours getting it working, then the first test failed.
It looks like the logic is not a real Damerau distance, e.g.

  • ABC vs ABD gives distance 1 => correct
  • Haupt vs Hautp gives distance 2 => should be only 1, since Damerau counts swapped chars as 1 operation, not 2

[screenshot]

Another, correct implementation shows the distance correctly:
[screenshot]
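The difference the reporter describes can be reproduced with a small sketch. It compares plain Levenshtein distance with the restricted Damerau variant (often called optimal string alignment), which adds one rule: swapping two adjacent characters costs 1. The function names are hypothetical, and this is not the library's implementation, only a reference for what the expected numbers are.

```python
def levenshtein(a, b):
    """Insertions, deletions, and substitutions only."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def osa_distance(a, b):
    """Levenshtein plus adjacent transpositions (optimal string alignment)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Transposition: the last two characters of each prefix are swapped.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

With these definitions, `levenshtein("Haupt", "Hautp")` is 2 while `osa_distance("Haupt", "Hautp")` is 1, and both give 1 for ABC vs ABD, which matches the behavior described in the report.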

Clarify boost dependency

If you are having trouble compiling, try getting rid of everything under ### Testing and Benchmarking ### in CMakeLists.txt. That way you don't have to download the large Boost library (Boost isn't clearly listed as a requirement).

Originally posted by @sjlevy in #6 (comment)

Figure out what we're using Boost for. Either replace it with a lighter alternative, remove the dependency altogether, or, if it's absolutely necessary, document the dependency and its consequences.
