Mining Resume

This repository contains programs to extract relevant fields from resumes and optionally build a knowledge graph (KG) for effective querying using a chatbot.

Detailed specifications are here and here

Features

Phase 0: Rule-based Extraction

  • Extract important fields like name, email, phone, etc. from resumes based on patterns specified in a config.xml file.

How it works

  • The parser script (parser_by_regex.py) takes the config file and a directory containing text resumes as arguments.
  • The config file specifies the fields to extract, along with the extraction method and pattern for each field.
  • The parser logic is independent of the domain (resumes, in this case); changes or additions are made only in the config file.
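A config-driven setup like the one described above might look as follows. This is only an illustrative sketch: the tag names, field names, and patterns are assumptions, not the actual schema of the repo's config.xml.

```xml
<!-- Hypothetical config.xml sketch; the repo's actual tag names may differ -->
<config>
  <field name="email" method="regex">
    <pattern>[\w.+-]+@[\w-]+\.[\w.]+</pattern>
  </field>
  <field name="phone" method="regex">
    <pattern>\+?\d[\d\s-]{8,}\d</pattern>
  </field>
</config>
```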

Usage

  • Prepare your own config.xml file similar to the provided one.
  • For command-line execution, run python parser_by_regex.py.
  • For a GUI, run python main.py.
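The config-driven idea can be sketched in a few lines of Python. This is not the repo's actual implementation; the config string, field names, and patterns below are illustrative assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical config mirroring the config.xml idea; real tag names may differ.
CONFIG = r"""
<config>
  <field name="email"><pattern>[\w.+-]+@[\w-]+\.[\w.]+</pattern></field>
  <field name="phone"><pattern>\+?\d[\d\s-]{8,}\d</pattern></field>
</config>
"""

def extract_fields(text, config_xml=CONFIG):
    """Return {field_name: first match or None}, driven purely by the config."""
    root = ET.fromstring(config_xml)
    results = {}
    for field in root.findall("field"):
        pattern = field.find("pattern").text
        match = re.search(pattern, text)
        results[field.get("name")] = match.group(0) if match else None
    return results

resume = "Jane Doe | jane.doe@example.com | +91 98765 43210"
print(extract_fields(resume))
```

Because the patterns live in the config, adding a new field (say, a LinkedIn URL) requires no change to the Python code, which is the design point the README makes.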

Phase 1: LLM-based Extraction

  • Uses an open-source model from Hugging Face as the Large Language Model (LLM), together with an extraction prompt.

Usage

  • Run python parser_by_llm.py.
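The prompt-based approach can be sketched as below. This is a hedged illustration only: the field list and prompt wording are assumptions, and the model call is stubbed out with a simulated reply (a real parser_by_llm.py would send the prompt to a Hugging Face model).

```python
import json

# Hypothetical field list; the actual prompt in parser_by_llm.py may differ.
FIELDS = ["name", "email", "phone", "skills"]

def build_prompt(resume_text, fields=FIELDS):
    """Ask the LLM to reply with JSON using a fixed set of keys."""
    return (
        "Extract the following fields from the resume below and reply with "
        "JSON only, using exactly these keys: " + ", ".join(fields) + "\n\n"
        "Resume:\n" + resume_text
    )

def parse_response(raw):
    """Pull the first JSON object out of a model reply, tolerating extra text."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        return {}
    return json.loads(raw[start:end + 1])

# Simulated model reply; a real run would call the LLM with build_prompt(...).
reply = 'Sure! {"name": "Jane Doe", "email": "jane@example.com", "phone": null, "skills": ["Python"]}'
print(parse_response(reply))
```

Parsing defensively (finding the outermost braces) matters in practice, since LLMs often wrap the requested JSON in extra prose.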

TODO

Phase 2:

  • File to be created: parser_by_spacy.py
  • Use spaCy-based Named Entity Recognition (NER) models for extraction.
  • Build custom NER models if necessary; training data from the rijaraju repo is available at data\rijaraju_repo_resume_ner_training_data.json
  • Add spaCy Matcher logic if needed.
  • Output is JSON of key-value pairs, where each key is an NER type and each value is specific to the resume's subject.
  • Also extract relationship values, e.g., Education as the key with values such as the institution (say, CoEP), its date range, etc.
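The planned output format above can be sketched without spaCy itself, by converting hypothetical NER spans of (text, label) into the key-value JSON described. The entity labels and relationship values below are illustrative placeholders, not real extractions.

```python
import json
from collections import defaultdict

def entities_to_json(entities, relations):
    """Group (text, label) NER spans into {NER_type: [values]} and attach
    relationship details (e.g., Education -> institution + date range)."""
    fields = defaultdict(list)
    for text, label in entities:
        fields[label].append(text)
    out = dict(fields)
    out.update(relations)
    return json.dumps(out, indent=2)

# Placeholder spans, standing in for spaCy doc.ents output.
entities = [("Yogesh Kulkarni", "PERSON"), ("CoEP", "ORG"), ("Python", "SKILL")]
relations = {"Education": {"institution": "CoEP", "date_range": "2010-2014"}}
print(entities_to_json(entities, relations))
```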

Phase 3:

  • File to be created: build_kg.py
  • Build a Knowledge Graph (KG) based on the extractions.
  • Nodes can represent entities like Person, Organizations, Skills, etc., and edges can represent relationships like "educated_in," "programs_in," etc.
  • The central person node can carry person-specific attributes, but shared nodes such as Autodesk or CoEP should not, since other resume subjects may also refer to them. Person-specific details such as the date range or branch belong on the edge, e.g., the edge from Yogesh to CoEP.
  • Nodes such as Python or NLP are common and can be reached from different company nodes (e.g., Icertis, Intuit).
  • Schema design is critical, as it decides which extractions become nodes, which become edges, and which become attributes on them.
  • Follow a standard schema such as schema.org or DBpedia for the resume extractions.
  • Represent the KG initially in networkx format and later in Neo4j.
  • Build a Streamlit app to upload resumes and visualize the KG or use Neo4j.
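The schema rule above (shared nodes stay attribute-free; person-specific details live on edges) can be sketched with plain dicts before loading anything into networkx. All attribute values here are placeholders, not data about any real person.

```python
# Shared nodes (CoEP, Python) carry no person-specific data.
nodes = {
    "Yogesh": {"type": "Person"},         # central person node
    "CoEP":   {"type": "Organization"},   # shared across resumes
    "Python": {"type": "Skill"},          # shared across resumes
}

# Edge attributes hold the person-specific details (placeholder values).
edges = [
    ("Yogesh", "CoEP",   "educated_in", {"date_range": "2010-2014", "branch": "Computer"}),
    ("Yogesh", "Python", "programs_in", {"years_of_experience": 5}),
]

def neighbors(person, relation):
    """Targets of a given relation from a person node."""
    return [t for s, t, r, _ in edges if s == person and r == relation]

print(neighbors("Yogesh", "educated_in"))
```

The same node/edge lists map directly onto a networkx MultiDiGraph (nodes with a `type` attribute, edges keyed by relation), and from there onto Neo4j.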

Phase 4:

  • File to be created: resume_chatbot.py
  • Use a query language such as SPARQL or Cypher, depending on where the KG resides.
  • Leverage LLMs to convert natural language English queries into SPARQL or Cypher.
  • Build a Streamlit chatbot for querying the KG, and see whether the built KG can be visualized there.
  • Deploy the chatbot on Streamlit Sharing for limited public access (e.g., 5 resumes).
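Before wiring in an LLM, the natural-language-to-Cypher step can be sketched with simple templates. The question patterns, relation names, and Cypher labels below are hypothetical; a real resume_chatbot.py would let an LLM generate the query instead.

```python
import re

# Hypothetical question templates mapped to Cypher (relation names assumed).
TEMPLATES = [
    (r"where did (\w+) study",
     "MATCH (p:Person {{name: '{0}'}})-[:EDUCATED_IN]->(o) RETURN o.name"),
    (r"what skills does (\w+) have",
     "MATCH (p:Person {{name: '{0}'}})-[:PROGRAMS_IN]->(s) RETURN s.name"),
]

def to_cypher(question):
    """Return a Cypher query for a recognized question, else None."""
    for pattern, cypher in TEMPLATES:
        m = re.search(pattern, question.lower())
        if m:
            return cypher.format(m.group(1).capitalize())
    return None

print(to_cypher("Where did Yogesh study?"))
```

An LLM-backed version replaces the template table with a prompt that includes the KG schema, but the contract stays the same: English in, Cypher (or SPARQL) out.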

Phase 5: Production

  • Build an end-to-end system with payment integration as a pay-per-use MicroSaaS.
  • Consider deploying on cloud platforms such as Vertex AI or Azure.

Disclaimer

  • The author ([email protected]) provides no guarantee about the program's results. It is a fun script with room for improvement; do not rely on it entirely.

Copyright (C) 2017 Yogesh H Kulkarni

miningresume's People

Contributors

dependabot[bot], sunny-shankar, swarajendait, yogeshhk


miningresume's Issues

Building Web UI

Put a Flask UI on the current Python script.
Add a folder-selection button and show results in tabular format.

To Python 3.6+

Check whether the script works in Python 3.6+ and, if not, make the necessary changes.

Build KG

Based on the specifications, assume the extractions have been done, then write code to build a networkx graph and visualize it.
Can it be visualized in Streamlit?
