Trend-Detection

Detecting Trends in Job Advertisements.

Authors

Khalil Mrini, Kshitij Sharma, Pierre Dillenbourg

Paper available here.

Abstract

We present an automatic method for trend detection in job ads. From a job-posting website, we collect job ads from 16 countries, in 8 languages and 6 job domains. We pre-process them by removing stop words, lemmatising and performing cross-domain filtering. Then, we improve the vocabulary by forming n-grams and restrict it by filtering based on named-entity and part-of-speech tags. We split the job ads to compare two time periods: the first halves of 2016 and 2017. A trending word is defined as a word with a higher TF-IDF weight in 2017 than in 2016. The results obtained show a close correlation between the position of a word in its text and its trendiness, regardless of country, language or job domain.
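The trending-word definition above can be illustrated with a minimal sketch. It is not the code in TrendDetectionPipeline.py: the function names, the smoothed IDF formula, the choice to compute IDF separately per period, and the averaging of TF-IDF over the documents of a period are all assumptions made for illustration. The input is assumed to be already pre-processed, i.e. each job ad is a list of tokens.

```python
import math
from collections import Counter


def mean_tfidf(docs):
    """Mean TF-IDF weight per word over a list of tokenised documents.

    Assumption: a smoothed IDF, log((1 + N) / (1 + df)) + 1, computed
    within the period; the paper may use a different variant.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each word once per document

    totals = Counter()
    for doc in docs:
        if not doc:
            continue
        for word, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1
            totals[word] += tf * idf

    # Average over documents so periods of different size stay comparable.
    return {word: total / n_docs for word, total in totals.items()}


def trending_words(docs_before, docs_after):
    """Words whose weight is strictly higher in the later period.

    Returns (word, gain) pairs sorted by decreasing gain. Words absent
    from the earlier period are treated as having weight 0 there.
    """
    before = mean_tfidf(docs_before)
    after = mean_tfidf(docs_after)
    gains = [
        (word, weight - before.get(word, 0.0))
        for word, weight in after.items()
        if weight > before.get(word, 0.0)
    ]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy data standing in for the first halves of 2016 and 2017.
    ads_2016 = [["python", "developer", "sql"], ["java", "developer"]]
    ads_2017 = [["python", "developer", "docker"],
                ["docker", "kubernetes", "developer"]]
    for word, gain in trending_words(ads_2016, ads_2017):
        print(word, round(gain, 3))
```

On the toy data, only words that gain weight between the two periods are returned; a word that occurs in every ad of both periods, such as "developer", is not reported as trending.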

Coding Format

Language: Python 3.

Packages Used: nltk, numpy, matplotlib, scipy, polyglot, pandas, pytrends, bs4, requests, urllib, pattern3, pymystem3.

Python Files Description

The files are described hereafter in the order they should be used:

  1. AdzunaJobAdRetriever.py: Retrieves job ads from Adzuna and writes them as JSON files, one per page, to the Raw Data folder
  2. AdzunaJobDescriptionFetcher.py: Fetches the job descriptions, where available, from the original website and writes them to the Raw Text folder
  3. TrendDetectionPipeline.py: Performs all of the trend detection, with the help of the following files:
    • TreeTagger.py: Implements a tree tagger class for pre-processing
    • SequenceMining.py: Implements the Generalised Sequential Pattern (GSP) Algorithm
  4. TimeSeries.py: Counts the number of job ads collected over time
  5. TrendPositions.py: Computes the positions of trending words in the pre-processed text
  6. GoogleTrends.py: Computes the energy of a trending word in Google Trends for comparison
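As an illustration of what a word's position in its text can mean, the sketch below computes the mean relative position of a word over tokenised job ads. How TrendPositions.py actually measures position is not described here, so the function name and the normalisation (token index divided by document length) are assumptions.

```python
def mean_relative_position(docs, word):
    """Mean relative position of `word` over tokenised documents.

    Each occurrence contributes index / len(doc), so values lie in [0, 1):
    0 means the very start of a text, values near 1 mean the end.
    Returns None if the word never occurs.
    """
    positions = [
        index / len(doc)
        for doc in docs
        for index, token in enumerate(doc)
        if token == word
    ]
    if not positions:
        return None
    return sum(positions) / len(positions)


if __name__ == "__main__":
    ads = [["docker", "developer", "wanted"], ["senior", "docker"]]
    print(mean_relative_position(ads, "docker"))
```

Comparing this value for trending and non-trending words is one simple way to look at the relationship between position and trendiness mentioned in the abstract.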
