indeed-job-listings's Introduction

print("Hello, world!") 👋

I'm Anouk!

  • 🎓 Student MSc Marketing Analytics and MSc Data Science & Society at Tilburg University
  • 🏫 Current courses:
    • Interactive Data Transformation
    • Social Media and Web Analytics
    • Experimental Research
  • 🌱 Currently teaching myself:
    • Tableau
    • Building Shiny apps
    • Norwegian 🇳🇴


indeed-job-listings's People

Contributors

alantjee, anouk2311, georgianahutanu, reneen1998

indeed-job-listings's Issues

Error marketeer data

Hi! I was checking for that error in line 5 of the marketeer data, and I think there is something wrong with the separator. Not sure why, because we did the same with this data set as we did with the others.

Screenshot 2021-03-20 at 09.20.48

Screenshot 2021-03-20 at 09.20.54

Data cleaning dirty location goes wrong

Hi! I was checking the data_clean.R code, and our remove_dirty_location also removes 'Noord' from Noord-Holland and Noordwijk, so they end up as -holland and wijk. So maybe we should not delete the words first, but directly replace the whole string?

E.g. so replace Amsterdam-noord by Amsterdam, instead of removing noord everywhere.
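The idea of replacing whole strings could look roughly like the sketch below. The project's cleaning code is in R; this is only a Python illustration of the logic, and the `LOCATION_FIXES` table and `clean_location` name are hypothetical, not taken from the repository.

```python
# Hypothetical lookup of complete "dirty" location strings -> cleaned names.
# Keys are lower-cased; the single entry is the example from this issue,
# not the project's real list.
LOCATION_FIXES = {
    "amsterdam-noord": "Amsterdam",
}


def clean_location(location: str) -> str:
    """Replace a location only when the whole string is a known dirty value.

    Strings that merely contain 'noord' (Noord-Holland, Noordwijk) are
    returned unchanged, because nothing is deleted at the substring level.
    """
    stripped = location.strip()
    return LOCATION_FIXES.get(stripped.lower(), stripped)


for raw in ["Amsterdam-noord", "Noord-Holland", "Noordwijk"]:
    print(raw, "->", clean_location(raw))
```

Because the match is on the full string, adding a new dirty value means adding one entry to the table rather than another word to strip.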

clean data functions

Hi! I tried to change the clean_data.R file into a file with functions for cleaning the data. However, apparently there are some problems with using dplyr within a function in R. Can one of you take a look at this as well? I don't see how to fix it. Right now, it doesn't create a new column named location_trimmed.

I also included a prototype function to run the cleaning functions on all datasets. But we should first fix that dplyr problem, before testing if this works:
Screenshot 2021-03-23 at 12.21.30

Switching of the review/location

Hey guys, at the #getjoblocation part and the #getcompanyreview part in the scraper, the locations and the reviews get mixed up with each other. I think I found the problem, but I don't know how to fix it. Could one of you have a look at it? See the pictures. In this case, text.splitlines()[1] gives the review and text.splitlines()[2] gives the location.

The two get mixed up because not every vacancy has a review component, so the location is not always at the same index.

image
image
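One possible way around the fixed indices is to decide per listing whether the second line looks like a review before reading the location. The sketch below is a guess at that approach, not the scraper's actual code: it assumes the review line is a short numeric rating such as "4,2" or "3.9" and that the first line is something else (for example the company name). The function name and the pattern are hypothetical.

```python
import re

# Assumption: a review line is a one-digit rating with an optional decimal,
# written with a comma or a dot (e.g. "4,2" or "3.9").
RATING_PATTERN = re.compile(r"^\d(?:[.,]\d)?$")


def split_review_and_location(text: str):
    """Return (review, location) from one listing's text block.

    If line 1 looks like a rating, the location is taken from line 2;
    otherwise there is no review and line 1 is treated as the location.
    Missing values are returned as None.
    """
    lines = [line.strip() for line in text.splitlines()]
    if len(lines) > 1 and RATING_PATTERN.match(lines[1]):
        review = lines[1]
        location = lines[2] if len(lines) > 2 else None
    else:
        review = None
        location = lines[1] if len(lines) > 1 else None
    return review, location
```

If the real review text has a different shape, only `RATING_PATTERN` would need to change; the branching on "is there a review line" stays the same.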

Salary cleaning in cleaning file

Hi guys, most of the code with functions seems to work so far. Only the cleaning file, where I added the salary cleaning step, goes wrong, at the salary cleaning function (the 4th one). It would be great if you could have a look as well. I'm trying to fix it right now but don't seem to be getting any closer.

How to name ambiguous location strings

We now give the name Unknown to all locations that are not in a specific city. Should we maybe change this to Remote or Online or something similar?

Format changes analysis

Hey guys,

I made some changes to the formatting of the R Markdown file so that the PDF becomes more readable (blank lines between headers and text, new chapters on new pages, and plots that no longer float to the right). Please take a look at it and let me know if you still see things that need changing. Thanks!

last parts frequency and location

The keyword analysis and location frequency parts are now largely turned into functions. It would be nice if you could check whether they run on your own computer as well. I did not make functions for some of the last parts, so the code could still be more efficient. But this already cut the code in half, so we are on the right track.

download_data issue

Hi guys, I just tested download_data.R and it seems that there is still an issue with the marketing-analist data: I only get the listings and the first 3 descriptions. Can you have a look at it?

Combined plot salary analysis

Our combined plot for the salary analysis of the top-paying locations only shows 3 cities, because they are the only 3 cities that show up in all 4 plots. If we relax the minimum of 3 job postings per location, we will get a plot with more cities in it, but some of these cities will have only one job ad per search term, so an average is not very useful in that case.

What do you guys want: keep the plots like this, with only a few cities to compare, or remove the minimum number of job ads needed per location, to get plots with more cities but with single job ads having a larger influence on the average salaries?
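To make the trade-off concrete, here is a small Python sketch of the filtering logic described above. The analysis itself is written in R; the function names, the `(city, salary)` input shape, and the `min_postings` parameter are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean


def average_salary_by_city(postings, min_postings=3):
    """Average salary per city, keeping only cities with enough postings.

    `postings` is an iterable of (city, salary) pairs for one search term.
    """
    salaries = defaultdict(list)
    for city, salary in postings:
        salaries[city].append(salary)
    return {
        city: mean(values)
        for city, values in salaries.items()
        if len(values) >= min_postings
    }


def cities_in_all(results):
    """Cities that survive the filter for every search term.

    `results` is a list of dicts as returned by average_salary_by_city,
    one per search term; the combined plot can only show this intersection.
    """
    city_sets = [set(result) for result in results]
    return set.intersection(*city_sets) if city_sets else set()
```

Lowering `min_postings` to 1 enlarges the intersection, but any city with a single posting then contributes that one salary as its "average", which is the concern raised in this issue.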

Last things Readme

  • Update the repository overview <-- I'll pick that up
  • Integrate the salary analysis part in the results overview <-- I'll pick it up when the salary analysis is fixed
  • Should we give a description of how to run the makefile? Or is that 'common sense'?
  • Overall read-through and last checks

Almost done! Great job everyone :)

Salary cleaning needs to be performed in a subsequent file.

Hi, I am currently updating the analysis scripts to cover all 4 datasets and the keyword analysis for each job search term. However, the datasets I use should not have the salary cleaning applied, because that reduces the number of observations massively. I think it is a better idea to put the cleaning of location strings and the removal of duplicates into one file, and to do the salary cleaning in the salary analysis file.

Download data improved file

Hi! Could you please check if the new download_data.R file works for you? I included functions, and it now downloads the data directly from Google Drive instead of GitHub.

If it works, we can delete all datasets from GitHub so they are not public anymore.

Documentation for ODCM is completely done, Readme needs additions

Hey guys,

The documentation (datasheet) for ODCM is imo completely done. Please do have a look before the deadline to make any enhancements or changes.

The Readme file has mostly been filled in already; it still needs the part where we explain exactly how to run everything. I will start working on that. Maybe it is a good idea to shorten the Method & Results part a little bit? That makes it easier to read.

Create driver object in selenium scraper gives error

The code we now have is "driver = webdriver.Chrome()". I don't know how it runs on your computers, but I have to put my path between the brackets, e.g. "driver = webdriver.Chrome('/usr/local/bin/chromedriver')".

The code has to run on all computers without adjustments, right?
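One way to avoid a hard-coded path is to look the driver up at run time. The sketch below is a suggestion under assumptions, not the project's code: the `CHROMEDRIVER_PATH` environment variable and both function names are hypothetical, and the `service=` argument assumes Selenium 4 (older versions accepted the path directly, as in the line quoted above).

```python
import os
import shutil


def find_chromedriver(env_var="CHROMEDRIVER_PATH"):
    """Return a chromedriver path from an environment variable or from PATH.

    Returns None when neither is available. The variable name is an
    assumption made for this sketch.
    """
    return os.environ.get(env_var) or shutil.which("chromedriver")


def make_driver():
    """Create a Chrome driver without a machine-specific path in the code."""
    # Imported here so the path lookup above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    path = find_chromedriver()
    if path is None:
        # Nothing found: fall back to Selenium's default driver lookup.
        return webdriver.Chrome()
    return webdriver.Chrome(service=Service(executable_path=path))
```

With this, someone whose chromedriver is not on PATH can set the environment variable once on their own machine, and the script itself stays identical for everyone.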

Error in makefile

I'm not sure how to add the analysis and analysis/output folders in gen. I tried it with directory.R now (see workflow), but it still gives an error.
