german-rap-analysis's Introduction

Behind the Beats: Curation of a Dataset for Exploring German Rap Language

Abstract

Rap music is one of the most influential contemporary music genres and carries considerable societal significance. Existing analyses of rap often rely on surveys or web-scraping approaches that do not take the popularity of songs into account. In this work, we contribute a novel dataset for German rap lyrics analysis, comprising 13,997 songs that achieved high chart positions between 2015 and 2023. Each song is further annotated with moderation labels reflecting the presence of harassment, violence, sexuality, hate, and self-harm, enabling analysis of harmful or explicit content within German rap. The dataset makes it possible to explore individual artists' lyrical styles as well as the temporal development of German rap.

Setup

Create a Python virtual environment with Python 3.10 and install the requirements:

python3.10 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Copy the file .env.base to .env and fill in the secrets.

GENIUS_TOKEN='REPLACE_ME'
OPENAI_API_KEY='REPLACE_ME'
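Before running the pipeline, it can help to confirm that both placeholders were actually replaced. The following is a minimal standalone sketch using only the standard library. It assumes the .env file contains simple KEY='VALUE' lines as shown above. The helper names `parse_env_text` and `missing_secrets` are hypothetical and are not part of this repository, and this README does not state how src/main.py itself loads the secrets.

```python
from pathlib import Path

# Keys taken from the .env template shown above.
REQUIRED_KEYS = ("GENIUS_TOKEN", "OPENAI_API_KEY")
PLACEHOLDER = "REPLACE_ME"


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY='VALUE' lines; blank lines and # comments are skipped.

    Assumption: no multi-line values, no variable expansion.
    """
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding single or double quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_secrets(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still the placeholder."""
    return [
        key
        for key in REQUIRED_KEYS
        if values.get(key, PLACEHOLDER) in ("", PLACEHOLDER)
    ]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        found = parse_env_text(env_path.read_text(encoding="utf-8"))
        print("Missing secrets:", missing_secrets(found) or "none")
    else:
        print("No .env file found; copy .env.base to .env first.")
```

Running this from the repository root prints which of the two keys still need a real value.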

Running the scraping pipeline

The main script for scraping the data can be found in src/main.py.

Run the script with:

python src/main.py

Accessing the data

The scraped data is stored in the data folder:

  • data/csv/charts.csv: Dataset with the scraped charts data.
  • data/csv/songs.csv: Dataset with the scraped songs data (no lyrics).
  • data/csv/moderation.csv: Dataset with the content moderation predictions.
  • data/csv/output.csv: The three datasets above joined into a single dataset.
  • data/lyrics.zip: Zip file with the scraped lyrics.
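The files listed above can be inspected with the Python standard library alone. The sketch below is illustrative rather than part of the repository: the helper names are hypothetical, UTF-8 encoding is an assumption, and no column names are assumed because this README does not list them. It only reads the header row, counts rows, and lists the archive members.

```python
import csv
import zipfile
from pathlib import Path


def csv_columns(path) -> list[str]:
    """Return the header row of a CSV file (empty list if the file is empty)."""
    with open(path, newline="", encoding="utf-8") as f:  # UTF-8 is an assumption
        return next(csv.reader(f), [])


def count_rows(path) -> int:
    """Count data rows, excluding the header."""
    with open(path, newline="", encoding="utf-8") as f:
        return sum(1 for _ in csv.DictReader(f))


def list_lyrics(zip_path) -> list[str]:
    """List the file entries inside the lyrics archive, skipping directories."""
    with zipfile.ZipFile(zip_path) as zf:
        return [name for name in zf.namelist() if not name.endswith("/")]


if __name__ == "__main__":
    joined = Path("data/csv/output.csv")
    lyrics = Path("data/lyrics.zip")
    if joined.exists():
        print("Columns:", csv_columns(joined))
        print("Rows:", count_rows(joined))
    if lyrics.exists():
        print("Lyrics files:", len(list_lyrics(lyrics)))
```

Checking the header first is a cheap way to learn the actual column names before writing any analysis code against them.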

Experiments

The experiments are stored in the experiments folder:

  • experiments/exp_PD_001_AlgorithmicAnalysis.ipynb: Notebook with the algorithmic analysis of the moderation labels.
  • experiments/exp_PD_002_DeToxAnalysis.ipynb: Notebook with the analysis of the DeTox hate speech labels.

Results of the experiments are stored in the data/experiments folder.

Docs

The scripts for creating visualizations for the paper can be found in doc/data_lit_23-24_paper/fig.
