Goodreads Transformation

The purpose of this notebook is to make the CSV one exports from Goodreads more amenable to analysis. I've only addressed the most obvious issues so far, and invite anyone interested to help me make this even better!

  1. ISBNs written as formulae

My hunch is that this is a workaround Goodreads implemented because ISBNs are typically composed entirely of digits, so software like Excel interprets them as numbers and changes how they are displayed (for example, by dropping leading 0s). Because a CSV cannot specify a column's data type (ideally a string here, so the exact value is reproduced), I think Goodreads made the cell value itself a formula that describes the ISBN as a string. This is fine for Excel, but not for Python, so I wrote what is probably inefficient code to strip the punctuation that makes the formula work.
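As a rough illustration of the cleanup described above, here is a minimal sketch using only the standard library. It assumes the exported cells look like `="0439023483"` (and `=""` when empty); that format, the helper name `clean_isbn`, and the sample values are assumptions for illustration, not code taken from the notebook.

```python
import re

# Assumed export format: the cell is a spreadsheet formula such as ="0439023483".
# An empty ISBN is assumed to be exported as ="".
_FORMULA = re.compile(r'^="(.*)"$')


def clean_isbn(value):
    """Return the bare ISBN as a string, or None when the cell is empty."""
    if value is None:
        return None
    text = str(value).strip()
    match = _FORMULA.match(text)
    if match:
        # Keep only what sits between the quotes, preserving leading zeros.
        text = match.group(1)
    return text or None


print(clean_isbn('="0439023483"'))
print(clean_isbn('=""'))
```

With pandas, the same helper could be applied to a column via `Series.map(clean_isbn)`, provided the column was read as strings so that leading zeros survive.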

  2. Bookshelf column composed of more than one value

Let me know if there is a more elegant solution, but I made a second table that pulls the bookshelf tags out of the first, so that each tag has its own row, with ISBN13 as its key. A couple of keys in my dataset are null because no ISBN13 value was included, which I think is just a data problem on Goodreads' side, but perhaps you won't have that issue.
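One possible way to build such a second table is sketched below with pandas. It assumes the column is named `Bookshelves` and holds comma-separated tags; the function name `shelves_table` and the sample rows are made up for illustration and are not the notebook's own code.

```python
import pandas as pd

# Made-up sample rows; the last one has no ISBN13, as can happen in real exports.
books = pd.DataFrame({
    "ISBN13": ["9780439023481", "9780141439518", None],
    "Bookshelves": ["fiction, young-adult", "classics", "to-read, fiction"],
})


def shelves_table(df):
    """Return one row per (ISBN13, bookshelf tag) pair."""
    shelves = df[["ISBN13", "Bookshelves"]].copy()
    # Split the comma-separated string into a list, then give each tag its own row.
    shelves["Bookshelf"] = shelves["Bookshelves"].str.split(",")
    shelves = shelves.explode("Bookshelf")
    shelves["Bookshelf"] = shelves["Bookshelf"].str.strip()
    # Drop books that have no shelves at all.
    shelves = shelves.dropna(subset=["Bookshelf"])
    shelves = shelves[shelves["Bookshelf"] != ""]
    return shelves[["ISBN13", "Bookshelf"]].reset_index(drop=True)


tags = shelves_table(books)
print(tags)
```

Rows whose ISBN13 is missing keep a null key here, which matches the nulls mentioned above; a different key column could be substituted if your export has a more reliable identifier.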

  3. Full dates formatted as pandas datetime objects

Not sure if this is everyone's favorite format, but it should be helpful for studying patterns over time. I changed the format of Date Added, Date Read, and Original Purchase Date.
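A minimal sketch of that conversion follows. It assumes the export writes dates as `YYYY/MM/DD` strings and leaves unknown dates blank; the format string, the helper name `parse_dates`, and the sample rows are assumptions, so adjust them to whatever your own export contains.

```python
import pandas as pd

DATE_COLUMNS = ["Date Added", "Date Read", "Original Purchase Date"]


def parse_dates(df, columns=DATE_COLUMNS):
    """Return a copy of df with the listed columns converted to datetimes."""
    out = df.copy()
    for col in columns:
        if col in out.columns:
            # errors="coerce" turns blank or malformed values into NaT
            # instead of raising.
            out[col] = pd.to_datetime(out[col], format="%Y/%m/%d", errors="coerce")
    return out


# Made-up sample rows for illustration.
books = pd.DataFrame({
    "Date Added": ["2019/03/14", "2020/11/02"],
    "Date Read": ["2019/04/01", ""],
    "Original Purchase Date": ["2019/03/10", ""],
})
parsed = parse_dates(books)
print(parsed.dtypes)
```

Once the columns are datetimes, accessors such as `parsed["Date Read"].dt.year` make it straightforward to group books by year or month.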

Contributors

meli-lewis, emmmbrown
