quotes-500k's Introduction

Quotes-500K

Large Dataset of Quotes

Because no large, publicly available dataset of quotes existed, we prepared our own for the task of proposing contextually relevant quotes for images, rather than just generating ordinary captions.

To create this dataset, we used the Python package BeautifulSoup to crawl quotes from popular websites such as https://www.goodreads.com/quotes, https://www.brainyquote.com/, http://www.famousquotesandauthors.com/ and http://www.curatedquotes.com/. Non-English quotes were then removed using the Python package langdetect.
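The crawling step boils down to pulling (quote, author) pairs out of each page's HTML. The sketch below illustrates that idea only; it is not the authors' crawler. It deliberately swaps BeautifulSoup for the standard library's `html.parser` so it runs without dependencies, and the markup it expects (`<span class="quote-text">` followed by `<span class="quote-author">`) is hypothetical; each real site uses its own class names.

```python
from html.parser import HTMLParser


class QuoteParser(HTMLParser):
    """Collect (quote, author) pairs from a hypothetical page layout.

    Assumed markup (hypothetical; real sites differ):
        <span class="quote-text">...</span>
        <span class="quote-author">...</span>
    """

    FIELDS = ("quote-text", "quote-author")

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities such as &amp;
        self.pairs = []
        self._field = None  # class name of the span we are currently inside
        self._text = ""
        self._author = ""

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in self.FIELDS:
            if name in classes:
                self._field = name

    def handle_endtag(self, tag):
        if tag != "span" or self._field is None:
            return
        if self._field == "quote-author":
            # An author span closes one quote block: store the pair and reset.
            self.pairs.append((self._text.strip(), self._author.strip()))
            self._text, self._author = "", ""
        self._field = None

    def handle_data(self, data):
        if self._field == "quote-text":
            self._text += data
        elif self._field == "quote-author":
            self._author += data


def extract_quotes(html):
    """Return a list of (quote, author) tuples found in an HTML string."""
    parser = QuoteParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


sample = """
<div class="quote">
  <span class="quote-text">An example quote.</span>
  <span class="quote-author">Example Author</span>
</div>
"""
print(extract_quotes(sample))  # [('An example quote.', 'Example Author')]
```

To mirror the English-only filtering step, the extracted pairs could then be passed through langdetect's `detect` function, keeping only those whose result is `"en"`.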

The final dataset is provided as a CSV file with three columns: the quote, the author of the quote, and the category tags for that quote. Example tags include love, life, philosophy, motivation and family; they describe the categories a particular quote belongs to. After crawling and further cleaning, the final dataset contains approximately 500,000 (500K) quotes.
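A minimal sketch of reading the three-column file with Python's standard `csv` module and counting how often each tag occurs. The header row `quote,author,category` and the comma-separated format of the tag column are assumptions, not confirmed by this page; rows without exactly three fields are skipped because the file is reported to contain some malformed lines.

```python
import csv
import io
from collections import Counter


def load_quotes(fileobj):
    """Parse (quote, author, tags) records from an open CSV file.

    Assumes a header row and a third column holding comma-separated tags
    such as "love, life"; both are assumptions about the file layout.
    """
    reader = csv.reader(fileobj)
    next(reader, None)  # skip the (assumed) header row
    records = []
    for row in reader:
        if len(row) != 3:
            continue  # malformed row, e.g. an unquoted comma inside the quote
        quote, author, tag_field = row
        tags = [t.strip() for t in tag_field.split(",") if t.strip()]
        records.append((quote, author, tags))
    return records


def tag_counts(records):
    """Count how many quotes carry each tag."""
    return Counter(tag for _, _, tags in records for tag in tags)


sample = io.StringIO(
    "quote,author,category\n"
    '"First example quote.",Author A,"love, life"\n'
    '"Second, with a comma.",Author B,"life, motivation"\n'
)
records = load_quotes(sample)
print(len(records))                         # 2
print(tag_counts(records).most_common(1))   # [('life', 2)]
```

For the real file, `load_quotes` would be called on a handle opened with `open("quotes.csv", newline="", encoding="utf-8")`, where the file name is a placeholder for whatever the download is called.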

We have made this dataset publicly available, so that it can be used by fellow researchers for educational and research purposes.

Link to Download Dataset: https://goo.gl/R3Sa34

Please cite our paper if you wish to use this dataset for your research work.

Title: Proposing Contextually Relevant Quotes for Images

Authors: Shivali Goel, Rishi Madhok, Shweta Garg

In proceedings of: 40th European Conference on Information Retrieval

Year: 2018

quotes-500k's People

Contributors

shivaligoel, rishimadhok

quotes-500k's Issues

Ill-formatted quotes

There are 7023 lines in the CSV file that are not formatted properly. In most cases, the quote contains a comma (,) but is not enclosed in double-quote (") characters. Just FYI.
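The problem described above can be checked by counting fields per row: an unquoted comma inside the quote splits it into extra fields. The sketch below flags such rows and shows that Python's `csv.writer`, in its default `QUOTE_MINIMAL` mode, wraps comma-containing fields in double quotes so they read back as three fields. The expected field count of 3 follows from the three-column layout; the sample rows are made up, and this sketch does not reproduce the 7023 figure.

```python
import csv
import io


def find_malformed(fileobj, expected_fields=3):
    """Return (line_number, field_count) for each non-empty row with the wrong field count."""
    reader = csv.reader(fileobj)
    bad = []
    for row in reader:
        if row and len(row) != expected_fields:
            # reader.line_num is the source line on which this record ended
            bad.append((reader.line_num, len(row)))
    return bad


def write_rows(rows):
    """Serialize rows to CSV text; the default QUOTE_MINIMAL mode wraps any
    field containing a comma, quote character or newline in double quotes."""
    out = io.StringIO()
    csv.writer(out).writerows(rows)
    return out.getvalue()


broken = io.StringIO(
    "quote,author,category\n"
    "A well-formed quote,Author A,life\n"
    "A quote with, an unquoted comma,Author B,life\n"
)
print(find_malformed(broken))  # [(3, 4)]

fixed = write_rows([("A quote with, a comma", "Author B", "life")])
print(repr(fixed))  # '"A quote with, a comma",Author B,life\r\n'
print(find_malformed(io.StringIO(fixed)))  # []
```

Detection is straightforward, but repairing an already-broken row is not: once the quoting is lost, there is no reliable way to tell which commas belonged to the quote, so regenerating the file with a proper CSV writer is the safer fix.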

Download link is not working

Thanks for collecting all of these. I cannot download the dataset from https://goo.gl/R3Sa34.
Could you please have a look?

Sorry, the file you have requested does not exist.

Make sure that you have the correct URL and that the file exists.