RedNails

Description

RedNails is a data scraping library that uses templates to determine what data to extract from actual data feeds.

RedNails uses the template to build a regular expression that captures the variables the user has marked. When a string of data is passed to RedNails, it applies that regular expression, extracts the matches, and returns them to the user.
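
The idea in the paragraph above can be sketched in a few lines of Ruby. This is only an illustration of the general technique, not RedNails' actual implementation: the helper name template_to_regexp is hypothetical, and the sketch assumes markers contain only word characters and that each marker name appears once.

```ruby
# Hypothetical helper, not part of the RedNails API: turn a template string
# into a Regexp with one named capture group per #{name} marker.
def template_to_regexp(template)
  marker = /#\{(\w+)\}/
  # Splitting on a pattern with one capture group keeps the captured marker
  # names in the result, at the odd indexes between the literal text parts.
  pattern = template.split(marker).each_with_index.map do |part, i|
    i.odd? ? "(?<#{part}>.+?)" : Regexp.escape(part)
  end.join
  # Anchor the pattern so the whole data string must follow the template.
  Regexp.new("\\A#{pattern}\\z", Regexp::MULTILINE)
end

template = 'Hello my name is #{name}.  How are you?'
data     = 'Hello my name is Mr.Bill.  How are you?'

match = template_to_regexp(template).match(data)
puts match[:name] if match # => Mr.Bill
```

The literal parts of the template are escaped so that characters such as "." and "?" match themselves, and the lazy ".+?" lets each marker take the shortest text that still allows the rest of the template to match.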

If the scraped data is regular enough, RedNails is a simple way to extract it: copy a live data feed, mark the points you want to extract, and use the result as the template.

License: BSD

Usage

  1. Create a template.
  2. Load and initialize an instance of a RedNails object with the template.
  3. Pass this instance your data feed from which you wish to extract information.
  4. Use the results.

Template Format

A RedNails template is simply a text file in which the points to scrape are marked with what looks like Ruby string interpolation. Give each marker a unique variable name; that name can then be used to reference the extracted value when using the parse_hash method.

An example template is:

"Hello my name is #{name}.  How are you?"

If the data string to scrape is:

"Hello my name is Mr.Bill.  How are you?"

Then the following code fragment will print "Mr.Bill":

 require 'rednails'
 rednails = RedNails.new("template.txt")
 results = rednails.parse_hash("livedata.txt")
 puts results["name"] # => Mr.Bill

Repetitions

If the data you want to extract repeats, there is an additional template marker you can use. Replace the repeating data in the template with a #{Rep:} marker. After the colon inside the Rep marker, place the structured data that repeats, but replace each unique piece of data you want to extract with a unique variable name that starts and ends with @.

For example, if you have an arbitrary list of images that you would like to extract, you can write a template like this:

<html>
  <body>
    A bunch of photos:
    #{Rep:<img src="@url@" alt="@txt@"/>}
  </body>
</html>

For more details please see the test cases.
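
As a rough illustration of how a repetition marker could be matched, the sketch below scans a data string for every occurrence of the text inside the Rep marker. It is not RedNails' own code: the helper name scan_repetitions and the sample image data are hypothetical, and the format RedNails itself returns for repetitions may differ.

```ruby
# Hypothetical helper, not part of the RedNails API: collect every occurrence
# of a repeated block, given the text found inside a #{Rep:...} marker.
def scan_repetitions(rep_body, data)
  names = []
  # Odd indexes after the split hold the @name@ variable names.
  pattern = rep_body.split(/@(\w+)@/).each_with_index.map do |part, i|
    if i.odd?
      names << part
      "(.*?)"
    else
      Regexp.escape(part)
    end
  end.join
  # String#scan returns one array of captured values per occurrence.
  data.scan(Regexp.new(pattern, Regexp::MULTILINE)).map do |values|
    names.zip(values).to_h
  end
end

rep_body = '<img src="@url@" alt="@txt@"/>'
data = <<~HTML
  <html>
    <body>
      A bunch of photos:
      <img src="a.png" alt="First"/>
      <img src="b.png" alt="Second"/>
    </body>
  </html>
HTML

scan_repetitions(rep_body, data).each do |row|
  puts "#{row['url']}: #{row['txt']}"
end
# prints:
# a.png: First
# b.png: Second
```

Each occurrence becomes one hash keyed by the variable names, so the number of images in the data does not need to be known when the template is written.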

Installation

  • Gem: gem install RedNails

  • Manual: ruby setup.rb all

Author and Contributions

  • Zev Blut
  • With some changes and help by Min Lin Hsieh, Daniel DeLorme and Pierre Baumard.
