Code Monkey home page Code Monkey logo

douban_fdw's Introduction

douban_fdw

A PostgreSQL's Foreign Data Wrapper (FDW) for retrieving the movie ranking data via the public API of douban.com. This FDW is mainly written in GO(cgo).

This toy FDW was inspired by the github repositories as follows

Usage

how to install from source

  1. make sure the path to the PostgreSQL's binary had been exported to the PATH environment variable because it is necessary to use the pg_config command for the source build.

    please use which pg_config to check the path

  2. make sure the GO language was successfully installed and the path to go had been exported to the PATH environment variable

    please use go version to check that

  3. make sure the toolchain of gcc and make were successfully installed

  4. execute the following command

    $cd /path/to/douban_fdw
    $make
    $make install

    if the commands above successed, the shared library files would be installed into /path/to/postgres/install/lib and the other files would be installed into /path/to/postgres/install/share

  5. restart the postgres instance

  6. use the postgres client (such as psql) to connect to the postgres instance and execute the following SQL statement and it should be done by the superuser

    CREATE EXTENSION douban_fdw;

how to use the FDW

  1. make sure that the postgresql server where the fdw was installed to is able to access douban.com on internet

  2. first of all, create a foreign server like

    CREATE SERVER {servername} FOREIGN DATA WRAPPER douban_fdw;

    you can see here for more details about the syntax of CREATE SERVER

  3. define a foreign table with the foreign server above

    CREATE FOREIGN TABLE {tablename} (rating {data type}...) SERVER {servername} OPTIONS (rank_name 'top250');

    you can see here for more details about the syntax of CREATE FOREIGN TABLE

    NOTE: you can name the foreign table whatever you wanted to, but the column name should be as follows. if you defined an column name out of the valid range, it would cause an error when you queried the table

    the column names which can be identified:

    • casts
    • collectcount
    • directors
    • genres
    • id
    • originname
    • rating
    • subtype
    • title
    • url
    • year
  4. use the SELECT statement to query the foreign table defined above

Limitations

  1. this FDW currently can work with PostgreSQL 9.5 only, because the internal fdw interfaces changed

  2. the foreign table defined by douban_fdw can work properly only on the database of which the encoding being UTF8, because most of the data retrieved from douban.com are simplified chinese characters (encoded in UTF8)

  3. according to the official manual, the douban's public api can only be called within 40 times per-hour from one ip address, currently the user can only query the foreign table less than 40 time in a hour

  4. it only supports the public movie api of "top250"(/v2/movie/top250) currently

TODO

the following features are on the way

  • the implementation of the rescan routine
  • support of the IMPORT FOREIGN SCHEMA statement
  • a local persistant buffer to solve the times limit issue of the douban API
  • support the public api for retrieving data of chart "us_box"
  • server-side encoding convert to support the database of which not being UTF8-encoded
  • PostgreSQL 9.6+ support

douban_fdw's People

Contributors

xiaowing avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

douban_fdw's Issues

two core-issues

create the following foreign table

postgres=# \d+ top250
                                     Foreign table "public.top250"
    Column    |         Type          | Modifiers | FDW Options | Storage  | Stats target | Description
 
--------------+-----------------------+-----------+-------------+----------+--------------+------------
-
 rating       | real                  |           |             | plain    |              | 
 title        | text                  |           |             | extended |              | 
 genres       | character varying(64) |           |             | extended |              | 
 casts        | text                  |           |             | extended |              | 
 collectcount | integer               |           |             | plain    |              | 
 originname   | text                  |           |             | extended |              | 
 directors    | text                  |           |             | extended |              | 
 year         | character varying(32) |           |             | extended |              | 
 id           | text                  |           |             | extended |              | 
Server: doubansv
FDW Options: (rank_name 'top250')

the postgres progress would core if one of the following SQL was executed

SELECT rating, title, casts, year FROM top250 ORDER BY rating;  --"ORDER BY rating" caused core

SELECT rating, title, genres, casts, year FROM top250;    --"genres" in the target list caused core

the foreign column seems not take effect if it was used as a argument of a built-in function

create the following foreign table

                                     Foreign table "public.top250"
    Column    |         Type          | Modifiers | FDW Options | Storage  | Stats target | Description
 
--------------+-----------------------+-----------+-------------+----------+--------------+------------
-
 rating       | real                  |           |             | plain    |              | 
 title        | text                  |           |             | extended |              | 
 genres       | character varying(64) |           |             | extended |              | 
 casts        | text                  |           |             | extended |              | 
 collectcount | integer               |           |             | plain    |              | 
 originname   | text                  |           |             | extended |              | 
 directors    | text                  |           |             | extended |              | 
 year         | character varying(32) |           |             | extended |              | 
 id           | text                  |           |             | extended |              | 
Server: doubansv
FDW Options: (rank_name 'top250')

it is weird that the following sql statements resulted to totally different result:

postgres=# SELECT rating, title FROM top250 WHERE CAST(year AS INT) = 1994;
rating | title
--------+-------
(0 rows)

postgres=# SELECT rating, title, CAST(year AS INT) FROM top250 WHERE CAST(year AS INT) = 1994;
rating | title | year
--------+----------------+------
8.6 | 东邪西毒 | 1994
8.7 | 燃情岁月 | 1994
9.6 | 肖申克的救赎 | 1994
9.4 | 这个杀手不太冷 | 1994
9.4 | 阿甘正传 | 1994
9.1 | 活着 | 1994
9.1 | 饮食男女 | 1994
8.9 | 狮子王 | 1994
8.8 | 低俗小说 | 1994
8.8 | 阳光灿烂的日子 | 1994
8.7 | 重庆森林 | 1994
(11 rows)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.