
packages-http's Introduction

SWI-Prolog HTTP support library

This directory provides the SWI-Prolog libraries for accessing and providing HTTP services.

Client library

The main client library is library(http/http_open), which can open both HTTP and HTTPS connections and handle all request methods.
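
For example, the following query fetches a page and copies it to the current output (a minimal sketch; any reachable URL works the same way):

?- use_module(library(http/http_open)).
true.

?- http_open('http://www.swi-prolog.org', In, []),
   copy_stream_data(In, user_output),
   close(In).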

Server library

The main server libraries are:

  • library(http/thread_httpd) implements the server
  • library(http/http_dispatch) implements binding locations (paths) to handler predicates
  • library(http/http_unix_daemon) implements integration with various Unix service managers and in general provides a suitable entry point for HTTP servers on Unix.
  • library(http/html_write) implements HTML generation
  • library(http/http_json) implements reading and writing JSON documents.

For simplicity, you can use library(http/http_server), which combines the typical HTTP libraries that most servers need. The idea of a common request handling system with three controlling libraries is outdated; the threaded server is now the only sensible controlling library.
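
As an illustration, here is a minimal server sketch using the combined library (assuming http_server/1 and reply_json_dict/1 as re-exported by library(http/http_server); the handler name, path and port are arbitrary):

:- use_module(library(http/http_server)).

:- http_handler(root(hello), hello, []).

%  Called with the parsed request; replies a JSON document.
hello(_Request) :-
    reply_json_dict(_{message: "Hello, world!"}).

start(Port) :-
    http_server([port(Port)]).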

Requirements

This library uses functionality from the ssl package to support HTTPS, the sgml package to read XML/HTML and the clib package for various extensions.

packages-http's People

Contributors

ahefner, anniepoo, borisvassilev, brebs-gh, dtonhofer, eshelyaron, fkd13, gavinmendelgleason, honnix, jamesnvc, janwielemaker, jrvosse, kamahen, keriharris, likelion, mbrock, mndrix, ntgiwsvp, royratcliffe, rrooij, teamspoon, thetrime, triska, uwn, wdiestel, wouterbeek, xpxaxsxi


packages-http's Issues

Foreign predicate sgml:sgml_parse/2 did not clear exception

Under certain circumstances, I get the following message:

Foreign predicate sgml:sgml_parse/2 did not clear exception: error(timeout_error(read,<stream>(0x26d8130)),context(sgml:sgml_parse/2,_72026))

This once arose with swissues, when running:

./update_html.sh

SWI version: 7.5.3-7-gdfc4edf.

Wishlist: Support wss:// scheme for secure WebSocket

wss is to ws what https is to http.

Therefore, ideally, the following query should automatically:

  1. establish a TLS connection to the host
  2. open a WebSocket over the secure connection.

Currently, it does not work at all:

?- http_open_websocket('wss://html5rocks.websocket.org/echo', WS, []).
ERROR: http_reply `'wss://html5rocks.websocket.org/echo'' does not exist

If possible, please add this feature. Thank you!

http_reply/[2,3,4] fail unexpectedly

Currently, only the 5-argument version of http_reply works as documented:

?- http_reply(html([]), current_output, [], [], C).
HTTP/1.1 200 OK ...
C = 200.

If I omit any of the optional arguments, which should work according to the documentation, I get false in all cases:

?- http_reply(html([]), current_output, [], C).
false.

?- http_reply(html([]), current_output, []).
false.

?- http_reply(html([]), current_output).
false. 

--ip=IP option is broken

Let server.pl consist of:

:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_dispatch)).
:- use_module(library(http/http_unix_daemon)).

:- initialization http_daemon.

:- http_handler(/, write_index, []).

write_index(_Request) :-
    format("Content-type: text/plain~n~n"),
    format("hello!!").

When I do:

$ swipl server.pl --port=5050 --interactive 

then everything works exactly as expected.

In contrast, when I do (--ip=localhost is an example that is shown when using --help):

$ swipl server.pl --port=5050 --interactive --ip=localhost 

I get:

ERROR: ssl:'_ssl_context'/4: Arguments are not sufficiently instantiated

The --ip option is broken since 7.3.23.

I am filing this as an issue instead of submitting a pull request right away because I have a bit of feedback on the options processing code: As this issue shows, overloading the various options with different syntax variants to specify ports and addresses in several ways makes the code a lot harder to test and quite fragile.

Please let us reflect for a moment on whether it would not be better to keep the --port and --ip options cleanly separated from the --http and --https options. If—as was the case until very recently!—there were only one way to specify each of:

  • protocol (--http or --https)
  • port (--port=...) and
  • address (--ip=...)

then I think this issue would have been found a lot earlier, and most likely on the SWI site itself.

Content-encoding: gzip decompression does not work reliably

Test case:

$ swipl
Welcome to SWI-Prolog (Multi-threaded, 64 bits, Version 7.3.14-57-g6f9925a)
Copyright (c) 1990-2015 University of Amsterdam, VU Amsterdam
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to redistribute it under certain conditions.
Please visit http://www.swi-prolog.org for details.

For help, use ?- help(Topic). or ?- apropos(Word).

?- [library(http/http_client)].
true.

?- [library(zlib)].
true.

?- repeat,
     http_get('http://i.imgur.com/MOAsFWf.gifv', Data, []),
     sub_atom(Data, 0, 5, _, Start), 
     portray_clause(Start), false.

Sometimes the transparent gzip decompression works, and very often it doesn't:

'\037\213\b\000\000'.
'<!doc'.
'\037\213\b\000\000'.
'\037\213\b\000\000'.
'\037\213\b\000\000'.
'\037\213\b\000\000'.
'<!doc'.
'<!doc'.
'\037\213\b\000\000'.
'\037\213\b\000\000'.
'\037\213\b\000\000'.
etc.

When I extend the test case to obtain the Content-Encoding header field, I get:

?- repeat,
     http_get('http://i.imgur.com/MOAsFWf.gifv', Data, [headers(Hs)]),
     ( memberchk(content_encoding(CE), Hs) -> true ; CE = none),
     sub_atom(Data, 0, 5, _, Start),
     portray_clause(ce_start(CE, Start)), false.
ce_start(gzip, '\037\213\b\000\000').
ce_start(gzip, '\037\213\b\000\000').
ce_start(gzip, '\037\213\b\000\000').
ce_start(no, '<!doc').
ce_start(no, '<!doc').
ce_start(gzip, '\037\213\b\000\000').
ce_start(gzip, '\037\213\b\000\000').
ce_start(gzip, '\037\213\b\000\000').
ce_start(gzip, '\037\213\b\000\000').
ce_start(no, '<!doc').
ce_start(no, '<!doc').
etc.
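
Until the underlying race is fixed, one possible workaround is to avoid relying on transparent decompression and decompress explicitly, e.g. with http_open/3 and zopen/3 from library(zlib) (a sketch; fetch/2 is a hypothetical helper, and it assumes no transparent decompression is applied to the stream):

:- use_module(library(http/http_open)).
:- use_module(library(zlib)).

fetch(URL, Data) :-
    http_open(URL, In0, [header(content_encoding, CE)]),
    (   CE == gzip                     % server compressed the body
    ->  zopen(In0, In, [])             % decompress explicitly
    ;   In = In0
    ),
    call_cleanup(read_string(In, _, Data), close(In)).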

HTTP handler seems to disregard option `priority/1`

The following server program serves handler b at /, even though handler a has a higher priority/1 option:

:- use_module(library(http/html_write)).
:- use_module(library(http/http_dispatch)).
:- use_module(library(http/http_unix_daemon)).

:- http_handler(/, a, [priority(1)]).
:- http_handler(/, b, []).

a(_) :-
  reply_html_page([], ["A"]).

b(_) :-
  reply_html_page([], ["B"]).

Expected: handler a to be served at / instead.

Document negotiated cipher in HTTP log file

With pull request SWI-Prolog/packages-ssl#86, ssl_session/2 also reports the negotiated cipher.

In my view, this information should be made available in log files, to gather statistics about the capabilities of clients that typically visit, so that suitable ciphers can be more easily chosen.

This of course affects exclusively HTTPS servers, where I think it would be great to have an additional cipher(Cipher) entry in the log file for each request. Please consider making this available by default in log files.

Thank you!

please shorten log entries for "remote hangup after ..."

Please change the following DCG rule in http_header.pl to not write all of Data into the log file:

prolog:error_message(http_write_short(Data, Sent)) -->
        [ '~p: remote hangup after ~D bytes'-[Data, Sent] ].

This is because such requests are comparatively frequent, and the log grows too quickly if larger files are transferred using the bytes/2 method.

For example, would it make sense to log at most the first 10 bytes of the response in such situations?
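
A sketch of such a truncation (truncated/3 is a hypothetical helper; 10 bytes as suggested above):

:- multifile prolog:error_message//1.

%  Render Data as a string and keep at most Max characters of it.
truncated(Data, Max, Short) :-
    term_string(Data, S0),
    (   string_length(S0, Len),
        Len > Max
    ->  sub_string(S0, 0, Max, _, Short)
    ;   Short = S0
    ).

prolog:error_message(http_write_short(Data, Sent)) -->
        { truncated(Data, 10, Short) },
        [ '~w...: remote hangup after ~D bytes'-[Short, Sent] ].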

Make mime_type_encoding/2 into a hook

If we want our HTTP server to reply N-Triples, we must include the encoding in the Content-Type header.
Since N-Triples is UTF-8 by default, the encoding is redundant:

Content-Type: application/n-triples; charset=utf-8

A small improvement would be to turn mime_type_encoding/2 in module http_header into a multi-file predicate:

mime_type_encoding('application/json',        utf8).
mime_type_encoding('application/jsonrequest', utf8).
mime_type_encoding('application/x-prolog',    utf8).

E.g., we could extend this list in module semweb/rdf11 as follows:

mime_type_encoding('application/n-quads', utf8).
mime_type_encoding('application/n-triples', utf8).
mime_type_encoding('application/sparql-query', utf8).
mime_type_encoding('application/trig', utf8).
mime_type_encoding('text/turtle', utf8).
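
With such a multifile declaration in place, client modules could add clauses like this (a sketch of the requested hook, not the current API):

:- multifile http_header:mime_type_encoding/2.

http_header:mime_type_encoding('application/n-triples', utf8).
http_header:mime_type_encoding('text/turtle',           utf8).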

Wishlist: A+ security assessment for HTTPS server

Currently, the HTTPS Unix daemon can be easily invoked to score A- in the free security assessment performed by SSL Labs, using for example:

$ sudo swipl server.pl --interactive --user=you \
    --https --certfile=server.crt --keyfile=server.key \
    --cipherlist='DEFAULT:!RC4:!DES-CBC-SHA'

Ideally, the server should score A or even A+ when invoked with the right options.

Getting to this point may require additional options and also some improvements or extensions to library(ssl).

@chigley: I would greatly appreciate your input on this issue, any time it is convenient for you. Thank you in advance! (Only a few days ago, we could not even reach A- due to the lack of ECDHE!)

http_open does not honor explicitly set reply encoding

Currently, http_open/3 does not seem to honor the encoding set by the Content-Type reply header. While the encoding is claimed to be UTF-8, the content clearly contains incorrect characters (for non-ASCII Unicode characters):

?- http_open('https://api.test.triply.cc/info', In, [header(content_type,ContentType)]),
   json_read_dict(In, Dict),
   stream_property(In, encoding(Enc)).
ContentType = 'application/json; charset=utf-8',
Dict = _22402{branding:_20790{description:"Linking Business to Datað\237\\221\ª",  ...
Enc = octet.

Expected: The encoding of stream In to automatically be set based on the value of the Content-Type reply header.
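
Until that happens, a workaround is to set the encoding manually before parsing (a sketch; the charset check is a naive substring test on the header value):

?- http_open('https://api.test.triply.cc/info', In,
             [header(content_type, ContentType)]),
   (   sub_atom(ContentType, _, _, _, 'charset=utf-8')
   ->  set_stream(In, encoding(utf8))
   ;   true
   ),
   json_read_dict(In, Dict).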

--interactive switch unexpectedly affects dispatching

I am filing this as an HTTP issue because this is where I have found a case that is easy to reproduce, and it is also where this issue affects me most. This may of course be due to a more general underlying issue with file loading.

To reproduce, please download Proloxy, and make sure to run any (arbitrary) web server on port 3031, because that's the port used in the config.pl example configuration of Proloxy.

Then, please run:

$ swipl proloxy.pl config.pl --port=5050 --interactive

And test the dispatch with:

$ wget localhost:5050

This will indirectly (via Proloxy) fetch / from the web server on port 3031, and works completely as expected.

In contrast, if you omit the --interactive switch and use --no-fork instead, the proxy server again starts exactly as expected:

$ swipl proloxy.pl config.pl --port=5050 --no-fork
% Started server at http://localhost:5050/

But we now get:

$  wget localhost:5050
--2016-12-15 01:00:10--  http://localhost:5050/
Resolving localhost (localhost)... ::1, 127.0.0.1
Connecting to localhost (localhost)|::1|:5050... failed: Connection refused.
Connecting to localhost (localhost)|127.0.0.1|:5050... connected.
HTTP request sent, awaiting response... 503 Service Unavailable
2016-12-15 01:00:10 ERROR 503: Service Unavailable.

If you telnet to Proloxy or access it differently, you see the exact reason:

request_prefix_target/3: No matching rule for ...

This error stems from Proloxy, apparently because the file config.pl that appears on the command line was not loaded in the same way as if we had used --interactive.

Note that if I swap the file arguments, i.e.:

$ swipl config.pl proloxy.pl --port=5050 --no-fork

Then everything works as expected again. Intuitively, I would like the same outcome in all of these cases, like in the case of the --interactive switch. In particular, loading the config file before or after proloxy.pl should ideally not make any difference, at least not in such simple cases.

HTTP server crashes with fatal signal 11: C-stack trace labeled "crash"

Let server.pl consist of:

:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_dispatch)).
:- use_module(library(http/http_unix_daemon)).

:- initialization http_daemon.

:- http_handler(/, handle_request, []).

handle_request(_Request) :-
    format("Content-type: text/plain~n~n"),
    format("hello!!").

Start the server with:

$ swipl server.pl --interactive --port=3030

Let test_server.tcl consist of:

set i 0
while {true} {
    if {$i % 100 == 0} { puts "$i."}
    set s [socket localhost 3030]
    puts $s "GET /"
    set i [expr {$i + 1}]
}

and run this Tcl script with:

$ tclsh test_server.tcl

On OS-X 10.10.1, I get the following output from the test script:

$ tclsh test_server.tcl
0.
100.
200.
...
2500.
couldn't open socket: nodename nor servname provided, or not known

and the SWI-Prolog HTTP server crashes with:

% Started server at http://localhost:3030/
% Started server at port 3030
Welcome to SWI-Prolog (Multi-threaded, 64 bits, Version 7.3.14-62-g18f1547-DIRTY)
Copyright (c) 1990-2015 University of Amsterdam, VU Amsterdam
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to redistribute it under certain conditions.
Please visit http://www.swi-prolog.org for details.

For help, use ?- help(Topic). or ?- apropos(Word).

?- 
SWI-Prolog [thread 3 (httpd@3030_2) at Sat Jan 23 23:38:32 2016]: received fatal signal 11 (segv)
C-stack trace labeled "crash":
  [0] /usr/local/lib/swipl-7.1.27/lib/x86_64-darwin14.0.0/libswipl.dylib(save_backtrace+0xb9) [0x104a60d69]
  [1] /usr/local/lib/swipl-7.1.27/lib/x86_64-darwin14.0.0/libswipl.dylib(crashHandler+0xa8) [0x104a61348]
  [2] /usr/local/lib/swipl-7.1.27/lib/x86_64-darwin14.0.0/libswipl.dylib(dispatch_signal+0x2d3) [0x104a09333]
  [3] /usr/lib/system/libsystem_platform.dylib(_sigtramp+0x1a) [0x7fff91c49f1a]
  [5] /usr/local/lib/swipl-7.1.27/lib/x86_64-darwin14.0.0/libswipl.dylib(Sgetcode+0x260) [0x104a4e520]
  [6] /usr/local/lib/swipl-7.1.27/lib/x86_64-darwin14.0.0/readutil.dylib(read_line_to_codes3+0x7c) [0x104b30b0c]
  [7] /usr/local/lib/swipl-7.1.27/lib/x86_64-darwin14.0.0/libswipl.dylib(PL_next_solution+0x80c2) [0x10499dfd2]
  [8] /usr/local/lib/swipl-7.1.27/lib/x86_64-darwin14.0.0/libswipl.dylib(callProlog+0x151) [0x1049ebcc1]
  [9] /usr/local/lib/swipl-7.1.27/lib/x86_64-darwin14.0.0/libswipl.dylib(start_thread+0x140) [0x104a1e810]
  [10] /usr/lib/system/libsystem_pthread.dylib(_pthread_body+0x83) [0x7fff88bb72fc]
  [11] /usr/lib/system/libsystem_pthread.dylib(_pthread_body+0x0) [0x7fff88bb7279]
  [12] /usr/lib/system/libsystem_pthread.dylib(thread_start+0xd) [0x7fff88bb54b1]

On Debian 8.2, the test script test_server.tcl runs seemingly without problems, but as soon as I interrupt the script with Ctrl+C, the SWI-Prolog HTTP server also crashes on Debian, and its output is:

?- 
SWI-Prolog [thread 4 (httpd@3030_3) at Sat Jan 23 23:45:32 2016]: received fatal signal 11 (segv)

SWI-Prolog [thread 3 (httpd@3030_2) at Sat Jan 23 23:45:32 2016]: received fatal signal 11 (segv)

SWI-Prolog [thread 5 (httpd@3030_4) at Sat Jan 23 23:45:32 2016]: received fatal signal 11 (segv)
C-stack trace labeled "crash":
C-stack trace labeled "crash":

SWI-Prolog [thread 6 (httpd@3030_5) at Sat Jan 23 23:45:32 2016]: received fatal signal 11 (segv)

SWI-Prolog [thread 2 (httpd@3030_1) at Sat Jan 23 23:45:32 2016]: received fatal signal 11 (segv)
C-stack trace labeled "crash":
C-stack trace labeled "crash":
C-stack trace labeled "crash":
  [0] save_backtrace() at :? [0x7f5c64ef183a]
  [0] save_backtrace() at :? [0x7f5c64ef183a]
  [0] save_backtrace() at :? [0x7f5c64ef183a]
  [0] save_backtrace() at :? [0x7f5c64ef183a]
  [0] save_backtrace() at :? [0x7f5c64ef183a]
  [1] crashHandler() at pl-cstack.c:? [0x7f5c64ef1a04]
  [1] crashHandler() at pl-cstack.c:? [0x7f5c64ef1a04]
  [1] crashHandler() at pl-cstack.c:? [0x7f5c64ef1a04]
  [1] crashHandler() at pl-cstack.c:? [0x7f5c64ef1a04]
  [1] crashHandler() at pl-cstack.c:? [0x7f5c64ef1a04]
  [2] dispatch_signal() at pl-setup.c:? [0x7f5c64ea162b]
  [2] dispatch_signal() at pl-setup.c:? [0x7f5c64ea162b]
  [2] dispatch_signal() at pl-setup.c:? [0x7f5c64ea162b]
  [2] dispatch_signal() at pl-setup.c:? [0x7f5c64ea162b]
  [3] __restore_rt() at ??:? [0x7f5c64c0b8d0]
  [3] __restore_rt() at ??:? [0x7f5c64c0b8d0]
  [3] __restore_rt() at ??:? [0x7f5c64c0b8d0]
  [2] dispatch_signal() at pl-setup.c:? [0x7f5c64ea162b]
  [3] __restore_rt() at ??:? [0x7f5c64c0b8d0]
  [4] S__fillbuf() at ??:? [0x7f5c64ee361d]
  [4] S__fillbuf() at ??:? [0x7f5c64ee361d]
  [4] S__fillbuf() at ??:? [0x7f5c64ee361d]
  [3] __restore_rt() at ??:? [0x7f5c64c0b8d0]
  [5] Sgetcode() at ??:? [0x7f5c64ee40d0]
  [4] S__fillbuf() at ??:? [0x7f5c64ee361d]
  [5] Sgetcode() at ??:? [0x7f5c64ee40d0]
  [4] S__fillbuf() at ??:? [0x7f5c64ee361d]
  [5] Sgetcode() at ??:? [0x7f5c64ee40d0]
  [6] read_line_to_codes3() at readutil.c:? [0x7f5c63645c67]
  [6] read_line_to_codes3() at readutil.c:? [0x7f5c63645c67]
  [5] Sgetcode() at ??:? [0x7f5c64ee40d0]
  [6] read_line_to_codes3() at readutil.c:? [0x7f5c63645c67]
  [7] PL_next_solution() at ??:? [0x7f5c64e55a65]
  [7] PL_next_solution() at ??:? [0x7f5c64e55a65]
  [5] Sgetcode() at ??:? [0x7f5c64ee40d0]
  [8] callProlog() at :? [0x7f5c64e8ada9]
  [6] read_line_to_codes3() at readutil.c:? [0x7f5c63645c67]
  [8] callProlog() at :? [0x7f5c64e8ada9]
  [7] PL_next_solution() at ??:? [0x7f5c64e55a65]
  [6] read_line_to_codes3() at readutil.c:? [0x7f5c63645c67]
  [7] PL_next_solution() at ??:? [0x7f5c64e55a65]
  [8] callProlog() at :? [0x7f5c64e8ada9]
  [9] start_thread() at pl-thread.c:? [0x7f5c64eb7fb2]
  [9] start_thread() at pl-thread.c:? [0x7f5c64eb7fb2]
  [8] callProlog() at :? [0x7f5c64e8ada9]
  [7] PL_next_solution() at ??:? [0x7f5c64e55a65]
  [9] start_thread() at pl-thread.c:? [0x7f5c64eb7fb2]
  [10] start_thread() at ??:? [0x7f5c64c040a4]
  [10] start_thread() at ??:? [0x7f5c64c040a4]
  [8] callProlog() at :? [0x7f5c64e8ada9]
  [9] start_thread() at pl-thread.c:? [0x7f5c64eb7fb2]
  [10] start_thread() at ??:? [0x7f5c64c040a4]
  [9] start_thread() at pl-thread.c:? [0x7f5c64eb7fb2]
  [10] start_thread() at ??:? [0x7f5c64c040a4]
  [11] clone() at ??:? [0x7f5c6493904d]

Every second request is rejected (OPTIONS request).

I am trying to connect a Backbone.js app (the demo Todo list) via REST to SWI-Prolog.

The js-app makes calls to GET and POST. The latter is preceded by an OPTIONS request. I can safely handle this one and the POST gets through. Except that with every second POST request, some old input data is prepended to the request seen by the OPTIONS handler. This then leads to the "Bad Request" shown in the logfile excerpt below:

/Wed May 3 00:40:52 2017/ request(1, 1493764852.433, [peer(ip(127,0,0,1)),method(get),request_uri('/todo1'),path('/todo1'),http_version(1-1),host(localhost),port(3001),user_agent('Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0'),referer('http://localhost:3000/index.html'),origin('http://localhost:3000'),dnt('1'),connection('keep-alive')]).
completed(1, 0.0009339069999999999, 826, 200, ok).

/Wed May 3 00:40:55 2017/ request(2, 1493764855.670, [peer(ip(127,0,0,1)),method(options),request_uri('/todo1'),path('/todo1'),http_version(1-1),host(localhost),port(3001),user_agent('Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0'),access_control_request_method('POST'),access_control_request_headers('content-type'),origin('http://localhost:3000'),dnt('1'),connection('keep-alive')]).
completed(2, 0.002140802, 0, 200, ok).

/Wed May 3 00:40:55 2017/ request(3, 1493764855.699, [peer(ip(127,0,0,1)),method(post),request_uri('/todo1'),path('/todo1'),http_version(1-1),host(localhost),port(3001),user_agent('Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0'),content_type('application/json'),referer('http://localhost:3000/index.html'),content_length(42),origin('http://localhost:3000'),dnt('1'),connection('keep-alive')]).
completed(3, 0.0010439790000000004, 897, 200, ok).

completed(0, 0.0028612840000000004, 389, 400, error("Illegal HTTP request: {"title":"11","completed":false,"order":1}OPTIONS /todo1 HTTP/1.1 (in_http_request)")).
As Wireshark confirmed that this data did not go over the wire and was not part of the OPTIONS request sent, I assume it must sit in some buffer that is not reset.
I would be very happy to get any hints where to start searching for this in the code base.
Thanks for your help.

--debug=topic yields error with thread_httpd

My goal is to see debug messages on the console with the HTTP Unix daemon in combination with thread_httpd.

When I start with a very simple server template, server.pl:

:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_unix_daemon)).

:- initialization http_daemon.

Then I get among the help messages:

$ swipl server.pl --help
...
%   --debug=topic      Print debug message for topic
...

However, when I try to enable a debug topic, say, test, I get:

$ swipl server.pl -- --debug=test --port=8080 --interactive
Warning: test: no matching debug topic (yet)
ERROR: thread_create/3: Type error: `bool' expected, found `test' (an atom)
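
Until this is fixed, a workaround is to enable the topic from within server.pl itself rather than on the command line, using debug/1 from library(debug) (a minimal sketch):

:- use_module(library(debug)).
:- debug(test).     % enable the topic at load time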

Please improve inspection support for redirections

First, I suggest the following patch for http_open.pl so that the exception is only thrown when the redirection limit is exceeded (as documented):

diff --git a/http_open.pl b/http_open.pl
index b163883..66478df 100644
--- a/http_open.pl
+++ b/http_open.pl
@@ -657,7 +657,7 @@ do_open(_Version, Code, Comment, _,  _, Parts, _, _, _) :-
 redirect_limit_exceeded(Options, Max) :-
        aggregate_all(count, member(visited(_), Options), N),
        option(max_redirect(Max), Options, 10),
-       (Max == infinite -> fail ; N >= Max).
+       (Max == infinite -> fail ; N > Max).
 
 
 %%     redirect_loop(Parts, Options) is semidet.

Second, a small feature request: Please add a supported way to reason about redirections. One important use-case for this: A reverse proxy server needs to analyze redirections from the target server, and rewrite them so that they match the client's view of the service. Thus, we cannot even allow a single redirection within http_open/3 in such use cases, but need to analyze and rewrite all redirections in the chain in general.

Currently, to see whether the target server redirects a request, we can use:

catch(http_open(Url, In, [max_redirect(0)]),
      error(permission_error(redirect, _, Redirect), _), true),
(   nonvar(Redirect) -> ...
;   etc.
)

But an important piece of information is missing from the exception: the Code that was used to redirect. If possible, please:

(1) add the code to the exception and
(2) make this the official way to inspect a redirection, i.e., document the exception term so that user code can rely on it.

Alternatively, please add a different way to easily reason about redirections, fitting the current API. Thank you!

Better SSL error messages (e.g., when setting port 80)?

Port 80 is not the correct port for HTTPS requests of course, but the error message is currently not very descriptive:

?- use_module(library(http/http_open)).
?- use_module(library(http/http_ssl_plugin)).
?- http_open('https://aap.nl:80', In, [cert_verify_hook(cert_accept_any)]).
ERROR: SSL(140770FC) func(119): reason(252)

Is there an existing mapping from SSL error codes to human-readable descriptions that can help the programmer a bit better?

Suggestion: Hardened mode for web services

Especially for web services, it would be great if there were a mode that works as securely as sensible by default, and can be easily enabled, for example via an option such as --hardened in the HTTP Unix daemon. At the cost of making development somewhat harder (if enabled), such a mode would reveal less information to attackers.

Configuration options that could be affected by such a mode come to mind immediately, especially after the discussion in SWI-Prolog/plweb#23:

  • obsolete protocols should be reliably disabled in this mode, without weakening security if users themselves have already chosen more secure settings.
  • backtraces that may expose sensitive data (such as login names, paths etc.) must be disabled.
  • anything else?

Wishlist: Server-side SNI

Server Name Indication (SNI) is an important and by now ubiquitously supported TLS extension that allows us to host different HTTPS sites from a single IP address.

Without server-side SNI (i.e., the status quo with the SWI HTTP and SSL libraries), essentially each domain for which we want to provide an HTTPS server needs to point to a separate IP, because the TLS handshake transfers the single certificate that can currently be specified, and that certificate is typically valid for only a couple of (sub-)domains, if at all.

SNI extends this by providing the right certificate depending on the domain that the client used to access the site. This corresponds to the HTTP header field Host. However, critically, it takes place already at the TLS layer, before the HTTP transfer is even initiated.

Supporting SNI therefore requires changes to both the HTTP and SSL libraries (yeah!). I am filing this as an HTTP issue because I want to first discuss how such an option should be presented to users, and then extend the HTTP Unix daemon with options to make the information available in the first place.

Currently, these are the most important options we pass to the HTTP Unix daemon when launching an HTTPS server:

  • --certfile=C
  • --keyfile=F
  • --pwfile=P or --password=P

The reason --pwfile is provided is that command line arguments can be easily inspected, whereas a password file can be made readable only for users who need to access the private key. (As an aside, I sometimes find it quite frustrating that the HTTP Unix daemon may not have sufficient privileges at the time it reads the certificate and key files even when it is launched as root, because it drops the privileges before that point!)

To support server-side SNI, the above options need to be generalized so that they depend on the hostname that the client has used to connect to the server. That name is transferred by—you guessed it—client-side SNI, as plain text, because we need that information before the TLS handshake can complete.

So, all of the above need to depend on hostnames, or rather patterns of hostnames, and maybe also a default certificate that is used if no matching host can be found. Option processing in the HTTP Unix daemon is already quite complex and error prone (#54). With server-side SNI, we need the ability to specify potentially many different certificates, keys and passwords that depend on hostname patterns. We could cram these semantics into command line options too, but these options may become arbitrarily long and break system limits or become extremely hard to edit and understand. Therefore, I think the time has come to put this into a config file.

Here is a start to let us discuss how such an option could look. First, all other command line options can remain the same. On the command line, I only need a single new option, let us call it:

  • --sni=File

In such an SNI config file, we need to provide a series of hostnames, and for each such host:

  • certificate that is used for that host
  • private key that is used for that host
  • if applicable, the password that protects the private key.

So, the SNI config file could for example consist of Prolog facts of the form:

sni(HostRegexp, [certfile=C,keyfile=K,pwfile=P]).

Instead of pwfile, password can also be supported, in analogy to the command line options that are currently used.

So, we may for example have entries like:

sni('swish.swi-prolog.org', [certfile='swish.crt',keyfile='swish.key']).
sni('.*swi-prolog.org', [certfile='swi.crt',keyfile='swi.key',pwfile='swi.pwd']).

The hostname matching could be regexp-based, or, in a first step, exact. Conceptually, when a client initiates a TLS handshake, then the server fetches the used host name and uses the first matching certificate and private key for the remainder of the connection. If regexps are used, then this naturally subsumes the "default" case, because we can use the empty regexp to match all remaining hosts.

If these options could somehow be made available as part of the context that is passed around in the HTTP server libraries, then I can modify the SSL library to fetch the matching certificate based on these options. Please let me know what you think. Thank you!
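
For illustration, the matching could be implemented with glue code roughly like this (a sketch only; sni/2 facts in the format proposed above, loaded from the --sni file, with regexp matching via library(pcre)):

:- use_module(library(pcre)).

:- dynamic sni/2.

%  First sni/2 fact whose pattern matches the host name wins; an
%  empty pattern as the last fact acts as the default.
ssl_options_for_host(Host, Options) :-
    sni(Regexp, Options),
    re_match(Regexp, Host),
    !.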

SSL Certificate renewal

Just set up a server using Let's Encrypt! Thanks to https://github.com/triska/letswicrypt that wasn't too hard. Let's Encrypt certificates expire quickly though. Setting up automatic renewal of the certificates themselves isn't too hard, but how to update a (long) running server?

Ideally it would check the modification times of the certificates, but it can't, because the Let's Encrypt certificates are only readable by root and the server long ago lost the privileges to read them. I see two ways out:

  • Encrypt the private key and only read the password as root. If the key itself and the certificate
    chain are world readable (which is fine AFAIK as long as the private key is encrypted), it should
    be possible to update the SSL context used by the server without restart.
  • On startup read the expiration date from the certificate and commit suicide at some scheduled
    time the day before.
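
A sketch of the second option, assuming the certificate's expiry time is already available as a Unix time stamp (obtaining it from the certificate is left open) and that the service manager restarts the daemon after it exits:

:- use_module(library(time)).

schedule_restart(Expiry) :-
    get_time(Now),
    Delay is max(0, Expiry - Now - 86400),   % one day before expiry
    alarm(Delay, halt(0), _Id, []).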

Documentation improvements for http_get/3 and http_open/3

First, a very simple discrepancy I noticed: In http_get/3, the reply_header(Hs) option yields a list of Name(Value) pairs, but the documentation states it yields a list of Name=Value pairs. I guess it is the documentation that should be changed in this case, although yielding a list of Name=Value pairs would indeed also make a lot of sense in my view.

Second, a more elementary request for clarification:

Throughout the documentation of the HTTP libraries, library(http_open) is referred to as a "Simple HTTP client" and "lightweight library for opening a[n] HTTP URL address as a Prolog stream". In addition: "It can only deal with the HTTP GET protocol".

In contrast, library(http_client) and its http_get/3 is described as providing "more powerful access to reading HTTP resources", including "keep-alive, chunked transfer and a plug-in mechanism providing conversions based on the MIME content-type".

From this, one gets the distinct impression that one should better use http_get/3 to be on the safe side, choosing the more powerful library to make the application future-proof for further features.

Somewhat contradicting the above, we learn in other sections of the documentation that library(http_open) can in fact "be extended by loading two additional modules that acts [sic] as plugins", enabling chunked transfer encoding and the POST method in addition to GET, HEAD and DELETE, and in #9, you even state that "http_get/3 is basically superseded by http_open/3".

Please clarify the documentation so that all testing, usage and development focuses on the library that is meant to be used more thoroughly. Thank you!

http_reply: please add support for http_reply(codes(Cs), ...)

It would be very useful to reply from a list of codes using http_reply(codes(Cs), ...), especially since http_get/3 already supports reading to a list of codes. Please consider supporting this.

My immediate use case is the implementation of a simple reverse HTTP proxy: I query existing HTTP servers and relay their responses to the client. I can use a memory file for this, but would prefer to simply use a list of codes.
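
In the meantime, a workaround is to convert the codes and use the existing bytes(Type, Bytes) reply (a sketch; reply_codes/2 is a hypothetical helper and assumes the codes are in fact bytes):

reply_codes(ContentType, Codes) :-
    string_codes(String, Codes),
    throw(http_reply(bytes(ContentType, String))).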

http_log.pl: improve logging of http_reply(Reply, Options) case

It seems that http_log.pl would benefit from:

map_exception(http_reply(Reply,_), Reply).

or something similar, so that replies of the form http_reply(Reply, Options) are logged as expected. Otherwise, we see terms like:

error("Unknown message: http_reply(stream(<stream>(0x7f916287dbf0),137) ...)

in httpd.log when for example replying with http_reply(stream(S), [...]).

However, one benefit of the current logging is that the file contains syntactically valid Prolog terms, whereas <stream>(...) is (of course intentionally) not a valid Prolog term, so especially in connection with the stream(S) option, care should be taken to preserve valid Prolog syntax in the log file.

Keep-alive connections can cause extreme performance degradation

Steps to reproduce (7.3.14/latest git version):

  1. Clone Proloxy

  2. Install shellinabox

  3. Start shellinabox with:

    $ shellinaboxd  --disable-ssl --localhost-only
    

    You can now access shellinabox on http://localhost:4200 to see that it works correctly.

  4. Let shellinabox.pl consist of the single clause:

    request_prefix_target(Request, '', TargetURI) :-
            memberchk(request_uri(URI), Request),
            atomic_list_concat(['http://localhost:4200',URI], TargetURI).
    
  5. Start Proloxy with:

    $ swipl shellinabox.pl proloxy.pl --interactive --port=4041
    

    You can now access Proloxy on http://localhost:4041, and it will relay all traffic to the shellinabox server behind the scenes. At this point, everything should work quite smoothly (awesome SWI HTTP libraries!), with roundtrip times in the millisecond range. If you look at the HTTP traffic, you will see though that, contrary to what the manual states, Keep-alive is not the default for the answers of Proloxy: Its replies to requests contain the header field Connection: close.

  6. To enable Keep-alive connections between the client and Proloxy, we need to patch Proloxy to reply with Connection: Keep-Alive, using the patch shown below. If you apply the patch, then restart Proloxy, and again go to http://localhost:4041, performance will be severely degraded. Roundtrip times increase thousands of times, and responses now take several seconds.

Please let me know whether this is the correct way to enable Keep-alive connections, or alternatively please enable Keep-alive connections per default (as stated in the manual) also for the kind of answers that Proloxy uses.

Note that the communication between Proloxy and shellinabox is not and need not be Keep-alive, because these reside on the same machine and the overhead for setting up an HTTP connection is low. My eventual goal is to enable completely transparent keep-alive connections over the proxy too, but for that at least the Client <-> Proloxy connection (which is the focus of this issue) must work efficiently too.

Here is the patch, which just adds Connection: Keep-alive to the headers of Proloxy responses:

diff --git a/proloxy.pl b/proloxy.pl
index 1db8685..d7aa81a 100644
--- a/proloxy.pl
+++ b/proloxy.pl
@@ -80,7 +80,7 @@ proxy(data(Method), _, _, TargetURI, Request) :-
                                   header(content_type, ContentType)]),
         call_cleanup(read_string(In, _, Bytes),
                      close(In)),
-        throw(http_reply(bytes(ContentType, Bytes))).
+        throw(http_reply(bytes(ContentType, Bytes), [connection('Keep-Alive')])).
 proxy(other(Method), Prefix, URI, TargetURI, _) :-
         http_open(TargetURI, In, [method(Method),
                                   % cert_verify_hook(cert_accept_any),
@@ -99,7 +99,7 @@ proxy(other(Method), Prefix, URI, TargetURI, _) :-
             throw(http_reply(Reply))
         ;   Code == 404 ->
             throw(http_reply(not_found(URI)))
-        ;   throw(http_reply(bytes(ContentType, Bytes)))
+        ;   throw(http_reply(bytes(ContentType, Bytes), [connection('Keep-Alive')]))
         ).
 
 redirect_code(301, moved).

Improve cooperation between http_unix_daemon and make/0

To conveniently recompile a running server, make/0 is ideal.

Unfortunately, this currently does not work with library(http/http_unix_daemon). For example, if server.pl consists of:

:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_dispatch)).
:- use_module(library(http/http_unix_daemon)).

:- initialization http_daemon.

:- http_handler(/, handle_request, []).

handle_request(_Request) :-
    format("Content-type: text/plain~n~n"),
    format("hello!!").

I can easily run the server interactively:

$ swipl server.pl --interactive --port=3030 

However, when I then make changes to server.pl and recompile with make/0, I get:

?- make.
% %.../server compiled 0.00 sec, 0 clauses
ERROR: Socket error: Address already in use

There are of course ways to work around this, but it would be nice if no further changes were necessary to server.pl, and make/0 would simply recompile the code.
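
One such workaround, for reference: stop the server, recompile, and restart it (a sketch; remake/1 is a hypothetical helper, Port must be the port used at startup, and http_stop_server/2 is from library(http/thread_httpd)):

remake(Port) :-
    http_stop_server(Port, []),
    make,
    http_server(http_dispatch, [port(Port)]).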

HTTPS server timeout is inaccurate

When I set timeout/1 in an HTTPS server, I get about twice the timeout that I actually set.

As a test case, place https_server.pl in packages-ssl, consisting of:

:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_ssl_plugin)).

https_server(Port, Options) :-
        http_server(reply,
                    [ port(Port),
                      ssl([ certificate_file('etc/server/server-cert.pem'),
                            key_file('etc/server/server-key.pem'),
                            password(apenoot1)
                          ])
                    | Options
                    ]).
reply(_) :-
        format("Content-type: text/plain~n~n"),
        format("Hello!").

Start the server with:

$ swipl https_server.pl 
...

?- https_server(1125, [timeout(5)]).
% Started server at https://localhost:1125/
true.

Then, connect to the server via:

$ time openssl s_client -connect localhost:1125

After about 10 seconds, I get:

...
read:errno=0

real	0m10.019s
user	0m0.000s
sys	0m0.004s

Running SWI-Prolog HTTPS services as Unix daemons

A simple skeleton for an HTTPS server is:

https_server(Port) :-
        http_server(http_dispatch,
                    [ port(Port),
                      ssl([ ...])
                    ]).

where the ssl(...) option is suitably filled with certificate etc. Everything works nicely.

I now want to start the server from a systemd or init.d startup script. Alas, the http_unix_daemon library does not seem to support HTTPS. I would prefer to start the server with something similar to:

$ swipl -s server.pl -- --user=my_user --port=443 --ssl=true --cert=... etc.

Since this does not work, what can I do instead? First, it obviously needs to run as root to make the service available on port 443. To drop root privileges, I change the above code to:

https_server(Port) :-
        http_server(http_dispatch,
                    [ port(Port),
                      ssl([ ...])
                    ]),
        set_user_and_group(my_user).

and then I can successfully do:

$ swipl -f server.pl -g 'https_server(443)'

to make the server available.

But this starts an interactive shell instead of detaching the process, so it is not yet suitable for use in init scripts. The next attempt is typically similar to:

$ swipl -f server.pl -g 'https_server(443)' &

But then the process will receive SIGHUP and terminate when SWI next asks for user input. So we can use nohup in combination with &, disown -h etc., all just to run the process somehow in the background.

Is there any recommendation regarding how to start SWI HTTPS servers in the background? Obviously, extending http_unix_daemon.pl is the most preferable solution. If that is not possible, then I think a simple predicate that just means "serve all arriving HTTP(S) clients" would be useful, because it would allow us to say:

$ swipl -f server.pl -g 'https_server(443),serve_all_clients' &

and that would prevent SWI-Prolog from asking for more input and thus avoid the SIGHUP if the process is detached.
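
Such a predicate could be as simple as blocking the main thread forever, so the toplevel never asks for input while the worker threads keep serving (a sketch):

serve_all_clients :-
    thread_get_message(_).   % no message ever arrives; blocks forever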

html_quasiquotations does not parse valid HTML: method="POST" in forms not allowed

:- use_module(library(http/html_write)).
:- use_module(library(http/http_dispatch)).
:- use_module(library(http/http_parameters)).
:- use_module(library(http/http_session)).
:- use_module(library(http/html_quasiquotations)).


pay_with_card(Total) -->
    { setting(checkout:active_stripe_key, Key) },
    html( {|html(Total)||
<form action="/charge" method="POST">
  <script
    src="https://checkout.stripe.com/checkout.js" class="stripe-button"
    data-key="pk_test_2NwXWNYn4X732fdOEumVlz57"
    data-image="/square-image.png"
    data-name="Demo Site"
    data-description="2 widgets ($20.00)"
    data-amount="2000">
  </script>
</form>
         |}).

This example HTML is lifted straight from stripe.com's documentation.

On my machine (SWI-Prolog, Multi-threaded, 64 bits, Version 7.1.29-5-g0a3d88a), this won't compile, complaining about an unexpected value, found POST. Removing

 method="POST"

allows compilation.

Adding structure around the HTML above makes it pass the W3C's HTML5 validator:

<!DOCTYPE html>
<html>
<head><title>foo</title></head>
<body>
 <form action="/charge" method="POST">
  <script
    src="https://checkout.stripe.com/checkout.js" class="stripe-button"
    data-key="pk_test_2NwXWNYn4X732fdOEumVlz57"
    data-image="/square-image.png"
    data-name="Demo Site"
    data-description="2 widgets ($20.00)"
    data-amount="2000">
  </script>
</form>

</body>
</html>

http_reply works differently via http than shell

Assume the following code:

test :-
    http_reply(moved('http://a.com'), current_output, [], [], _).

When calling test from the shell, http_reply succeeds, returning:

HTTP/1.1 301 Moved Permanently
Date: Tue, 27 Oct 2015 08:52:25 GMT
Location: http://a.com
Content-Length: 370
Content-Type: text/html; charset=UTF-8

<!DOCTYPE html>
<html>
<head>
<title>301 Moved Permanently</title>

<meta http-equiv="content-type" content="text/html; charset=UTF-8">

</head>
<body>

<h1>Moved Permanently</h1>

<p>
The document has moved <a href="http://a.com"> Here</a></p>

<address>
<a href="http://www.swi-prolog.org">SWI-Prolog</a> httpd at honnixs-macbook-pro.local
</address>

</body>
</html>

However, when accessing test via HTTP (browser or curl), the following error is observed:

Internal server error

Illegal HTTP parameter: HTTP/1.1 301 Moved Permanently
In:
[30] format(current_output,'~s',[...])
[29] http_header:status_reply('<garbage_collected>',current_output,[],[],'<garbage_collected>') at /usr/local/Cellar/swi-prolog/7.2.3/libexec/lib/swipl-7.2.3/library/http/http_header.pl:329
[28] setup_call_catcher_cleanup(http_header: ...,http_header: ...,_G2190,http_header: ...) at /usr/local/Cellar/swi-prolog/7.2.3/libexec/lib/swipl-7.2.3/boot/init.pl:310
[26] http_header:http_status_reply('<garbage_collected>',current_output,[],[],'<garbage_collected>') at /usr/local/Cellar/swi-prolog/7.2.3/libexec/lib/swipl-7.2.3/library/http/http_header.pl:299
...

The error happens when format/3 is called while current_output is an HTTP stream.

parse_url/2 and uri_components/2 sometimes mistake host for protocol

For example, localhost and 127.0.0.1 can be used interchangeably in many cases:

?- parse_url('localhost', Ps).
Ps = [protocol(http), host(localhost), path(/)].

?- parse_url('127.0.0.1', Ps).
Ps = [protocol(http), host('127.0.0.1'), path(/)].

?- uri_components('localhost', Cs).
Cs = uri_components(_G3573, _G3574, localhost, _G3576, _G3577).

?- uri_components('127.0.0.1', Cs).
Cs = uri_components(_G3573, _G3574, '127.0.0.1', _G3576, _G3577).

In each case, it is clear that localhost and 127.0.0.1 are the host, and the predicates behave exactly as expected.

However, when a port is specified, we see for example the following difference:

?- parse_url('127.0.0.1:4041', Ps).
Ps = [protocol(http), host('127.0.0.1'), port(4041), path(/)].
?- parse_url('localhost:4041', Ps).
Ps = [protocol(localhost), path('4041')].

My expectation would be for parse_url/2 to assume http as the default protocol in both cases, and again consider 127.0.0.1 and localhost as the host in both cases.

Likewise, consider:

?- uri_components('127.0.0.1:4041', Cs).
Cs = uri_components('127.0.0.1', _G2659, '4041', _G2661, _G2662).

?- uri_components('localhost:4041', Cs).
Cs = uri_components(localhost, _G2659, '4041', _G2661, _G2662).

So, uri_components/2 is consistently (and somewhat unexpectedly) regarding the host as the protocol or "Scheme" when a port is specified, yet differs from the way parse_url/2 parses both URLs.

Running Unix daemon without --interactive ignores other source files

Take for example Proloxy:

When I run

$ swipl -l config.pl -f proloxy.pl -- --port=3030  --interactive

then, as expected, the Prolog rules contained in config.pl are available to the program, and HTTP dispatching works exactly as intended.

However, when I drop the --interactive switch and use:

$ swipl -l config.pl -f proloxy.pl -- --port=3030

then I get, when accessing the web server on port 3030:

Undefined procedure: request_prefix_target/3

It thus seems that the additional source file (config.pl) is only loaded when I use the --interactive switch.

The use case for this is clear: I want to make additional rules easily available by specifying further source files on the command line. For consistency and to benefit from this feature, I hope they can be loaded as specified also when omitting the --interactive switch.

Bug in ws_header for 32 bit machines and payload >= 65536

In line 763 we have

for(i=0; i<8; i++)
  hdr[2+i] = (payload_len >> ((7-i)*8)) & 0xff;

but payload_len has type size_t (i.e., 32 bits on 32-bit Linux machines), leading to undefined behaviour for the shift (and WebSocket protocol issues).

An easy fix is to change type of payload_len to uint64_t.

WebSocket handshake fails in Chrome due to missing Sec-WebSocket-Protocol field

Steps to reproduce:

  1. Start the echo server that is shown in the documentation of library(http/websocket), using:

    ?- http_server(http_dispatch, [port(8090)]).
    
  2. Use the following JavaScript line to establish a WebSocket connection in Chrome:

    var s = new WebSocket("ws://localhost:8090/ws", "protocolOne");
    

Chrome refuses the connection with:

Error during WebSocket handshake: Sent non-empty 'Sec-WebSocket-Protocol' header but no response was received

If possible, please add this header field to the response. Thank you!

Better approach towards throwing HTTP errors

For instance, ClioPatria currently returns the 500 status code for (1) unanticipated but valid request methods (e.g., OPTIONS) as well as for (2) unanticipated and malformed requests.

Jan said: "User code can succeed (ok) or throw a http-aware exception (nicely mapped). That is all fine. It may also fail or throw any other exception. This can mean anything. Sometimes it means the handler didn't check the request properly and fails further down the line. In most cases it means the handler is broken."

Maybe this can be solved by implementing different layers for processing a request, e.g.:

  • Layer 1: Check for request properties that are not supported by the HTTP library (e.g., certain HTTP methods) and return 501 (Not Implemented).
  • Layer 2: Check for request properties that are supported by the HTTP library but not implemented by a given HTTP handler (e.g., the handler is only defined for GET but a request has method PUT) and return 405 (Method Not Allowed).
  • Layer 3: Check whether the Content-Type (if any) is as expected and return 415 (Unsupported Media Type) otherwise.
  • Layer 4: Interpret the request's contents according to the Content-Type, e.g., parse a SPARQL request (application/sparql-query). If the parser fails, return 400 (Bad Request).
  • Layer 5: Process the content parsed in layer 4. If something goes wrong while processing return 500 (Internal Server Error).
  • Layer 6: Check whether the request is authenticated to perform the given method on the specified resource, return 401 (Unauthorized) if this is not the case.
  • Layer 7: The processed result of layer 5 is serialized in a response format. If the Accept header of the request did not match any of the serialization options, return 406 (Not Acceptable).

The above may be implemented by extending http_handler/3 with (1) the HTTP method, (2) Content-Type (input format to the server), (3) Accept (output format from the server), (4) authentication requirements.

Jan said: "The proper way out is probably to extend the http_handler/3 registration. That also might allow us to reply to OPTIONS and HEAD in a generic way. Probably something like adding methods(ListOfMethods) where the default is methods([get])."

Many of these layers are present in many real-world requests. The defaults should make sense in a large set of cases and should be easy to change by the programmer.

Configuration file for HTTP Unix daemon

In this issue, I would like to:

  1. collect everything that has so far been said about configuration files in the context of the HTTP Unix daemon.
  2. add new information that has since arisen due to new features of library(ssl)
  3. find an elegant, flexible and convenient way to run HTTPS servers with SWI-Prolog.

The (perceived) need for a configuration file was first raised and supported in #54, with Wouter and me in favour of the idea. It was dismissed again because for that concrete issue, a more trivial fix was applied. Now, a few months later and with the availability of new features for HTTPS servers, this is coming up again in #76.

First, I would like to give a short summary of what a typical HTTPS server needs as its configuration. It is evident that in the very near future, HTTPS servers will become the "normal" web servers, since plain HTTP servers will be marked as insecure by all major browsers. Therefore, it is wise to consider how to best accommodate their configuration.

Currently, a typical SWI-Prolog HTTPS server is started by supplying the following command line options to the Unix daemon:

  • --https (mandatory)
  • --keyfile=File (almost all servers will use this, even though it could theoretically be omitted)
  • --certfile=File (again, used by almost all servers).

A significant portion of these servers will also use:

  • --redirect to redirect HTTP to HTTPS.
  • --cipherlist=Atom to set ciphers with (currently) acceptable security.

Now, with recent changes to library(ssl), the following configurations must also be considered:

  • SNI, to supply different certificates depending on the host name of the client request
  • multiple certificates to support different ciphers, for example dual-stack RSA/ECDSA servers.

SNI is already supported in the Unix daemon by the hook http:sni_options/2. Multiple certificates are not yet possible. However, it is clear that this use case will become increasingly relevant in the future, and so it should also be possible to supply multiple certificates somehow.

Currently, a typical invocation of an HTTPS server could look like this:

swipl rules.pl proloxy.pl \
   --no-fork --user=www --https --certfile=server.crt \
   --keyfile=server.key \
   --pidfile=/var/run/https_proloxy.pid --workers=32 --syslog=https_proloxy \
   --cipherlist=EECDH+AESGCM:EDH+AESGCM:EECDH+AES256:EDH+AES256:CHACHA20 \
   --keep_alive_timeout=2

The options specific to HTTPS here are --https, --certfile, --keyfile and --cipherlist.

It is evident that such an invocation is extremely inconvenient. Moreover, the following problem arises: If you put all this for example in a .service file for systemd, then you need to do sudo systemctl daemon-reload every time you change the configuration parameters. This is also inconvenient, and there is no way around this.

The point has been raised that the Unix daemon is becoming too complex altogether, providing features that go far beyond its initial conception of a "Unix daemon". To be honest, my impression is almost the opposite: In less than 1000 lines of code (including extensive documentation), the daemon manages to expose an impressive amount of very flexible functionality, far more than I initially thought would be possible by so little glue code. If this is now deemed too complex, I wonder what to think about certain other areas of the code that ships with SWI-Prolog. To me, it is a testament to the elegant architecture of the web framework that the Unix daemon takes comparatively little and quite straight-forward code.

We are almost at the point that all of the above is supported. However, multiple certificates remain pending in #76, and this has now provided the impetus to look more closely at the best approach to include this feature too.

In general, we actually need triples of the form:

  • certificate
  • key
  • password.

Using passwords to protect private keys is certainly advisable, but frequently simply not done in practice, also not by Let's Encrypt: In this case and many others, suitable file permissions are used to severely restrict access to private keys and even certificates. Still, we support also encrypted private keys by means of the --pwfile option (which is to be preferred over --password, since command line arguments can be easily inspected by other processes).

So, we could support multiple certificates by using for example multiple occurrences of --certfile etc. However, this gets increasingly complex and unwieldy to use (the implementation remains rather straight-forward, and does not significantly increase the complexity of the Unix daemon).

Now the main point: I still think that a configuration file would be a great idea to solve all this. The main reason for this is the following:

  • HTTPS servers will become increasingly common.
  • Let's Encrypt provides free certificates that are easy to set up and use.
  • SWI-Prolog is increasingly used for web applications.
  • That's a great chance to advertise Prolog as a configuration language.

The Prolog community missed the boat when it came to choosing a syntax for RDF and many other web standards, where Prolog syntax could have played a very important role. Why miss the next boat too? Wouldn't it be nice to have Prolog-based configuration files as follows, say config.pl:

https.

certfile('server.crt').
keyfile('server.key').

cipherlist('EECDH+AESGCM:EDH+AESGCM:EECDH+AES256:EDH+AES256:CHACHA20').

user(www).

keep_alive_timeout(2).
workers(32).

And then start the daemon with:

swipl rules.pl proloxy.pl --config=config.pl --no-fork  

Further, SNI could be accommodated by providing http:sni_options/2 in the same file.

Moreover, multiple certfile/1 and keyfile/1 options could appear.

Of course, the following syntax would make even more sense:

server([keyfile='server.key',
        certfile='server.crt',
        https=true,
        cipherlist='EECDH+AESGCM:EDH+AESGCM:EECDH+AES256:EDH+AES256:CHACHA20',
        keep_alive_timeout=2,
        workers=32]).

I think adding this feature to the HTTP Unix daemon would allow a very flexible and convenient configuration of (multiple) HTTPS servers and at the same time advertise Prolog as a nice language for configuration files too.
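
Glue code to consume such a server/1 term could look roughly like this (a sketch only; load_config/2 is a hypothetical helper that turns the Name=Value pairs into the option list the daemon already understands):

:- use_module(library(yall)).

load_config(File, Options) :-
    setup_call_cleanup(
        open(File, read, In),
        read(In, server(Pairs)),          % first term must be server/1
        close(In)),
    maplist([Name=Value, Opt]>>(Opt =.. [Name, Value]), Pairs, Options).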

ws_receive/3 with format(json) sometimes does not yield JSON

To reproduce with SWI-Prolog 7.5.1, please run the following query:

?- use_module(library(http/websocket)).
true.

Followed by:

?- http_open_websocket('wss://ws.blockchain.info/inv', WS, []),
   ws_send(WS, json(_{op: "unconfirmed_sub"})),
   length(_, L),
      portray_clause(L),
      ws_receive(WS, Msg, [format(json)]),
      \+ is_dict(Msg.data).

The output from this may vary between runs, but inevitably starts and ends like this:

0.
1.
2.
3.
4.
5.
...
53.
54.
WS = <stream>(0x7f801a8270d0,0x7f801a825620),
L = 54,
Msg = websocket{data:"{\n  \"op\" : \"utx\",\n  \"x\" : {\n    \"lock_time\" : 0,\n    \"ver\" : 1,\n    \"size\" : 226,\n    \"inputs\" : [ {\n      \"sequence\" : 4294967295,\n      \"prev_out\" : {\n        \"spent\" : true,\n        \"tx_index\" : 234701988,\n        \"type\" : 0,\n        \"addr\" : \"14Wa44aAqRGpcgv1dgCU1fRHvrqW9eMZ4m\",\n        \"value\" : 25000000,\n        \"n\" : 0,\n        \"script\" : \"76a914268008048d14e5e40f1cd819a3c11add1e6c408588ac\"\n      },\n      \"script\" : \"483045022100d48db67257037add65f06729de36e15e79994d69e2376b8cb1e1ae378fc67dde0220094681ac5415d359e87b8bdbd2ea188dd508a6c76553420d7f11601a9d3861e701210255b7e52913a13df6cdbe16ba8e3ed88a90511c25ac98a84ebdca00d11ab6c41a\"\n    } ],\n    \"time\" : 1490116499,\n    \"tx_index\" : 234709729,\n    \"vin_sz\" : 1,\n    \"hash\" : \"8576815930dcea2c67c8b1e8bdc0fab59f947605ffe6bc9163c35d699b3ede2a\",\n    \"vout_sz\" : 2,\n    \"relayed_by\" : \"217.111.66.79\",\n    \"out\" : [ {\n      \"spent\" : false,\n      \"tx_index\" : 234709729,\n      \"type\" : 0,\n      \"addr\" : \"12BLZjSiPxNcU6dfJNjrvT3YyJcRBBJL76\",\n      \"value\" : 12000000,\n      \"n\" : 0,\n      \"script\" : \"76a9140cec9dec93f2faa73fe1ed83329a1c5d080c5f9c88ac\"\n    }, {\n      \"spent\" : false,\n      \"tx_index\" : 234709729,\n      \"type\" : 0,\n      \"addr\" : \"15n13xT3vobHAWkm8hVuJrRby6qqqjRSK6\",\n      \"value\" : 12950300,\n      \"n\" : 1,\n      \"script\" : \"76a91434634392de4cbee99ff98d641ef8fcbfb721a57a88ac\"\n    } ]\n  }\n}", format:string, opcode:text} 

This shows that ws_receive/3 sometimes does not yield a JSON dict, even if format(json) is specified.
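
Until the underlying cause is fixed, a possible client-side workaround (a sketch; it assumes the payload is in fact complete JSON text) is to parse the string manually whenever the automatic conversion did not happen:

ws_json(WS, Dict) :-
    ws_receive(WS, Msg, [format(json)]),
    (   is_dict(Msg.data)
    ->  Dict = Msg.data
    ;   atom_json_dict(Msg.data, Dict, [])  % parse the raw text ourselves
    ).

atom_json_dict/3 is from library(http/json) and also accepts strings.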

HEAD method must not return a message-body in the response

Quoting from W3.org:

The HEAD method is identical to GET except that the server MUST NOT return a message-body in
the response. The metainformation contained in the HTTP headers in response to a HEAD request
SHOULD be identical to the information sent in response to a GET request. This method can be used
for obtaining metainformation about the entity implied by the request without transferring the
entity-body itself. This method is often used for testing hypertext links for validity, accessibility, and
recent modification.

Currently, when I telnet to an SWI-Prolog HTTP server and ask:

HEAD /

then I get a message-body in addition to the requested headers. This nonconformant response is undesirable for several reasons. If possible, please avoid sending a message-body when responding to HEAD requests. Thank you!
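
As a stop-gap until the server libraries handle this, individual handlers can suppress the body themselves. This is only a sketch, not the proper fix: the emitted Content-Length then describes the empty body rather than the corresponding GET entity, so the metainformation is still not identical:

:- use_module(library(http/html_write)).

handle(Request) :-
    (   memberchk(method(head), Request)
    ->  format("Content-type: text/html~n~n")        % headers only
    ;   reply_html_page(title('Demo'), [h1('Demo')])
    ).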

Please make http_open/3 autoloaded

Please consider making http_open/3 an autoloaded predicate.

For example, load_html/3 is also autoloaded, and it would be much more useful if we could readily provide an HTTP stream to it.

Thank you!
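
For illustration, with autoloading the following would work without any use_module directives (the URL is only a placeholder):

?- http_open('http://www.swi-prolog.org/', In, []),
   load_html(In, DOM, []),
   close(In).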

http_post/4: codes(Type, Codes) option sometimes unexpectedly emits charset twice

Suppose I relay a POST request, and I want to use the exact same content_type as in the original request. I do:

...,
memberchk(content_type(Type), Request),
http_post(Target, codes(Type,Codes), _, [])

Unfortunately, this yields unintended results, for example with pengines, because pengines sends:

Content-Type: application/json; charset=utf-8

and http_post/4 appends its own ; charset=UTF-8 to that, yielding the HTTP header field:

Content-Type: application/json; charset=utf-8; charset=UTF-8

and pengines itself cannot handle this unexpected duplicated parameter when it is sent.

If possible, please avoid appending ; charset=UTF-8 if the charset is already specified in the content type. Thank you!
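
A client-side workaround until then (a sketch; note that it drops all media-type parameters, not just the charset) is to strip everything from the first ; before re-posting:

strip_params(Type0, Type) :-
    (   sub_atom(Type0, Before, _, _, ';')
    ->  sub_atom(Type0, 0, Before, _, Type)
    ;   Type = Type0
    ).

For example, strip_params('application/json; charset=utf-8', T) yields T = 'application/json'.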

cipherlist does not work in systemd

To reproduce this issue, please use the latest git version of the SSL package, including pull request 24, which enables the SSL options cipher_list/1 and ecdh_curve/1.

In the following, I show that setting the SSL cipher list with the new option unexpectedly does not work with systemd, even though it works on the regular command line. I am using Debian 8.1 to reproduce this issue.

First, let ~/daemon.pl consist of:

:- module(daemon, []).

:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_dispatch)).
:- use_module(library(http/http_unix_daemon)).

:- initialization http_daemon.

:- http_handler(/, handle_request, [prefix]).

handle_request(Request) :- http_404([], Request).

This is a very simple HTTP Unix daemon that suffices to show the problem. (It returns 404 for every request.)

Further, in case it is still necessary at the time you try it, please apply the patch shown in pull request 50, which enables the option --cipherlist for the HTTP Unix daemon.

Throughout the following snippets, please replace my user name (triska) with your own user name everywhere it occurs.

Using suitable server.crt and server.key files (which, for simplicity, I assume are located in ~/), you can start the server with:

~ $ sudo /usr/local/bin/swipl daemon.pl --no-fork --user=triska --https --certfile=server.crt --keyfile=server.key --cipherlist='DEFAULT:!RC4'
% Started server at https://localhost:443/
% Started server at port 443

You can use sslscan to see that the --cipherlist option works as expected:

$ sslscan localhost:443 | grep -i acc
    Accepted  TLSv1  256 bits  ECDHE-RSA-AES256-SHA
    Accepted  TLSv1  256 bits  AES256-SHA
    Accepted  TLSv1  256 bits  CAMELLIA256-SHA
    Accepted  TLSv1  128 bits  ECDHE-RSA-AES128-SHA
    Accepted  TLSv1  128 bits  AES128-SHA
    Accepted  TLSv1  128 bits  SEED-SHA
    Accepted  TLSv1  128 bits  CAMELLIA128-SHA
    Accepted  TLSv1  112 bits  ECDHE-RSA-DES-CBC3-SHA
    Accepted  TLSv1  112 bits  DES-CBC3-SHA
    Accepted  TLSv1  56 bits   DES-CBC-SHA

For example, no RC4 ciphers are present in this case, exactly as intended.

And now the surprising problem: When I invoke the server with the exact same commands via a systemd service file, say /etc/systemd/system/daemon.service:

[Unit]
Description=HTTPS daemon 

[Service]
UMask=022
Environment=LANG=en_US.utf8
Restart=on-abort
StartLimitInterval=60
LimitNOFILE=1000
DefaultStartLimitBurst=5
WorkingDirectory=/home/triska
ExecStart=/usr/local/bin/swipl daemon.pl --port=3038 \
   --no-fork --user=triska --https --certfile=server.crt \
   --keyfile=server.key \
   --cipherlist='DEFAULT:!RC4'

[Install]
WantedBy=multi-user.target

Then I get, after enabling and starting the service:

$ sudo systemctl enable /etc/systemd/system/daemon.service
$ sudo systemctl start daemon.service
$ sudo systemctl status daemon.service

the status information:

daemon.service - HTTPS daemon
   Loaded: loaded (/etc/systemd/system/daemon.service; enabled)
   Active: failed (Result: exit-code) since Wed 2016-05-04 00:25:54 CEST; 2s ago
  Process: 28125 ExecStart=/usr/local/bin/swipl daemon.pl --port=3038 --no-fork --user=triska --https --certfile=server.crt --keyfile=server.key --cipherlist='DEFAULT:!RC4' (code=exited, status=1/FAILURE)
 Main PID: 28125 (code=exited, status=1/FAILURE)

May 04 00:25:54 myhost systemd[1]: Starting HTTPS daemon...
May 04 00:25:54 myhost systemd[1]: Started HTTPS daemon.
May 04 00:25:54 myhost swipl[28125]: ERROR: SSL(140E6118) SSL_CIPHER_PROCESS_RULESTR: invalid command

I have no idea how this discrepancy can arise.

@chigley, if you have time, could you please look into this? It may be a problem with the way we initialize SSL from within the HTTP Unix daemon. Thank you in advance!
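
One thing that may be worth ruling out (this is an assumption on my part, not a diagnosis): systemd may not apply the same quoting rules as the shell, so the literal single quotes around DEFAULT:!RC4 could end up inside the cipher string that reaches OpenSSL. A quick way to check is to print the argument vector before starting the daemon:

:- use_module(library(http/http_unix_daemon)).

:- initialization(main).

main :-
    current_prolog_flag(argv, Argv),
    format(user_error, "argv = ~q~n", [Argv]),  % shows what systemd actually passed
    http_daemon.

If the quotes show up in Argv, simply writing --cipherlist=DEFAULT:!RC4 (without quotes) in the unit file might already fix it.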

Assertion error when trying to read JSON from HTTP stream

The following call gives an assertion error when trying to read the input as JSON:

?- [library(http/http_open)].
?- [library(http/json)].
?- http_open('http://datos.santander.es/api/action/group_list.json?offset=0&limit=100', In, [request_header(accept='application/json')]), json_read_dict(In, Dict).

This is the error:

ERROR: Assertion failed: json:is_list(@end_of_file)
  [14] backtrace(10) at /lhome/wbeek/lib/swipl/library/prolog_stack.pl:444
  [13] prolog_debug:assertion_failed(fail,json:is_list(...)) at /lhome/wbeek/lib/swipl/library/debug.pl:326
  [12] prolog_debug:assertion(json:is_list(...)) at /lhome/wbeek/lib/swipl/library/debug.pl:314
  [11] json:term_to_dict(@end_of_file,_248,json_options(null,true,false,string,'')) at /lhome/wbeek/lib/swipl/library/http/json.pl:957
  [10] json:json_read_dict(<stream>(0x23bf130),_298,[]) at /lhome/wbeek/lib/swipl/library/http/json.pl:940
   [9] json:json_read_dict(<stream>(0x23bf130),_332) at /lhome/wbeek/lib/swipl/library/http/json.pl:932
   [8] '<meta-call>'(user:(...,...)) <foreign>
   [7] <user>
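
Independent of the assertion error itself (json_read_dict/2 should arguably raise a proper syntax error instead), it may be that the server answers with something other than JSON, so that the stream is at end of file before a JSON value is seen; that is an assumption, though. The header(Name, Value) option of http_open/3 helps to inspect what actually came back:

?- http_open('http://datos.santander.es/api/action/group_list.json?offset=0&limit=100',
             In,
             [ request_header(accept='application/json'),
               header(content_type, Type)
             ]),
   format("Content-Type: ~w~n", [Type]),
   read_string(In, _Length, Body),
   close(In).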

"No permission to fork process" during server startup

On Fedora 26, with the following content in test.pl:

:- use_module(library(http/http_unix_daemon)).
:- initialization http_daemon.

... I get an error that I have not seen before:

$ swipl test.pl --port=3001
ERROR: No permission to fork process `main' (running_threads([pce]))

The error goes away when I add --interactive. I do not get this error on Ubuntu.
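
For completeness, skipping the fork altogether also sidesteps the error here:

$ swipl test.pl --port=3001 --no-fork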

Feature Request: Server Status Page

Analogous to library(http/http_log), please consider providing library(http/http_status).

This library could gather and display status information about the running server.

I would like to include it simply via:

:- use_module(library(http/http_status)).

:- http_handler('/my-server-status', http_status, []).

possibly with additional security options.

Then, when I visit /my-server-status, it would display, much like mod_status:

  • number of workers serving requests
  • number of idle workers
  • status of each worker, the number of requests that worker has performed and the total number of bytes served by the worker
  • total number of accesses and byte count served
  • time the server was started/restarted and the time it has been running for
  • averages giving the number of requests per second, the number of bytes served per second and the average number of bytes per request
  • current percentage of CPU used by each worker and in total by all workers combined
  • current hosts and requests being processed
  • ...

Issues like SWI-Prolog/packages-ssl#25 will become much easier to track down with such a provision.
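
To illustrate the scale of what I have in mind, here is a minimal sketch (the handler body is mine; it gathers only a tiny subset of the figures listed above, using http_current_worker/2 and statistics/2):

:- use_module(library(http/thread_httpd)).
:- use_module(library(http/html_write)).

http_status(_Request) :-
    aggregate_all(count, http_current_worker(_Port, _Thread), Workers),
    statistics(process_cputime, CPU),
    reply_html_page(title('Server status'),
                    [ h1('Server status'),
                      p(['Workers: ', Workers]),
                      p(['CPU seconds used: ', CPU])
                    ]).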

Thank you!
