irclog2html's Introduction

irclog2html

Converts IRC log files to HTML with pretty colours.

Quick start

Installation:

pip install irclog2html

Quick usage for a single log file:

irclog2html --help
irclog2html filename.log                  (produces filename.log.html)

Mass conversion of logs (one file per day, with YYYY-MM-DD in the filename), with next/prev links and mtime checks, suitable for running from cron:

logs2html directory/     (looks for *.log and *.log.gz, produces *.log.html)
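To illustrate what the next/prev links involve, here is a minimal Python sketch that orders per-day files by the YYYY-MM-DD date in their names and pairs each file with its neighbours. It assumes exactly one date per filename; `link_neighbours` is a hypothetical helper, not part of logs2html.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches a YYYY-MM-DD date anywhere in the filename (assumption: one per name).
DATE_RE = re.compile(r"(\d{4}-\d{2}-\d{2})")


def link_neighbours(filenames: List[str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each dated log file to its (prev, next) neighbours in date order.

    Hypothetical helper for illustration; files without a date are skipped.
    """
    dated = []
    for name in filenames:
        m = DATE_RE.search(name)
        if m:
            dated.append((m.group(1), name))
    dated.sort()  # ISO dates sort correctly as plain strings
    links = {}
    for i, (_, name) in enumerate(dated):
        prev_name = dated[i - 1][1] if i > 0 else None
        next_name = dated[i + 1][1] if i + 1 < len(dated) else None
        links[name] = (prev_name, next_name)
    return links


if __name__ == "__main__":
    files = ["chan.2021-07-02.log", "chan.2021-07-01.log", "chan.2021-07-03.log.gz"]
    for name, (prev_name, next_name) in sorted(link_neighbours(files).items()):
        print(name, "prev:", prev_name, "next:", next_name)
```

Sorting on the ISO date string is enough here because YYYY-MM-DD compares lexicographically in chronological order.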

Configuration files

Since you probably don't want to keep specifying the same options on the command line every time you run logs2html, you can create a config file. For example:

-t 'IRC logs for #mychannel'
-p 'IRC logs for #mychannel for '
# the following needs some extra Apache setup to enable the CGI/WSGI script
--searchbox
# where we keep the logs
/full/path/to/directory/

Use it like this:

logs2html -c /path/to/mychannel.conf

Lines starting with a # are ignored. Other lines are interpreted as command-line options.

The order matters: options on the command line before -c FILE will be overridden by options in the config file. Options specified after -c FILE will override the options in the config file.

You can include more than one config file by repeating -c FILE. You can include config files from other config files. You can even create loops of config files and then watch and laugh maniacally as logs2html sits there burning your CPU.
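The include and override rules above can be pictured as splicing each config file's options into the argument list at the point where -c FILE appears. This is a minimal sketch under stated assumptions: the hypothetical `expand_config_args` tokenizes each line with `shlex` (how logs2html really splits lines isn't stated here), it assumes the option parser lets the last occurrence of an option win, and unlike logs2html it raises an error on include loops instead of spinning.

```python
import shlex
from typing import Callable, List, Tuple


def expand_config_args(argv: List[str],
                       read_file: Callable[[str], str],
                       _stack: Tuple[str, ...] = ()) -> List[str]:
    """Replace every "-c FILE" pair in argv with the options listed in FILE.

    Hypothetical sketch: lines are tokenized with shlex (an assumption), and
    include loops raise ValueError instead of running forever.
    """
    result: List[str] = []
    i = 0
    while i < len(argv):
        if argv[i] == "-c" and i + 1 < len(argv):
            path = argv[i + 1]
            if path in _stack:
                raise ValueError("config file loop: " + " -> ".join(_stack + (path,)))
            file_args: List[str] = []
            for line in read_file(path).splitlines():
                line = line.strip()
                if not line or line.startswith("#"):
                    continue  # blank lines and comments are ignored
                file_args.extend(shlex.split(line))
            # Expand includes inside the file first, then splice its options in
            # place, so arguments that come later (e.g. after "-c FILE") win.
            result.extend(expand_config_args(file_args, read_file, _stack + (path,)))
            i += 2
        else:
            result.append(argv[i])
            i += 1
    return result


if __name__ == "__main__":
    configs = {"mychannel.conf": "-t 'IRC logs for #mychannel'\n"
                                 "# where we keep the logs\n"
                                 "/full/path/to/directory/\n"}
    print(expand_config_args(["-c", "mychannel.conf", "-t", "Override"],
                             configs.__getitem__))
    # ['-t', 'IRC logs for #mychannel', '/full/path/to/directory/', '-t', 'Override']
```

Passing `read_file` as a callback keeps the expansion logic testable without touching the filesystem; a real tool would pass a function that reads the file from disk.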

CGI script for log searching

Warning

The script can be easily abused to cause a denial of service attack; it parses all log files every time you perform a search.

You can generate search boxes on IRC log pages by passing the --searchbox option to logs2html. Here's an example Apache config snippet that makes it work:

RewriteRule ^/my-irclog/search/$ /my-irclog/search [R,L]
ScriptAlias /my-irclog/search /usr/local/bin/irclogsearch
<Location /my-irclog/search>
  SetEnv IRCLOG_LOCATION "/var/www/my-irclog/"
  # Uncomment the following if your log files use a different format
  #SetEnv IRCLOG_GLOB "*.log.????-??-??"
  # (this will also automatically handle *.log.????-??-??.gz)
</Location>
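A sketch of how IRCLOG_GLOB matching, including the automatic .gz variant, might work, using Python's `fnmatch`. The `filter_logs` helper and the `*.log` fallback pattern are assumptions for illustration; irclogsearch's real default and implementation may differ.

```python
import fnmatch
import os
from typing import Iterable, List


def filter_logs(names: Iterable[str], pattern: str) -> List[str]:
    """Keep names matching pattern, or pattern with a trailing '.gz'."""
    return sorted(n for n in names
                  if fnmatch.fnmatchcase(n, pattern)
                  or fnmatch.fnmatchcase(n, pattern + ".gz"))


if __name__ == "__main__":
    # For a CGI script, Apache's SetEnv values show up as environment variables.
    location = os.environ.get("IRCLOG_LOCATION", ".")
    pattern = os.environ.get("IRCLOG_GLOB", "*.log")  # fallback is an assumption
    for name in filter_logs(os.listdir(location), pattern):
        print(name)
```

In a glob, `?` matches exactly one character, so `*.log.????-??-??` accepts `chan.log.2021-07-01` but not `chan.log.2021-07-01.html`.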

WSGI script for log serving

Warning

The script can be easily abused to cause a denial of service attack; it parses all log files every time you perform a search.

There's now an experimental WSGI script that can generate HTML for the logs on the fly. You can use it if you don't like cron scripts and CGI.

Here's an example Apache config snippet:

WSGIScriptAlias /irclogs /usr/local/bin/irclogserver
<Location /irclogs>
  SetEnv IRCLOG_LOCATION "/var/www/my-irclog/"
  # Uncomment the following if your log files use a different format
  #SetEnv IRCLOG_GLOB "*.log.????-??-??"
  # (this will also automatically handle *.log.????-??-??.gz)
</Location>

Currently it has certain downsides:

  • configuration is very limited, e.g. you cannot specify titles or styles or enable dircproxy mode
  • HTML files in the IRC log directory take precedence over dynamically generated logs, even if they're older than the corresponding log file (on the plus side, you can use that to get dynamic search via WSGI while keeping statically generated HTML files with your own config tweaks)
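The precedence rule in the second bullet amounts to a lookup order like the one below. This is a hypothetical sketch (`serve_log` and its `render` callback are illustrative names), not the code in irclogserver.

```python
import os
from typing import Callable


def serve_log(log_dir: str, html_name: str, render: Callable[[str], str]) -> str:
    """Return HTML for html_name, preferring an existing static file.

    An existing .html file wins even when it is older than the .log it was
    generated from. No path sanitising is done here; a real server must
    reject names containing '..' or '/'.
    """
    if not html_name.endswith(".html"):
        raise ValueError("expected a .html name: %r" % html_name)
    static_path = os.path.join(log_dir, html_name)
    if os.path.isfile(static_path):
        with open(static_path, encoding="utf-8") as f:
            return f.read()
    # No static file: strip ".html" and render the underlying log on the fly.
    return render(static_path[: -len(".html")])
```

Checking for the static file first is what makes a stale .html file shadow a newer log; comparing mtimes before returning it would remove that downside.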

WSGI script for multi-channel log serving

Warning

The script can be easily abused to cause a denial of service attack; it parses all log files every time you perform a search.

The experimental WSGI script can serve logs for multiple channels:

WSGIScriptAlias /irclogs /usr/local/bin/irclogserver
<Location /irclogs>
  SetEnv IRCLOG_CHAN_DIR "/var/www/my-irclog/"
  # Uncomment the following if your log files use a different format
  #SetEnv IRCLOG_GLOB "*.log.????-??-??"
  # (this will also automatically handle *.log.????-??-??.gz)
</Location>

Now /irclogs will show a list of channels (subdirectories under /var/www/my-irclog/), and /irclogs/channel/ will show the date index for that channel.
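A minimal WSGI sketch of that two-level layout, assuming IRCLOG_CHAN_DIR arrives in the WSGI environ (mod_wsgi passes SetEnv values there). The names `application`, `_subdirs`, `_files` and `_page` are hypothetical, and the per-channel page simply lists files as a stand-in for the real date index.

```python
import html
import os
from typing import Callable, Dict, List
from urllib.parse import quote


def _subdirs(path: str) -> List[str]:
    return sorted(e for e in os.listdir(path) if os.path.isdir(os.path.join(path, e)))


def _files(path: str) -> List[str]:
    return sorted(e for e in os.listdir(path) if os.path.isfile(os.path.join(path, e)))


def _page(items: List[str]) -> bytes:
    # quote() keeps '#' in channel names from being read as a URL fragment.
    rows = "".join('<li><a href="%s">%s</a></li>' % (quote(i), html.escape(i))
                   for i in items)
    return ("<ul>%s</ul>" % rows).encode("utf-8")


def application(environ: Dict, start_response: Callable) -> List[bytes]:
    """Route / to the channel list and /<channel>/ to that channel's files."""
    chan_dir = environ["IRCLOG_CHAN_DIR"]
    parts = [p for p in environ.get("PATH_INFO", "/").split("/") if p]
    if not parts:
        body = _page([c + "/" for c in _subdirs(chan_dir)])
    elif len(parts) == 1 and parts[0] in _subdirs(chan_dir):
        body = _page(_files(os.path.join(chan_dir, parts[0])))
    else:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body]
```

Only names that actually exist as subdirectories are accepted as channels, which also keeps `..` out of the path.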

Misc

Website: https://mg.pov.lt/irclog2html/

Bug tracker: https://github.com/mgedmin/irclog2html/issues

Licence: GPL v2 or v3 (https://www.gnu.org/copyleft/gpl.html)


irclog2html's People

Contributors

alga, arsenarsen, cedk, mapreri, mgedmin, moises-silva, p0358

irclog2html's Issues

Option to show copyright status of the logs themselves

Svetlana A. Tkachenko requested this via email:

Would you please write a piece of code to ask people who use your software to specify the licence they would like to release the irc logs under? Otherwise it is an implied 'all rights reserved', which is bad. I am sure many of them are copyleft resources (allowing redistribution and modification of the logs) but forget to specify that.

I think the best solution would be to have a proper configuration file where I could specify arbitrary footer text.

Dynamic HTML rendering

There's now a WSGI app that can do searches and serve files. Why not make it convert the log files to HTML on the fly?

Pros:

  • simpler deployments (no need to muck with cron)
  • no update lag

Cons:

  • slower page rendering

There's a prototype of this in the dynamic-html branch. It needs work:

  • a config file for specifying tweaks (custom colors, title format, output style, etc.)
  • next/prev/index links
  • dynamic index page generation
  • caching? maybe on disk? i.e. write the html files right there; load them if mtime says they're up to date?
  • I would like to ship an irclogserver script with command-line options, instead of the current minimal bin/serve.

Use date in anchors where available

I am generating a page for a log-file spanning more than a single day, and including date information on every log line. irclog2html still seems to produce an anchor containing only the time, which causes collisions (and therefore means linking to a specific line is difficult).

This could be solved by including the date in the anchor.
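A minimal sketch of the suggestion, assuming the parser already knows each line's date when the log provides one. `make_anchor` and the `t<date>T<time>` id format are hypothetical.

```python
from typing import Optional


def make_anchor(time: str, date: Optional[str] = None) -> str:
    """Return an HTML id for a log line, e.g. 't2021-07-01T12:37'.

    Prefixing the date (when the line has one) keeps 12:37 on two different
    days from producing the same id; without a date it falls back to 't<time>'.
    """
    if date:
        return "t%sT%s" % (date, time)
    return "t" + time


if __name__ == "__main__":
    print(make_anchor("12:37", "2021-07-01"))  # t2021-07-01T12:37
    print(make_anchor("12:37", "2021-07-02"))  # t2021-07-02T12:37
    print(make_anchor("12:37"))                # t12:37
```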

When using --output-dir, should not check for html files in source directory

Consider a source directory with some log files and matching HTML files generated by running logs2html in place:

cd /tmp
mkdir source
cd source
touch test.2021-07-0{1..3}.log
logs2html .

Now run the script with a target directory:

cd /tmp
logs2html --output-dir=target source

⚠️ Problem: the target dir does not contain any of the per-day HTML files:

$ ls -1 target
index.html
irclog.css
latest.log.html

Now delete one of the HTML files in source and repeat:

rm source/test.2021-07-03.log.html
logs2html --output-dir=target source

Notice that some files have now been generated:

$ ls -1 target/
index.html
irclog.css
latest.log.html
test.2021-07-02.log.html
test.2021-07-03.log.html

Update a log file in source:

touch source/test.2021-07-01.log
logs2html --output-dir=target source

The modified file is generated too:

$ ls -1 target/
index.html
irclog.css
latest.log.html
test.2021-07-01.log.html
test.2021-07-02.log.html
test.2021-07-03.log.html

The expected behavior would be to ignore any HTML files in the source directory and check the ones in the target directory instead.
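The expected behavior could be sketched as an mtime check that looks in the output directory when one is given. `needs_rebuild` is a hypothetical helper; it maps both `*.log` and `*.log.gz` to `*.log.html`, matching what logs2html is documented to produce.

```python
import os
from typing import Optional


def needs_rebuild(log_path: str, output_dir: Optional[str] = None) -> bool:
    """True if the HTML for log_path is missing or older than the log.

    The HTML is looked up in output_dir when given, otherwise next to the log.
    """
    name = os.path.basename(log_path)
    if name.endswith(".gz"):
        name = name[: -len(".gz")]  # foo.log.gz -> foo.log.html
    target_dir = output_dir if output_dir is not None else os.path.dirname(log_path)
    html_path = os.path.join(target_dir, name + ".html")
    if not os.path.exists(html_path):
        return True
    return os.path.getmtime(html_path) < os.path.getmtime(log_path)
```

With this lookup, stale HTML files left in the source directory no longer suppress generation in the target directory.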

license clarification

Hi there :)

On pypi it's written:

    Licence: GPL v2 or later (http://www.gnu.org/copyleft/gpl.html)

But nowhere in the code can I see a reference to either the "v2" or the "or later" part, which means that, going by the code alone, the license would be GPL-1, which is, erm, let's just say I doubt you want that :)
Also the COPYING file reports the GPL-2 text.

Given that I think this is just an oversight, could you please state the correct license in the various file headers? (i.e. GPL-2+ or GPL v2 or later, whatever unambiguous form you prefer)
While you're at it, be aware that your copyright claim also doesn't cover 2016, even though you did some work on it then.

For completeness, what triggered this search of mine is my wish to put this software in the Debian archive (because 1) I've been using it for years 2) I'm a Debian Developer 3) I prefer installing software from the Debian archive instead of pypi or git checkouts).

UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 8: ordinal not in range(128)

I have a uWSGI-based irclog2html deployment that fails to render the index page with the following error:

Traceback (most recent call last):
  File "/opt/irclog2html/local/lib/python2.7/site-packages/irclog2html/irclogserver.py", line 213, in application
    dir_listing(stream, chan_path)
  File "/opt/irclog2html/local/lib/python2.7/site-packages/irclog2html/irclogserver.py", line 116, in dir_listing
    % (quote_plus(channel.name), escape(channel.name)),
  File "/opt/irclog2html/local/lib/python2.7/site-packages/irclog2html/irclog2html.py", line 354, in escape
    s = s.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;').replace('"', '&quot;')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 8: ordinal not in range(128)

This is on Python 2.7.

My irclogs tree has a directory named '#gnome-'$'\020''�s��'$'\177', which is a result of a terrible irssi accident, I expect.
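On Python 3, one way to render such names without crashing is to work from the raw bytes: percent-encode them for the link and decode them with replacement characters for display. This is a hedged sketch with hypothetical helpers (`display_name`, `channel_link`); it is not the fix that was applied to irclogserver.

```python
import html
from urllib.parse import quote


def display_name(raw: bytes) -> str:
    """Decode a filename for display, replacing invalid UTF-8 with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


def channel_link(raw: bytes) -> str:
    """Build an <a> tag for a channel directory that may not be valid UTF-8."""
    # quote() accepts bytes and percent-encodes them as-is, so the link still
    # points at the exact on-disk name; only the visible text is lossy.
    return '<a href="%s/">%s</a>' % (quote(raw, safe=""),
                                     html.escape(display_name(raw)))
```

On Python 3, `os.listdir(b"/var/www/my-irclog/")` (a bytes path) returns bytes names, which is the input this sketch expects.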

Question: Using with data other than files

So I am writing a script that stores IRC messages in a database. Is there a way to use irclog2html so that, when I render the messages on a website (it's Django), it adds the HTML markup?

Thanks!

Perform search on the client side

Hi,

I don't use irclog2html (yet), but I've just read that you perform log searches via a CGI script and that this can cause DoS.
I suggest doing the search in JavaScript instead. This has two advantages:

  • no more DoS
  • no need for setting up the CGI

As an example/proof of concept, see how Sphinx (the doc generator behind docs.python.org) compiles into 100% static HTML/JS and provides search via JavaScript code that loads a JSON file from the server.

Regards,
Valentin
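To make the suggestion concrete, here is a minimal Python sketch of the build-time half: an inverted index, written as JSON, that JavaScript on a static page could fetch and query. The index format and `build_index` are assumptions, loosely in the spirit of Sphinx's search index rather than a copy of it.

```python
import json
import re
from collections import defaultdict
from typing import Dict, List

WORD_RE = re.compile(r"\w+")


def build_index(logs: Dict[str, List[str]]) -> Dict[str, List[List[object]]]:
    """Map each lowercased word to [[filename, line_number], ...].

    Each (file, line) pair is listed once per word, in file and line order.
    """
    index: Dict[str, List[List[object]]] = defaultdict(list)
    for filename in sorted(logs):
        for lineno, line in enumerate(logs[filename], start=1):
            for word in sorted(set(WORD_RE.findall(line.lower()))):
                index[word].append([filename, lineno])
    return dict(index)


if __name__ == "__main__":
    logs = {"chan.2021-07-01.log": ["[12:37] <jsmith> hello world",
                                    "[12:38] <alice> Hello again"]}
    print(json.dumps(build_index(logs), sort_keys=True))
```

The trade-off is that the JSON file grows with the logs and has to be regenerated whenever they change, which fits the existing cron-based logs2html workflow.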

2015/2016 release?

I see quite a few commits in 2015, but the last release was in 2014.

Since you're recommending installing with pip, would a release be appropriate at this time?

Automatically handle gzipped files

I was actually surprised that passing in filename.log.gz resulted in a filename.log.gz.html that contained nothing but gibberish (the binary gzip data interpreted as text) between the header and footer. Other log processing tools (e.g. pisg) automatically handle gzipped files. It would be super-convenient if irclog2html did so as well.
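A sketch of transparent gzip support: sniff the two gzip magic bytes (`1f 8b`) and open the file with `gzip.open` in text mode when they're present. `open_log` is a hypothetical helper, and the choice of `errors="replace"` is an assumption.

```python
import gzip
from typing import IO


def open_log(path: str, encoding: str = "utf-8") -> IO[str]:
    """Open a log for reading as text, decompressing it if it is gzipped.

    Checks the gzip magic bytes rather than trusting the file extension.
    """
    with open(path, "rb") as f:
        magic = f.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt", encoding=encoding, errors="replace")
    return open(path, "r", encoding=encoding, errors="replace")
```

Sniffing the content instead of the name also handles compressed files that lack a .gz suffix.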

Tell me the complete configuration

Hi,
I want to convert my chat room log files into HTML, and I am new to root,
so kindly tell me the complete configuration in detail.
Thanks.

`AttributeError: readable` in irclogsearch 2.12.0 on Python 2.7

Traceback ends with:

self.outfile = io.TextIOWrapper(outfile, encoding=self.charset,
                                errors='xmlcharrefreplace',
                                line_buffering=True)

where outfile is <open file '<stdout>', mode 'w'>.

This error doesn't occur if you test from the command line with IRCLOG_LOCATION=testcases QUERY_STRING=q=a bin/irclogsearch on Python 3.3.
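`io.TextIOWrapper` needs a binary stream from the `io` module (it calls methods such as `readable()`), which Python 2's built-in file object doesn't provide; that is likely why the AttributeError appears. A minimal Python 3 sketch with a hypothetical `make_output` helper:

```python
import io
from typing import BinaryIO, TextIO


def make_output(binary: BinaryIO, charset: str = "utf-8") -> TextIO:
    """Wrap a binary stream so unencodable characters become &#NNN; references."""
    return io.TextIOWrapper(binary, encoding=charset,
                            errors="xmlcharrefreplace", line_buffering=True)
```

In a CGI script, the stream to wrap would be `sys.stdout.buffer`, the binary layer under Python 3's `sys.stdout`; calling `detach()` on the wrapper when done avoids closing stdout along with it.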

parts of log message missing

[14 Jan 12:37] <[email protected]> I'm going from A -> B

results in

<tr id="t14 Jan 12:37"><th class="nick" style="background: #b15db1">jsmith</th><td class="text" style="color: #b15db1">B</td><td class="time"><a href="#t14 Jan 12:37" class="time">14 Jan 12:37</a></td></tr>

i.e. only "B" is shown in the message.

One can reduce the test case to [00:00:00] <!> -> X
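The report doesn't show where the text gets lost. As a point of comparison, here is a hedged sketch of a parser for simplified `[time] <nick> text` lines that captures everything after the nick verbatim, so an `->` inside the message survives. `LINE_RE`, `parse_line` and `Message` are hypothetical and not irclog2html's parser.

```python
import re
from typing import NamedTuple, Optional


class Message(NamedTuple):
    time: str
    nick: str
    text: str


# Everything after "<nick>" is captured verbatim, so "->" inside the text is
# never interpreted as part of the line format.
LINE_RE = re.compile(r"^\[(?P<time>[^\]]+)\]\s+<(?P<nick>[^>]*)>\s?(?P<text>.*)$")


def parse_line(line: str) -> Optional[Message]:
    m = LINE_RE.match(line)
    if m is None:
        return None
    return Message(m.group("time"), m.group("nick"), m.group("text"))
```

Both the original line (with a plain `<jsmith>` nick) and the reduced test case keep their full text with this pattern.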

Anchors with low-res timestamps

When there are two (or more) messages with the same timestamp, it's impossible to link directly to the 2nd one.

See #15 for a possible fix.
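One possible approach (not necessarily the one proposed in #15) is to keep the first id unchanged and add a counter suffix to repeats. A minimal sketch with a hypothetical `unique_anchors` helper:

```python
from typing import Dict, Iterable, List


def unique_anchors(base_ids: Iterable[str]) -> List[str]:
    """Append -2, -3, ... to repeated ids so every line gets its own anchor.

    The first occurrence keeps the plain id, so existing links keep working.
    Assumes base ids never themselves end in "-<number>".
    """
    seen: Dict[str, int] = {}
    result: List[str] = []
    for base in base_ids:
        count = seen.get(base, 0) + 1
        seen[base] = count
        result.append(base if count == 1 else "%s-%d" % (base, count))
    return result
```

Because the suffix depends only on line order within a file, the ids stay stable as long as the log file isn't rewritten.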
