helgeho Goto Github PK

followers: 36 following: 3 repos: 27 gists: 0

Name: Helge Holzmann

Type: User

Company: Internet Archive

Bio: Software developer, researcher and consultant with a PhD in Computer Science, Web Data Engineer at Internet Archive, working on better access to web archives.

Location: Hannover, Germany

Blog: http://www.HelgeHolzmann.de

Helge Holzmann's Projects

archivepig

An Apache Pig framework that facilitates access to Web archives and enables easy data extraction as well as derivation.

archivespark

An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at Internet Archive.

archivespark-aut-bridge

The compatibility layer between ArchiveSpark and The Archives Unleashed Toolkit (AUT)

archivespark-docker

ArchiveSpark on Docker

archivespark-server

A server application that provides a Web service API for ArchiveSpark to be used by third-party applications to integrate temporal Web archive data with a flexible, easy-to-use interface.

archivespark-zeppelin-docker

ArchiveSpark with Zeppelin as a ready-to-use Docker image

archivespark2triples

Convert web archives to RDF triples with ArchiveSpark

aut

The Archives Unleashed Toolkit is an open-source platform for analyzing web archives.

awesome-web-archiving

An Awesome List for getting started with web archiving

engtagger

English Part-of-Speech Tagger Library; a Ruby port of Lingua::EN::Tagger

exspec

Don't write specs anymore, just save 'em while testing your code interactively. Specs will become a byproduct.

fel4archivespark

Yahoo's Fast Entity Linker for ArchiveSpark

hadoopconcatgz

A Splittable Hadoop InputFormat for Concatenated GZIP Files and *.(w)arc.gz

hadoopwebgraph

A Hadoop input format to use graphs in WebGraph's BV format with Hadoop and Spark.

helgeho.github.io

My personal website

iabooksonarchivespark

Analyze digitized books from the Internet Archive remotely with ArchiveSpark

internetarchive-transfer-scripts

Scripts to transfer archive.org collections, using https://github.com/jjjake/internetarchive

mapreducelecture

A lecture on MapReduce with example code

mhlonarchivespark

Work with Medical Heritage Library collections using ArchiveSpark

micrawler

Create and cite micro Web archives with semantics as temporal representations of objects / entities / concepts on the Web

php-cross-domain-proxy

PHP Proxy for Cross Domain Requests

ruby-jobs

A simple way to run jobs, whether simple scripts or experiments for your research. You can log results and progress without hassle and experiment with different configurations.

spark

Mirror of Apache Spark

tempas2archivespark

ArchiveSpark DataSpec to analyze the Internet Archive's Web archive through temporal search results returned by Tempas (v2)

warcpartitioner

Partition (W)ARC Files by MIME Type and Year

web2warc

An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)

zstd-jni

JNI binding for Zstd
