danielbicho
Name: Daniel Bicho
Type: User
brozzler - distributed browser-based web crawler
Artificial Intelligence: programming autonomous agents.
Script to fix length attributes in serialized strings
Recursive tests developed with the Selenium framework for Arquivo.pt
Mirror of Apache Hadoop common
Distributed crawler powered by Headless Chrome
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
🇵🇹🎬 Timelapser for Arquivo.pt
Repository to host my daily 'orphan' notebooks and code that do not fit in any project repository and so need a home
Caffe Squeezenet model for binary classification of pornographic/non-pornographic material
Code for running Not Suitable for Work (NSFW) classification using deep neural network Caffe models
Scriptable Headless WebKit
FingerPrint Recognition System - Image Processing and Classification
The main goal of the Portuguese Web Archive (PWA) is to preserve and provide access to web content that is no longer available online. While developing the PWA IR (information retrieval) system, we faced limitations in search speed, quality of results, scalability and usability. To cope with this, we modified the archive-access project (http://archive-access.sourceforge.net/) to support our web archive IR requirements. The Nutchwax, Nutch and Wayback code was adapted to meet these requirements. Several optimizations were added, such as simplifying the way document versions are searched, and several bottlenecks were resolved. The PWA search engine is a public service at http://archive.pt and a research platform for web archiving. Like its predecessor Nutch, it runs on Hadoop clusters for distributed computing, following the map-reduce paradigm. Its major features include fast full-text search, URL search, phrase search, faceted search (date, format, site), and sorting by relevance and date. The PWA search engine is highly scalable, and its architecture is flexible enough to allow different configurations to be deployed for different needs. Currently, it serves a full-text searchable archive collection of 180 million documents dating from 1996 to 2010.
Portuguese Web Archive spelling suggestion.
Python WayBack for web archive replay and url-rewriting HTTP/S web proxy
Testing new methods to score web archiving replay quality
WARC writing MITM HTTP/S proxy
Common web archive utility code.