
ribose's Introduction

TL;DR: To skip this long screed and learn how to build and work with ribose, jump to the Disclaimer at the end.

I built my first transducers with INR, mapping ASCII* into a semiring of C function (effector) pointers, at Bell Northern Research in Ottawa in the late 1980s. I've been watching the world stumble by without it ever since, encumbering serialized forms of even the simplest object models with all manner of ill-fitting suits.

Why don't information architects use idiomatic Unicode* semiring expressions, tailored expertly to their specific domains and intentions, to describe basic domain artifacts and combine them in more complex forms for persisting and communicating domain-specific information with entities interacting with their domain? Why don't modern programming languages and computing machines present robust support for semirings and automata? Will XML ever go away?

I don't know.

The General Idea

Ribose (formerly jrte) is about inversion of control for high-volume text analysis and information extraction and transformation in general. Many stream-oriented tasks, such as cleaning and extraction for data analytic workflows, involve recognizing and acting upon features embedded, more or less sparsely, within a larger context. Software developers receive some onerous help in that regard from generic software libraries that support common document standards (e.g., XML, JSON, MS Word, etc.), but dependency on these libraries adds complexity, vulnerability and significant runtime costs to software deployments. And these libraries are of no use at all when information is presented in idiomatic formats that require custom software to deserialize.

Ribose specializes ginr, an industrial strength open source compiler for multidimensional regular patterns, to produce finite state transducers (FSTs) that map syntactic features to semantic effector methods expressed by a target class. Ribose transduction patterns are composed and manipulated using algebraic (semiring) operators and compiled to FSTs for runtime deployment. Ginr admits arbitrary bytes (\xHH) in ribose patterns and transcodes Unicode glyphs to UTF-8 byte sequences; ribose compiles FSTs to operate in the byte* domain regardless of UTF-8 or binary origin. Out of band (>255) signals may also be embedded in ribose transducer patterns to guide stream processing in compiled FSTs. Regular patterns may be nested to cover context-free inputs, and the ribose runtime supports unbounded lookahead to resolve ambiguities or deal with context-sensitive inputs. Inputs are presented to ribose runtime transducers as streams of byte-encoded information and regular or context-free inputs are transduced in linear time.

There is quite a lot of byte-encoded information being passed around these days (right-click in any browser window and "View Page Source" to see a sample) and it is past time to think of better ways to process this type of data than crunching it on instruction-driven calculator machines. Ribose and ginr promote a pattern-oriented, data-driven approach to designing, developing and processing information workflows. Ribose is a ship-in-a-bottle showpiece put together to shine a spotlight on ginr and to demonstrate what computing might be if finite state transduction, augmented with a transducer stack and coupled with a classical CPU/RAM computer, were a common modality for processing sequential information (i.e., almost everything except arithmetic).

Regular patterns and automata are to computing ecosystems what soil and microbiota are to the stuff living above ground. Strange that we don't see explicit support for their construction and runtime use in modern programming languages and computing machines. Ribose is presented only to demonstrate the general idea of pattern-oriented design and development. As is, it successfully runs a limited suite of test cases and can be used to build domain-specific ribose models, but it is not regularly maintained and not suitable for general use. Others are encouraged to clone and improve it or implement more robust expressions of the general idea. Or be my hero and get clean and simple support for compiling semiring pattern expressions to runnable FSTs on the roadmap for Java or Rust.

The general idea is to show how to make information work for you rather than you having to work to instruct a computer about how to work with information. Or, at least, how to reduce costs associated with information workflows. Ribose views information as the instructional component in streaming contexts, providing a highly workable alternative to instruction-driven processing. This idea is outlined below and explored, a bit snarkily, in the stories posted in the ribose wiki. This has no connection whatsoever with POSIX and Perl 'regular expressions' (regex) or 'parsing expression grammars' (PEGs), which are commonly used for ad hoc pattern matching. In the following I refer to the algebraic expressions used to specify ribose transducers as 'regular patterns' to distinguish them from regex and PEG constructs.

Ginr*

Ginr is the star of the ribose circus. It was developed by J Howard Johnson at the University of Waterloo in the early 1980s. One of its first applications was to transduce the typesetting code for the Oxford English Dictionary from an archaic layout to SGML. I first used it at Bell Northern Research to implement a ribose-like framework to serve in a distributed database mediation system involving diverse remote services and data formats. The involved services were all driven by a controller transducing conversational scripts from a control channel. The controller drove a serial data link, transmitting queries and commands to remote services on the output channel and switching context-specific response transducers onto the input channel. Response transducers reduced query responses to SQL statements for the mediator and reduced command responses to guidance to be injected into the control channel to condition the course of the ongoing conversation.

Ginr subsequently disappeared from the public domain and has only recently been published with an open source license on GitHub. It has been upgraded with substantial improvements, including 32-bit state and symbol enumerators and compiler support for transcoding Unicode symbols to UTF-8 bytes. It provides a full complement of algebraic operators that can be applied to reliably produce very complex (and very large) automata. Large and complex patterns can be decomposed into smaller and simpler patterns, compiled to FSTs, and reconstituted on ribose runtime stacks, just as complex procedural algorithms in Java are decomposed into simpler methods that Java threads orchestrate on call stacks in the JVM runtime.

Ribose

Ribose is a proof-of-concept exercise intended to demonstrate the general idea of pattern-oriented information processing. It specializes ginr to express transducers using terms of the form (A b, X[`Y` ...] ...), where A is a pattern involving input symbols, b is an input symbol that is not a prefix of A, X is the first effector invoked when b is read, and `Y ...` is a list of parameters bound to X.
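For illustration, a minimal term in this form might look like the following hypothetical fragment (the field name `~field` is invented here; the paste, select and clear effectors appear in the example further below):

```
# (A b, X[`Y` ...]): after matching digits (A), reading tab (b) fires
# select[`~field`] clear (X[`Y` ...]), and matching then resumes
Number = (digit, paste)+;
Record = Number (tab, select[`~field`] clear) Number;
```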

An Overview and an Example

Ribose suggests a pattern-oriented approach to information that minimizes dependency on external libraries and could reduce complexity, vulnerability and development and runtime costs in information workflows. Ribose generalizes the transducer design pattern that is commonly applied to filter, map and reduce collections of data in functional programming paradigms. Common usage of this design pattern treats the presentation of inputs as a simple series T* without structure. Ribose extends and refines this design pattern, allowing transducers to precisely navigate (filter), map (select effect vector) and reduce (execute effect vector) complex information under the direction of syntactic cues in the input.

Here the filter component is expressed as a collection of nested regular patterns describing an input source, using symbolic algebraic expressions to articulate the syntactic structure of the input. These unary input patterns are then extended to binary transduction patterns that map syntactic features to effect vectors that incrementally reduce information extracted from the input. The syntactic structure provides a holistic navigable map of the input and exposes cut points where semantic actions should be applied. This presents a clear separation of syntactic and semantic concerns: Syntax is expressed in a purely symbolic domain where patterns are described and manipulated algebraically, while semantics are expressed procedurally in a native programming language as effectors in a domain-specific target class. Syntactic patterns are articulated without concern for target semantics and effector implementation of semantic actions is greatly simplified in the absence of syntactic concerns.

The ribose runtime transduces byte* streams simply and only because byte is the least common denominator for data representation in most networks and computing machines. Ginr compiles complex Unicode glyphs in ribose patterns to multiple UTF-8 byte transitions, so all ribose transductions are effected in the byte* domain and only extracted textual features are decoded and widened to 16-bit Unicode code points. Bytes 0xF8-0xFF, which are never expressed in UTF-8 encodings, are available for in-band signalling and can be used to express syntactic structure (push/pop object stack, open/close array) and semantic guidance (data names, types), obviating concern about embedded "special characters". Binary data can be embedded using self-terminating binary patterns or prior length information, as long as they are distinguishable from other artifacts in the information stream. Raw binary encodings may be especially useful in domains, such as online gaming or real-time process control, that demand compact and efficient messaging protocols with relaxed readability requirements. Semantic effectors may also inject previously captured bytes or out-of-band signals, such as countdown termination, into the input stream to direct the course of transductions.

The ribose runtime operates multiple concurrent transductions, each encapsulated in a Transductor object that provides a set of core compositing and control effectors and coordinates a transduction process. Nested FSTs are pushed and popped on transductor stacks, with two-way communication between caller and callee effected by injecting information for immediate transduction. Incremental effects are applied synchronously as each input symbol is read, culminating in complete reduction and assimilation of the input into the target domain. For regular and most context-free input patterns transduction is effected in a single pass without lookahead. Context-sensitive or ambiguous input patterns can be classified and resolved with unbounded lookahead (select clear paste* in) or backtracking (mark reset) using core transductor effectors.

An Example

Here is a simple example, taken from the ribose transducer that reduces the serialized form of compiled ginr automata to construct ribose transducers. The input pattern is simple and unambiguous and is easily expressed:

header = 'INR' (digit+ tab):4 digit+ nl; # a fixed alphabetic constant, 4 tab-delimited unsigned integers and a final unsigned integer delimited by newline
transition = (digit+ tab):4 byte* nl;    # 4 tab-delimited unsigned integers followed by a sequence of bytes of length indicated by the 4th integer, ending with newline
automaton = header transition*;          # the complete automaton

The automaton input pattern above is extended to the Automaton transducer pattern below, which checks for a specific tag and marshals integer fields into an immutable Header record and an array of Transition records. Fields are extracted to raw byte[] arrays using the clear, select and paste effectors until a newline triggers a domain-specific header or transition effector to decode and marshal them into Header and Transition records. Finally the transitions are reduced in the automaton effector to a 259x79 transition matrix, which the ribose compiler will reduce to a 13x27 transition matrix by coalescing equivalent input symbols (e.g., digits in this scenario) using the equivalence relation on the input domain induced by the ginr transition matrix.

Number = (digit, paste)+;
Symbol = (byte, paste count)* eol;
eol = cr? nl;
inr = 'INR';

Automaton = nil? (
# header
  (inr, select[`~version`] clear) Number
  (tab, select[`~tapes`] clear) Number
  (tab, select[`~transitions`] clear) Number
  (tab, select[`~states`] clear) Number
  (tab, select[`~symbols`] clear) Number
  (eol, header (select[`~from`] clear))
# transitions
  (
    Number
    (tab, select[`~to`] clear) Number
    (tab, select[`~tape`] clear) Number
    (tab, select[`~length`] clear) Number
    (tab, select[`~symbol`] clear count[`~length` `!eol`]) Symbol
    (eol, transition (select[`~from`] clear))
  )*
# automaton
  (eos, automaton stop)
):dfamin;

Automaton$(0,1 2):prsseq `build/compiler/Automaton.pr`;

The final prsseq operator verifies that the Automaton$(0, 1 2) automaton is a single-valued partial rational function mapping recognizable sequences from the input semiring into the semiring of effectors and effector parameters. The branching and repeating patterns expressed in the input syntax drive the selection of non-branching effect vectors, obviating much of the fine-grained control logic that would otherwise be expressed in line with effect in a typical programming language, without support from an external parsing library. Most of the work is performed by transductor effectors that latch bytes into named fields that, when complete, are decoded and assimilated into the target domain by a tightly focused domain-specific effector.

Expressions such as this can be combined with other expressions using concatenation, union, repetition and composition operators to construct more complex patterns. More generally, ribose patterns are amenable to algebraic manipulation in the semiring, and ginr enables this to be exploited to considerable advantage. For example, Transducer = Header Transition* eos covers a complete serialized automaton, while Transducer210 = ('INR210' byte* eos) @@ Transducer restricts Transducer to accept only version 210 automata (ginr's @ composition operator absorbs matching input and reduces pattern arity, while the @@ join operator retains matching input and preserves arity).
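Written out as they might appear in a ginr source file, the two expressions above read:

```
Transducer    = Header Transition* eos;              # a complete serialized automaton
Transducer210 = ('INR210' byte* eos) @@ Transducer;  # join restricts input to version 210
```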

In a nutshell, algorithms are congruent to patterns. The logic is in the syntax.

Basic Concepts

Ginr operates in a symbolic domain involving a finite set of symbols and algebraic semiring operators that recombine symbols to express syntactic patterns. Support for Unicode symbols and binary data is built in, and Unicode in ginr source patterns is rendered as UTF-8 byte sequences in compiled automata. UTF-8 text is transduced without decoding and extracted bytes are decoded only in target effectors. Ribose transducer patterns may introduce additional atomic symbols as tokens representing out-of-band (>255) control signals.

Input patterns are expressed in {byte,signal}* semirings, and may involve UTF-8 and binary bytes from an external source as well as control signals interjected by target effectors. Ribose transducer patterns are expressed in (input,effector,parameter)* semirings, mapping input patterns onto parametric effectors expressed by domain-specific target classes. They identify syntactic features of interest in the input and apply target effectors to extract and assimilate features into the target domain.

A ribose model is associated with a target class and is a container for related collections of transducers, target effectors, static effector parameters, control signals and field registers for accumulating extracted bytes. The ITransductor implementation that governs ribose transductions provides a base set of effectors to

  • extract and compose data in selected fields (select, paste, copy, cut, clear),
  • count down from a preset value and signal the end of the countdown (count),
  • push/pop transducers on the transduction stack (start, stop),
  • mark/reset at a point in the input stream (mark, reset),
  • inject input for immediate transduction (in, signal),
  • or write extracted data to an output stream (out).

All ribose models implicitly inherit the transductor effectors, along with an extensible set of control signals {nul,nil,eol,eos} and an anonymous field that is preselected for every transduction and reselected when select is invoked with no parameter. New signals and fields referenced in transducer patterns implicitly extend the base signal and field collections. Additional effectors may be defined in specialized ITarget implementation classes.

The ribose transductor implements ITarget and its effectors are sufficient for most ribose models that transduce input to standard output via the out[...] effector. Domain-specific target classes may extend SimpleTarget to express additional effectors, typically as inner classes specializing BaseEffector<Target> or BaseParametricEffector<Target,ParameterType>. All effectors are provided with a reference to the containing target instance and an IOutput view for extracting fields as byte[], integer, floating point or Unicode char[] values, typically for inclusion in immutable value objects that are incorporated into the target model.
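The target/effector relationship described above can be sketched schematically. The following is NOT the ribose API — the interface and class names here are invented for illustration only. The idea is that a target owns domain state, and each effector is a small named callback that decodes a captured byte[] field and assimilates it into the target, free of syntactic concerns:

```java
// Schematic analogue of a domain-specific target and effector (names invented;
// the real ribose interfaces are ITarget, IOutput, BaseEffector, etc.).
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class TargetSketch {
  // stand-in for an effector invoked with a captured field register
  interface Effector { void invoke(byte[] field); }

  static final class DateTarget {
    final List<Integer> years = new ArrayList<>();
    // a tightly focused domain-specific effector: decode and assimilate
    final Effector year = field ->
        years.add(Integer.parseInt(new String(field, StandardCharsets.UTF_8)));
  }

  public static void main(String[] args) {
    DateTarget target = new DateTarget();
    // the transduction would fire this when the pattern's syntactic cue matches
    target.year.invoke("1984".getBytes(StandardCharsets.UTF_8));
    System.out.println(target.years); // [1984]
  }
}
```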

Targets need not be monolithic. In fact, every ribose transduction involves a composite target comprising the transductor and at least one other target class (e.g., SimpleTarget). In a composite target one target class is selected as the representative target, which instantiates and gathers effectors from subordinate targets to merge with its own effectors into a single collection to merge with the transductor effectors. Composite targets allow separable concerns within complex semantic domains to be encapsulated in discrete interoperable and reusable targets. For example, a validation model containing a collection of transducers that syntactically recognize domain artifacts would be bound to a target expressing effectors to support semantic validation. The validation model and target, supplied by the service vendor, can then be combined with specialized models in receiving domains to obtain a composite model including validation and reception models and effectors. With some ginr magic, receptor patterns can be joined with corresponding validator patterns to obtain receptors that validate in stepwise synchrony with reception and assimilation into the receiving domain.

Use Cases

Ribose as it stands is rough but ready for use in a wide range of use cases. Simple tasks that extract and composite recognized features for output to the standard output stream (e.g., as SQL or CSV) can be effected without any Java coding as noted above. More complex transduction use cases, such as rendering complete or partial object models from web service responses, can be realized by coding a specialized target class that presents custom effectors to assimilate extracted fields into the service data model. Very large or continuous inputs can be presented as serial segments of arbitrary size and are typically transduced with very low memory overhead.

Perhaps not today, but Java service vendors with a pattern orientation could do a lot to encourage and streamline service uptake by providing transductive validation models containing patterns that describe exported domain artifacts, along with matching validation targets. In consumer domains, the vendor patterns would provide concise and highly readable syntactic and semantic maps of the artifacts in the vendor's domain. Here they can serve as starting points for preparing specialized receptor patterns that call out to effectors in consumer target models, and these patterns can be joined with the service validation patterns as described above. Service vendors could also include in their validation models transducers for rendering domain artifacts in other forms, e.g. structured text or some markup language, allowing very concisely serialized artifacts to be comprehensible without sacrificing brevity and efficient parsing.

Everything is Code

In computing ecosystems regular patterns and their equivalent automata, like microbiota in biological ecosystems, are ubiquitous and do almost all of the work. String them out on another construct like a stack or a spine and they can perform new tricks.

Consider ribonucleic acid (RNA), a strip of sugar (ribose) molecules strung together, each bound to one of four nitrogenous bases (A|U|G|C), encoding genetic information. Any ordered contiguous group of three bases constitutes a codon, and 61 of the 64 codons are mapped deterministically onto the 20 amino acids used in protein synthesis (the other three are control codons). This mapping is effected by a remarkable molecular machine, the ribosome, which ratchets messenger RNA (mRNA) through an aperture to align the codons for translation and build a protein molecule, one amino acid at a time (click on the image below to see a real-time animation of this process). Over aeons, nature has programmed myriad mRNA scripts and compiled them into DNA libraries to be distributed among the living. So this trick of using sequential information from one domain (e.g., mRNA->codons) to drive a process in another domain (amino acids->protein) is not new.

For a more recent example, consider a C function compiled to a sequence of machine instructions with an entry point (call) and maybe one or more exit points (return). This can be decomposed into a set of vectors of non-branching instructions, each terminating with a branch (or return) instruction. These vectors are ratcheted through the control unit of a CPU and each sequential instruction is decoded and executed to effect specific local changes in the state of the machine. Branching instructions evaluate machine state to select the next vector for execution. All of this introspective navel gazing and running around is effected by a von Neumann CPU, chasing an instruction pointer. As long as the stack pointer is fixed on the frame containing the function the instruction pointer will trace a regular pattern within the bounds of the compiled function. This regularity would be obvious in the source code for the function as implemented in a procedural programming language like C, where the interplay of concatenation (';'), union (if/else/switch) and repetition (while/do/for) is apparent. It may not be so obvious in source code written in other, e.g. functional, programming languages, but it all gets compiled down to machine code to run on von Neumann CPUs, on the ground or in the cloud.

Programming instruction-driven machines to navigate complex patterns in sequential data or asynchronous workflows is an arduous task in any modern programming language, requiring a mess of fussy, fine-grained twiddling that is error prone and difficult to compose and maintain. Refactoring the twiddling into a nest of regular input patterns leaves a simplified collection of code snippets that just need to be sequenced correctly as effectors, and extending input patterns to orchestrate effector sequencing via transduction seems like a natural thing to do. Transducer patterns expressed in symbolic terms can be manipulated using well-founded and wide-ranging algebraic techniques, often without impacting effector semantics. Effector semantics are very specific and generally expressed in a few lines of code, free from syntactic concerns, in a procedural programming language. Their algebraic properties also enable regular patterns to be reflected in other mathematical domains where they may be amenable to productive analysis.
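To make the refactoring concrete, here is a tiny self-contained sketch in plain Java (not the ribose runtime, and far weaker than a compiled FST): inputs are mapped to equivalence classes that select an effect, and the effectors are reduced to two trivial snippets, with no fine-grained control logic interleaved among them:

```java
// Toy transduction sketch: input classification selects the effect to apply,
// and the effectors themselves are trivial, branch-free snippets.
import java.util.ArrayList;
import java.util.List;

public class ToyTransducer {
  enum Effect { PASTE, EMIT, NONE }

  // input-equivalence classes (cf. ribose coalescing equivalent input symbols)
  static Effect classify(char c) {
    if (c >= '0' && c <= '9') return Effect.PASTE;  // latch digit into field
    if (c == '\t' || c == '\n') return Effect.EMIT; // field boundary
    return Effect.NONE;                             // ignore everything else
  }

  // extract tab/newline-delimited digit fields from the input
  static List<String> transduce(String input) {
    List<String> fields = new ArrayList<>();
    StringBuilder field = new StringBuilder();
    for (char c : input.toCharArray()) {
      switch (classify(c)) {
        case PASTE:
          field.append(c);
          break;
        case EMIT:
          fields.add(field.toString());
          field.setLength(0);
          break;
        default:
          break;
      }
    }
    return fields;
  }

  public static void main(String[] args) {
    System.out.println(transduce("12\t345\n")); // [12, 345]
  }
}
```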

Ribose presents a pattern-oriented, transductive approach to sequential information processing, factoring syntactic concerns into nested patterns that coordinate the application of tightly focused effector functions of reduced complexity. Patterns are expressed algebraically as regular expressions in the byte semiring and extended as rational functions into effector semirings, and effectors are implemented as tightly focused methods expressed by a target object in the receiver's domain. This approach is not new; IBM produced an FST-driven XML Accelerator to transduce complex data schemata (XML, then JSON) at wire speed. They did it the hard way, sweating over lists of transitions, apparently still unaware of semiring algebra. They deployed it alongside WebSphere but the range of acceptable input formats is solely selected by the vendor. We know that sequential data can be processed at wire speeds using transduction technology, and we have "Universal Turing Machines" capable of running any computable "algorithm". Where is the "Universal Transduction Machine" that can recognize any nesting of regular "patterns" constructed from the byte* semiring and transduce conformant data into the receiving domain?

From an extreme but directionally correct perspective it can be said that almost all software processes operating today are running on programmable calculators loaded with zigabytes of RAM. Modern computing machines are the multigenerational inheritors of von Neumann's architecture, which was originally developed to support numeric use cases like calculating ballistic trajectories. These machines are "Turing complete" (like origami), so all that is required to accommodate textual data is a numeric encoding of text characters. Programmers can do the rest. Since von Neumann's day we've seen lots of giddy-up but the main focus in machine development has been on miniaturization and optimizations to compensate for RAM access lag (John Backus' 'von Neumann bottleneck'). Sadly, when the first text character enumerations were implemented, their designers failed to note that their text ordinals constituted the basis for a text semiring wherein syntactic patterns in textual media could be extended to direct effects within a target semantic domain.

It is a great mystery why support for semiring algebra is nonexistent in almost all programming languages and why hardware support for finite state transduction is absent from commercial computing machinery, even though a much greater proportion of computing bandwidth is now consumed to process sequential byte-encoded information. It may have something to do with money and the vaunted market forces that drive continuous invention and refinement. The folks that design and develop computing hardware and compilers are heavily invested in the von Neumann status quo, and may directly or indirectly extract rents for CPU and RAM resources. They profit enormously as, globally, the machines arduously generate an ever-increasing volume of data to feed back into themselves. So the monetary incentive to improve support for compute-intensive tasks like parsing reams of text may be weak. Meanwhile, transduction technology has been extensively developed and widely deployed. It is the basis for lexical analysis in compiler technology and natural language processing, among other things. But it is buried within proprietary or specialized software and is inaccessible to most developers.

Unfortunately I can only imagine what commercial hardware and software engineering tools would be like today if they had evolved with FST logic and pattern algebra built in from the get go. But it's a sure bet that the machines would burn a lot less oil and software development workflows would be more streamlined and productive. Information architects would work with domain experts to design serialized representations for domain artifacts and recombine these in nested regular patterns to realize more complex forms for internal persistence and transmission between processing nodes. These data representations would be designed and implemented simply, directly and efficiently without involving external data representation schemes like XML or JSON. There's a Big Use Case for that.

See Everything is Hard.

The Ribose Manifesto

Ribose encourages pattern-oriented design and development, which is based almost entirely on semiring algebra. Input patterns in text and other symbolic domains are expressed and manipulated algebraically and extended to map syntactic features onto vectors of machine instructions. A transducer stack extends the range of transduction to cover context-free input structures that escape semiring confinement. Effector access to RAM and an input stack support transductions involving context-sensitive inputs.

In this way the notion of a program, whereby a branching and looping series of instructions select data in RAM and mutate machine state, is replaced by a pattern that extends a description of an input source so that the data select the instructions (effectors) that merge the input data into target state. Here target, effector and pattern are analogues of machine, instruction, program. A transducer is a compiled pattern and a transduction is a process that applies a specific input sequence to a stack of nested transducers to direct the application of effectors to a target model in RAM.

Ribose suggests that the von Neumann CPU model would benefit from inclusion of finite state transduction logic to coordinate the sequencing of instructions under the direction of nested transducers driven by streams of numerically encoded sequential media, and that programming languages should express robust support for semiring algebra to enable construction of multidimensional regular patterns and compilation to FSTs as first-order objects. Transduction of data from input channel (file, socket, etc.) interfaces into user space should be supported by operating system kernels. Runtime support for transduction requires little more than a transducer stack and a handful of byte[] buffers.

The Burroughs Corporation B5000, produced in 1961, was the first to present a stack-oriented instruction set to support emergent compiler technology for stack-centric programming languages (e.g., Algol). The call stack then became the locus of control in runtime process execution of code as nested regular patterns of machine instructions. Who will be first, in the 21st century, to introduce robust compiler support for regular patterns and automata? Will hardware vendors follow suit and introduce pattern-oriented instruction sets to harness their blazing fast calculators to data-driven transductors of sequential information? Will it take 75 years to learn how to work effectively in pattern-oriented design and development environments?

Architects who want a perfect zen koan to break their minds on should contemplate the essential value of abstract data representation languages that can express everything without knowing anything. Developers might want to bone up on semiring algebra. It may look intimidating at first, but it is just like arithmetic: instead of nonnegative integers under addition and multiplication, with identity elements 0 and 1, work with sets of strings under union and concatenation, with identity elements Ø (the empty set) and ϵ (the empty string). The '*' semiring operator is defined as the union of all concatenation powers of its operand. Analogous rules apply in both domains, although concatenation does not commute in semirings. See Math is Hard? for a scolding.
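The arithmetic analogy can be checked mechanically. A small sketch in plain Java (class and method names invented here) with sets of strings:

```java
// Sets of strings under union (+) and concatenation (*): distributivity holds,
// identities are {} and {""}, but concatenation, unlike multiplication, does
// not commute.
import java.util.HashSet;
import java.util.Set;

public class SemiringDemo {
  static Set<String> union(Set<String> a, Set<String> b) {
    Set<String> u = new HashSet<>(a);
    u.addAll(b);
    return u;
  }

  static Set<String> concat(Set<String> a, Set<String> b) {
    Set<String> c = new HashSet<>();
    for (String x : a) for (String y : b) c.add(x + y);
    return c;
  }

  public static void main(String[] args) {
    Set<String> a = Set.of("a"), b = Set.of("b"), c = Set.of("c");
    // left distributivity: a(b + c) == ab + ac
    assert concat(a, union(b, c)).equals(union(concat(a, b), concat(a, c)));
    // concatenation does not commute: ab != ba
    assert !concat(a, b).equals(concat(b, a));
    // identities: the empty set for union, {""} for concatenation
    assert union(a, Set.of()).equals(a);
    assert concat(a, Set.of("")).equals(a);
    System.out.println("semiring laws hold");
  }
}
```

Run with assertions enabled (java -ea SemiringDemo) to exercise the checks.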

Best of all, semiring algebra is stable and free from bugs and vulnerabilities. Ginr is mature and stable, although it needs much more testing in diverse symbolic domains, and its author should receive some well-deserved recognition and kudos. Ribose is a hobby horse, created only to lend some support to my claims about pattern-oriented design and development.

Good luck with all that.

Disclaimer

Ribose is presented for demonstration only and is not regularly maintained. You may use it to compile and run the included examples, or create your own transducers to play with. Or clone and refine it and make it available to the rest of the world. Transcode it to C and wrap it in a Python thing. Do what you will, it's open source.

Binary executable copies of ginr (for Linux) and ginr.exe (for Windows) are included in etc/ginr for personal use (with the author's permission); ginr guidance is reposted in the sidebar in the ribose wiki. You are encouraged to clone or download and build ginr directly from the ginr repo.

Ribose has been developed and tested with OpenJDK 11 and 17 in Ubuntu 18 and Windows 10. It should build on any unix-ish platform, including git bash, Msys2/mingw or Windows Subsystem for Linux (WSL) on Windows, with ant, java, bash, cat, wc, grep in the executable search path. The JAVA_HOME and ANT_HOME environment variables must be set properly, e.g. export JAVA_HOME=$(readlink ~/jdk-17.0.7).

Clone the ribose repo and run ant clean package to percolate the ribose and test libraries and API documentation into the jars/ and javadoc/ directories. This will also build the ribose compiler and test models from transducer patterns in the patterns/ directory. The default ci-test target performs a clean build and runs the CI tests.

-: # set home paths for java and ant
-: export JAVA_HOME="$(realpath ./jdk-17.0.7)"
-: export ANT_HOME="$(realpath ./ant-1.10.12)"
-: # clone ribose
-: git clone https://github.com/jrte/ribose.git
Cloning into 'ribose'...
...
Resolving deltas: 100% (2472/2472), done.
-: # build ribose, test and javadoc jars and compiler, test models
-: cd ribose
-: ant package
Buildfile: F:\Ubuntu\git\jrte\build.xml
...
BUILD SUCCESSFUL
-: # list build products
-: ls jars
ribose-0.0.2.jar  ribose-0.0.2-api.jar  ribose-0.0.2-test.jar
-: jar -tvf jars/ribose-0.0.2.jar|grep -oE '[a-z/]+TCompile.model'
com/characterforming/jrte/engine/TCompile.model
-: find . -name '*.model' -o -name '*.map'
./build/Test.map
./build/Test.model
./TCompile.map
./TCompile.model
-: # run the CI tests
-: ant

Instructions for building ribose models and running transductors from the shell and in the Java VM are included in the ribose API documentation (javadoc). The com.characterforming.ribose package documentation specifies the arguments for the runnable Ribose class and presents the ribose runtime interfaces. The main interfaces are IRuntime, ITransductor and ITarget. The runnable Ribose class can be executed directly or from the shell scripts in the project root:

  • rinr: compile ginr patterns from a folder containing ginr source files (*.inr) to DFAs (*.dfa)
  • ribose compile | run | decompile:
    • compile: compile a collection of DFAs into a ribose model for a specific target class
    • run: run a transduction from a byte stream onto a target instance
    • decompile: decompile a transducer

The shell scripts are tailored to work within the ribose repo environment but can serve as templates for performing equivalent operations in other environments. Other than ginr, ribose has no dependencies and is contained entirely within the ribose jar file.

See the javadoc overview, package and interface documentation for information regarding use of the ribose compiler and transduction runtime API in the JVM.

For some background reading and a historical perspective visit the ribose wiki.

See LICENSE for ribose licensing details.

Postscript

Please somebody burn this into an FPGA and put it in a box like this and sell it. But don't bind the sweetness to a monster and hide it in the box; be sure to provide and maintain a robust compiler for generalized rational functions and relations (hint). Show information architects that they can encode basic domain artifacts as patterns in text semirings (e.g., UNICODE*) and combine patterns to represent more complex artifacts for persistence and transmission and decoding off the wire. Know that they understand their domains far better than you and can do this without heavy-handed guidance from externalities like IBM, Microsoft, Amazon, Google or yourself.

You never know. Folks who process vast volumes of byte encoded textual data (e.g., UTF-8*) off the wire to feed their search engines, or from persistent stores to feed their giant AI brains, might find unimagined ways to repurpose your box. And you won't have to tweak your FPGA one bit, if you've done it right, because these novel adaptations will be effected outside the box, in the pattern domain. A thriving pattern-oriented community will share libraries of patterns to cover common artifacts and everyone will love you.

Then XML will go away. JSON too. Think about it.

Thank you.


ribose's Issues

Optimize looping in self-referencing states

States that contain non-null transitions into self with nil effect can be optimized in the runtime (Transductor.run()) by iterating directly through the input bytes and checking each byte against a bit map of bytes that transit back into the same state. The first byte (or any signal) that escapes the trap falls through to normal indexing through the input equivalence filter and state transition map.
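The trap described above might look something like the following hypothetical sketch (this is not the actual Transductor.run() code, and the names are illustrative): a 256-entry membership map marks the bytes that loop on the current state with nil effect, and those bytes are consumed without dispatching through the transition matrix.

```java
public class SelfLoopTrap {
  // selfLoop[b] is true when input byte b transits the current state
  // back into itself with nil effect.
  static int skipSelfLoops(byte[] input, int pos, boolean[] selfLoop) {
    while (pos < input.length && selfLoop[input[pos] & 0xff]) {
      pos++; // stay in the same state, no effector dispatch
    }
    // First byte that escapes the trap falls through to normal indexing
    // via the input equivalence filter and state transition map.
    return pos;
  }

  public static void main(String[] args) {
    boolean[] loop = new boolean[256];
    loop[' '] = loop['\t'] = true; // e.g. whitespace loops on itself
    byte[] data = "   \tx".getBytes();
    System.out.println(skipSelfLoops(data, 0, loop)); // prints 4, the index of 'x'
  }
}
```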

Use a ThreadLocal<Codec> variable to contain charset codecs

Charset codecs are used throughout ribose code. Storing them in a ThreadLocal<Codec> on first use allows them to be accessed anywhere. IModel.close() implementations detach the ThreadLocal<Codec> instance from the calling thread, but threads that do not directly or indirectly (try-with-resources) call IModel.close() must call the static IModel.detach() method to rid themselves of the thread-local Codec instance.
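A hedged sketch of the holder described above; the Codec class shape and method names here are illustrative and may not match the ribose implementation. One (encoder, decoder) pair is created per thread on first use, and detach() drops it for threads that never close a model.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class Codec {
  private final CharsetEncoder encoder;
  private final CharsetDecoder decoder;

  private Codec(Charset charset) {
    this.encoder = charset.newEncoder();
    this.decoder = charset.newDecoder();
  }

  // One codec pair per thread, instantiated lazily on first access.
  private static final ThreadLocal<Codec> LOCAL =
      ThreadLocal.withInitial(() -> new Codec(StandardCharsets.UTF_8));

  // Accessible from anywhere on the calling thread.
  public static Codec attach() { return LOCAL.get(); }

  // Threads that do not call IModel.close() use this to release their instance.
  public static void detach() { LOCAL.remove(); }

  public CharsetEncoder encoder() { return encoder; }
  public CharsetDecoder decoder() { return decoder; }
}
```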

Implement transducer for saved ginr DFA files

This fix will conclude issue #15. It will require refactoring RuntimeModel into a persistent store, a runtime component and a compiler component. The compiler component will use a ribose component to transduce ginr DFA files. There is a bit of vanity here, since the current compiler implementation is simple enough, but RuntimeModel is too complex and needs refactoring in any case.

clearHeader = clear[`~version`] clear[`~tapes`] clear[`~transitions`] clear[`~states`] clear[`~symbols`];
clearTransition = clear[`~from`] clear[`~to`] clear[`~tapes`] clear[`~length`] clear[`~symbol`];

Automaton = (
	(nil, clearHeader clearTransition select)
	('INR' digit+ @@ PasteAny) (tab, select[`~version`] cut select)
	(digit+ @@ PasteAny) (tab, select[`~tapes`] cut select)
	(digit+ @@ PasteAny) (tab, select[`~transitions`] cut select)
	(digit+ @@ PasteAny) (tab, select[`~states`] cut select)
	(digit+ @@ PasteAny) (tab, select[`~symbols`] cut select)
	(nl, header clearHeader)
	(
		(digit+ @@ PasteAny) (tab, select[`~from`] cut select)
		(digit+ @@ PasteAny) (tab, select[`~to`] cut select)
		(digit+ @@ PasteAny) (tab, select[`~tape`] cut select)
		(digit+ @@ PasteAny) (tab, select[`~length`] cut select count[`~length` `!eol`])
		(byte* @@ PasteBinary) (eol, select[`~symbol`] cut select)
		(nl, transition clearTransition)
	)*
	(eos, automaton)
):dfmain;

Automaton$(0,1 2):prsseq `build/ribose/automata/Automaton.pr`;
Automaton:save `build/ribose/automata/Automaton.dfa`;

Something like that...

Performance updates

A couple of things can be improved:

  1. roll the sum/product/scan traps and nin/paste trap into a single loop
    • smoothing out transitions between these traps improves instruction pipelining
  2. increase the thresholds for inserting msum/mproduct effectors in the model compiler
    • short runs in sum/product traps increase the number of such transitions

Literal tokens beginning with ! @ ~ or 0xf8 must be escaped

This was attempted in a half-@ssed way in (d946184: Allow escaping token type prefix for literal token), which won't work out well for literals with >1 repetitions of the lead byte. Instead of doubling the lead byte, use the 0xF8 byte (which is illegal in UTF-8 content) to prefix literal tokens that must be escaped.

Escapement is required only for literal tokens that appear in effector parameter lists and is largely transparent to user code. The IToken interface will be extended with new methods isField(), isTransducer(), isSignal(), isLiteral(). Regardless of token type, IToken.getSymbol() will always return the token bytes with the escape byte or type reference byte removed (e.g., !signal -> signal, 0xf8text -> text, text -> text).
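The stripping rule is simple enough to sketch. This is an illustrative reading of the proposal above, not ribose source; the class name and ESCAPE constant are ours, but the 0xF8 choice follows from its illegality in well-formed UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TokenEscape {
  static final byte ESCAPE = (byte) 0xf8; // never occurs in valid UTF-8

  // Return token bytes with any escape byte or type reference byte removed.
  static byte[] getSymbol(byte[] token) {
    if (token.length > 0
        && (token[0] == ESCAPE || token[0] == '!' || token[0] == '@' || token[0] == '~')) {
      return Arrays.copyOfRange(token, 1, token.length);
    }
    return token; // plain literal, returned as-is
  }

  public static void main(String[] args) {
    // !signal -> signal; 0xf8-prefixed literal -> literal; text -> text
    byte[] symbol = getSymbol("!signal".getBytes(StandardCharsets.UTF_8));
    System.out.println(new String(symbol, StandardCharsets.UTF_8)); // prints signal
  }
}
```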

Ribose will be migrating to Java 17 soon...

Java 17 is all about sums and products and algebraic data types. Ribose and ginr are all about sums and products and algebraic semirings. Let's hook 'em up and spin them together.

Ribose transductors can extract from utf-8 text streams immutable tuples (algebraic types, or records) containing only final primitive fields decoded from raw utf-8 bytes extracted from input in response to syntactic cues expressed by an algebraic transducer pattern. In such use cases, semantic effectors assemble these records for assimilation into a target domain.

Transductors operate without concern for stream segmentation into IP packets or channel input buffers. It would be nice if support for transductor mediation between native socket and channel interfaces were baked into Java XX. It's like FORTRAN I/O format specifiers (which were/are actually pretty cool for numeric data) on Java steroids. Or something.

Found this: https://steve-yegge.blogspot.com/2006/03/execution-in-kingdom-of-nouns.html, which is a hoot. To wit, my first implementation of the general idea (in C, circa 1980) had transducers mapping input to semantic effect directly through function pointers. No IEffector IParameterizedEffector BaseEffector BaseParameterizedEffector motherhood at all.

Marking wastes the heap

Marking (backtracking) is complex and out of line with ribose's forward-only approach. It is provided only to support certain use cases that require looking ahead with nil effect to determine context before selecting a transducer to apply semantic effect. None of the current test cases require this feature, but two (LinuxKernelLoose and LinuxKernelStrict) use it in order to exercise marking and resetting in segmented input streams (LinuxKernel effects the identical transduction without backtracking).

When data is pushed onto a transductor input stack (transductor.input(data)) it is presumed to be the next sequential block of data in the input stream. When the transductor runs and completes this block with a mark set, the identical data buffer is included in the transductor mark set and may not be refreshed with new data from the streaming source. For dense sources this will occur frequently, and the consequent allocations of new data buffers result in more frequent GC activity. So ITransductor.recycle(byte[]) will be implemented to allow marked and reset data buffers to be recycled back into streaming transductions (e.g. Transductor.transduce()).

Best not to use backtracking at all if avoidable. The general use case is a set of transducers Ti (i>1) and an input feature w = uv where all Ti recognize u (with divergent effect) but are divergent on v, which determines which Ti should be selected to transduce all of w. A classifying transducer C can then be constructed as C = (nil, mark) {(Ti$0,reset start[Ti])} so that C(w) -> Tx(w) marks the in point for w, recognizes the input pattern for the specific Tx that can transduce w effectively, resets the input to the mark and pushes Tx to transduce marked input w. When Tx(w) returns its out point is synchronous with the in point of C and the input stream is unmarked.

Enable text transductions involving multibyte character encodings

Ginr (2.1.0c) is improving Unicode support, but jrte is lagging and as it stands now only 7-bit (ASCII) text can be transduced. Jrte is using char[] and CharBuffer everywhere, which means that multibyte character encodings are decoded to 16-bit code points in jrte IInput streams. However, ginr patterns are compiled to raw byte encodings and there is no way to specify a 16-bit code point in a pattern.

Ginr is moving in the right direction, since it obviates the need to fully decode byte[]->char[] to present to runtime transductions. To keep up, jrte must be refactored (extensively) to use byte[] and ByteBuffer to represent input sequences. The prologue will have to be extended to include utf8 = utf7 + {80..FF} and utf7-dependent definitions must be extended to include all utf8 bytes (e.g. PasteAny = (utf8, paste)*). Patterns that use non-ASCII characters will have to adapt as well since, for example, ('⅀', paste) will paste only the last byte of the UTF-8 encoding; ('⅀' @ PasteAny) would be required instead (this shouldn't be a problem for runs in master or head branches because they never worked with non-ASCII inputs anyway).
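To see why ('⅀', paste) goes wrong in the byte domain, it helps to look at the raw encoding. This small demonstration (assumptions: nothing beyond the JDK standard library) shows that the single glyph '⅀' occupies three UTF-8 bytes, so a byte-domain pattern must traverse all three.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bytes {
  public static void main(String[] args) {
    // '⅀' is U+2140, a single char in Java but three bytes in UTF-8.
    byte[] utf8 = "⅀".getBytes(StandardCharsets.UTF_8);
    System.out.println(utf8.length); // prints 3
    for (byte b : utf8)
      System.out.printf("%02X ", b); // prints E2 85 80
    System.out.println();
  }
}
```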

This will require an extensive rewrite of many jrte components and will be undertaken on a new branch (raw). Issue #6 will be addressed in raw by deprecating ITransduction.input(IInput[]) and the whole IInput framework and driving transductions from client code with sequential calls to Transduction.run(byte[] ...).

Be patient. This is old code, I'm old, even my dog is old. And Java hurts.

Ginr+ribose for Windows

With a small change I was able to build ginr for Windows using the Msys2/C++ toolchain. The ginr.exe binary is included in the etc/ginr folder along with the linux binary ginr. The ribose full build (ant ci-test) successfully builds ribose and test models and jar files and runs the available tests without issue. The ribose ant targets must be run in an Msys2 terminal with Java. I built with (Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.13+10-LTS-370, mixed mode)) for Windows. OpenJDK for Windows should work as well.

Msys2 is required in order to run ribose ant targets in build.xml and etc/sh/* scripts on Windows (needs bash, wc, grep). Cygwin might work but I haven't tried it. You do not need to install the C++ toolchain and yacc for Msys2 (required for ginr builds) as ginr executables for linux and windows are provided in etc/ginr.

A small fix is required for ginr and I have notified the author. I will push a commit and close this issue when the ginr repo has been updated with the necessary fix.

Sum, product and scan trap injection should be enabled for count effector

As for the nil effector. Current traps are injected only if all involved transitions invoke nil only, effect nothing for each byte, and never pass on a signal as an interrupting token. If all involved transitions invoke count only, then the respective trap should decrement the counter for each trapped byte and pass on the returned signal if the counter bottoms out. This may lead to conflicts of interest with mproduct in some patterns, as when the counter signal interrupts before the product is completely matched, but those will have to be worked out in the pattern domain.

This will enable msum traps to cover countable ranges of binary data, which currently are transduced outside the trap loops. It will also enable count[max !max] preset limits to guard against runaway traps, which may be desirable in some cases.

ReaderInput and StreamInput are broken

These are the available BaseInput subclasses that wrap stdin for simple text transduction processes. Neither of these works unless the entire input is preloaded into a single buffer.

Input from stdin is limited to <4k and larger files must be preloaded as for jrte.test.com.characterforming.jrte.test.FileRunner.

Need bootstrap compiler for Automaton.dfa

Automaton.inr defines the transducer used to load ginr DFAs in the model compiler. It is also involved in packaging Automaton.dfa to build TCompile.model (idempotently) in ribose builds, but this fails if the enumeration of built-in effectors in Transductor changes.

Clean up Charset codec usage and properties

Ginr automata are rendered in UTF-8 and ribose uses UTF-8 Charset codecs to map between bytes and characters. At present ribose uses a global static Charset(UTF-8) instance and all encoding and decoding is performed using static Charset methods that instantiate transient codec objects. This needs to be fixed, so that each class requiring codecs instantiates persistent codecs and uses them when required.

The global static ribose runtime Charset is thread-safe and is the source for all codecs instantiated in the ribose runtime. For each ITransduction instance a single (encoder, decoder) pair is instantiated and shared by all IEffector and INamedValue instances bound to the transduction. Transductions are assumed to be single-threaded, so these codec instances cannot be shared outside the transduction.

The Base.getRuntimeCharset() method returns a reference to the global static Charset corresponding to the charset name specified by ribose.runtime.charset (see below). The ITarget interface has been extended with a default implementation of getEffectiveCharset() that simply calls Base.getRuntimeCharset() and returns its result.

A new ribose setting ribose.runtime.charset (default UTF-8) will be added for completeness' sake but cannot be overridden without a lot of heavy lifting in ginr. To use a different text encoding, say X, a bijective transducer TxU: X -> UTF-8 mapping between X and UTF-8 byte codes is required. So if TxU is such a ginr automaton and Gu is a ribose transducer expecting UTF-8 input, then Gt = TxU @ Gu will accept X-encoded inputs and, for any plain text t, ((t @ X) @ TxU @ Gu) == ((t @ UTF8) @ Gu). Good luck with that.

Which is to say the ribose.runtime.charset property should not be overridden. Additional properties, to specify transduction input and output buffer sizes, have been added or changed. All ribose properties are presented as System properties that can be overridden by specifying new values as -D<prop>=<value> args on the Java command line.

Here is a complete list of ribose properties and default values:

  • ribose.runtime.charset: ("UTF-8") The canonical name of the Charset to use for text codecs
  • ribose.inbuffer.size: ("65539") The size in bytes to use for buffering transduction input
  • ribose.outbuffer.size: ("8196") The size in bytes to use for buffering transduction output
  • jrte.out.enabled: (true) Set false to disable out[] effector output (used for testing)
  • regex.out.enabled: (true) Set false to disable regex match output (used for testing)
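Since these are plain System properties, reading them with a default is a one-liner. The following is a minimal sketch under that assumption (the RiboseProps class and intProp helper are illustrative, not ribose API):

```java
public class RiboseProps {
  // Parse an integer System property, falling back to the documented default.
  static int intProp(String name, int fallback) {
    String v = System.getProperty(name);
    return v != null ? Integer.parseInt(v) : fallback;
  }

  public static void main(String[] args) {
    String charset = System.getProperty("ribose.runtime.charset", "UTF-8");
    int inBuffer = intProp("ribose.inbuffer.size", 65539);
    int outBuffer = intProp("ribose.outbuffer.size", 8196);
    System.out.println(charset + " " + inBuffer + " " + outBuffer);
  }
}
```

Running with, say, `java -Dribose.inbuffer.size=131072 ...` would override the input buffer default for that JVM.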

Slow run time for regex matching tests on large files

The FileRunner test harness is using readline() for match input. That's sad and I don't really want to talk about it, but it must be fixed. The matcher should receive a CharBuffer containing the entire input file (decoded to Unicode), as does the equivalent ribose test run.

Arrange rows and columns in transition matrices according to transition frequency

By counting the frequency of input equivalents in a representative corpus of input media, each transducer can permute nominal input ordinals and enumerate them in decreasing frequency order. Similarly for states. This would concentrate the most frequent transitions around the origin T[0,0] in the transition matrix T. In run-time transductions this would concentrate most activity within a reduced number of cache lines and so reduce thrashing on the available cache lines. This can be tested by measuring transition frequency and throughput in instrumentation runs and throughput in benchmarking runs with the identical input corpus.
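The permutation itself is straightforward. This is an illustrative sketch (not ribose code) of renumbering input equivalence classes in decreasing frequency order, so the hottest rows and columns cluster near T[0,0]; the same mapping would apply to state ordinals.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

public class FrequencyOrder {
  // Returns permutation where permutation[oldOrdinal] = newOrdinal,
  // with the most frequently observed ordinal mapped to 0.
  static int[] byDescendingFrequency(long[] counts) {
    Integer[] order = IntStream.range(0, counts.length).boxed().toArray(Integer[]::new);
    Arrays.sort(order, Comparator.comparingLong((Integer i) -> -counts[i]));
    int[] permutation = new int[counts.length];
    for (int newOrdinal = 0; newOrdinal < order.length; newOrdinal++)
      permutation[order[newOrdinal]] = newOrdinal;
    return permutation;
  }

  public static void main(String[] args) {
    long[] counts = {5, 100, 42}; // equivalence class 1 is hottest
    // class 1 -> 0, class 2 -> 1, class 0 -> 2
    System.out.println(Arrays.toString(byDescendingFrequency(counts))); // prints [2, 0, 1]
  }
}
```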

WIP: Implement Object stack to support marshaling context-free data

Currently all named values defined in a ribose model are globally addressable, behaving analogously to general purpose registers on a CPU, which is fine for models with limited stack depth but problematic for more complex data. A value stack is required to support deeply nested structures.

This is going to take a long time, on the poc branch. Dev and master may receive minor updates in the interim.

Prepare ribose for external use (outside jrte/ribose)

Ribose development and testing to date has been conducted without regard for external use, and recent attempts to use it outside the repo folder have been awkward. So a suite of simple examples that use ribose from outside the repo environment is needed. To support this, the TCompile.(model,map) resources need to be bundled into the ribose jar file. The compiler model resource will be extracted to a temp file for ribose compiler use and deleted when the hosting JVM exits.

For the time being the workaround is to place TCompile.model in the working directory of the JVM that is using the ribose compiler. There may be other usability issues discovered and this issue will be updated accordingly.

Mark/reset is broken

Gets lost frequently if ribose.inbuffer.size < typical marked span size. Setting -Dribose.inbuffer.size=X where X >> typical marked span size may be a workaround (default is 64kb; the test breaks in LinuxKernelLoose with -Dribose.inbuffer.size=17).

The fix will include an appropriate test case. This has been broken since 55f8940 (Eliminate field,signal references from input stack).

Working on it...

Use a buffered FileOutputStream for out[] effector output

The built-in out[] effector currently writes to System.out (filtered, PrintStream) whereas it should be writing raw bytes (UTF-8 or binary) to a FileOutputStream. Using System.out is expedient in Java source and in shells that support pipes but filtering is not acceptable. Also, redirection of large volumes of transduction output through System.out is very, very slow in current ribose builds for Windows:

Target test: started Sun Nov 20 10:18:49 AST 2022
Target test: finished Sun Nov 20 10:23:03 AST 2022 (253678)
(>4m)

These same tests run normally in Ubuntu:

Target test: started Sun Nov 20 11:30:02 AST 2022
Target test: finished Sun Nov 20 11:31:00 AST 2022 (58016)
(<1m)

Ribose test inputs are read as binary byte streams via FileInputStream. Reworking the out[] effector to buffer and write output via FileOutputStream will also facilitate eventual migration to async i/o and piping data between Transductor instances, maybe?

Collapse non-branching input vectors with no effect

Long literal strings on the input tape are often transduced with no effect other than to mark a syntactic cut point in a pattern. For example:

'<cycle-start id="' digit+ '" type="scavenge" contextid="' ...

The term '<cycle-start id=' requires 16 states and forces 16 bytes into singleton input equivalence classes. This can be reduced to two states and a smaller number of equivalence classes with a match[] effector that matches each input byte with a reference string and emits a nul signal if an input byte does not match the next sequential byte in the reference string:

('<', match[`cycle-start id="`]) (byte, match)* digit+ '" type="scavenge" contextid="' ...

But this approach moves information from the input tape to the parameter tape, making the pattern unusable for recombination using ginr's composition and join operators, among other things. So this must be implemented internally. The ribose model compiler must detect non-branching chains of input transitions with nil effect and merge the connecting transitions into a match loop that remains in a single state invoking an internal match effector (without parameters).

Pattern syntax and transduction semantics are preserved by this transformation. This can be implemented in a tight loop in a single state and a reduced number of input equivalence classes, which should considerably reduce L1 data and instruction cache misses. Also, bytes involved in the match loop need not go through the transducer transition matrix but can be matched directly in a tight inline loop iterating over the current input array.
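The inline loop described above can be sketched as follows. This is a hedged illustration, not the proposed ribose internals: the class and method names are ours, and the NUL ordinal stands in for the out-of-band signal that would be injected on mismatch.

```java
public class MatchLoop {
  static final int NUL = 256; // illustrative out-of-band signal ordinal

  // Compare input bytes directly against the reference vector in one state.
  // Returns the number of bytes matched; a short count means the caller
  // should inject NUL and fall back to normal state-by-state transitions.
  static int match(byte[] input, int pos, byte[] reference) {
    int i = 0;
    while (i < reference.length && pos + i < input.length
        && input[pos + i] == reference[i]) {
      i++; // bytes never touch the transition matrix while they match
    }
    return i;
  }

  public static void main(String[] args) {
    byte[] reference = "<cycle-start id=\"".getBytes();
    byte[] input = "<cycle-start id=\"42\" type=\"scavenge\"".getBytes();
    System.out.println(match(input, 0, reference)); // prints 17: the full literal matched
  }
}
```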

This is another epic. Will take some time.

Ensure all states erased in mproduct construction have identical transitional effects on NUL input

In order for states to be safely erased they must all have identical transitional effects (next state, action) on NUL as the final state in the product vector. Transitional effects on other signals can be ignored, as only NUL can arise when a transduction is in a product trap. If an input byte mismatches the trap product vector before the final state, a NUL signal is injected and presented to the state holding the terminal symbol in the product vector.

Provide transduction output view to effectors

In current HEAD (ec6487b) there is no way for IEffector instances to access values extracted by the transduction. That's stupid.

To clear this up, the ITransduction interface is refactored to extract the methods relating to named values into a new IOutput interface. The IEffector interface provides a setOutput(IOutput) method to receive its output instance during the effector binding process. The BaseEffector class hosts a protected IOutput instance field and provides a setView() implementation to support subclassed effectors.

Also, build.xml is broken in ec6487b. The names of the jrte jar files have bogus version labels, and javac debug attributes are missing. The only version label applied to commits at this stage is HEAD. Someday there may be a jrte.0.0.1.jar but it won't be today. Today and for a TBD number of days going forward there will only be jrte-HEAD.jar and it will contain debugging info.

Also, please note that the HEAD branch is very volatile. HEAD development at this point is focused on taking advantage of recent changes in ginr (transduction of text in the utf-8 encoded domain + new format for saved DFAs that supports arbitrary binary data within `backquoted` tokens). A more stable alpha version of this work will eventually be merged into the RAW branch -- the RAW version will be available as an alpha cut at some point TBD.

out effector wastes the heap

It is decoding parameter values and writing Unicode to stdout, and a number of transient ByteBuffer objects are created in the process. That's just wrong. Values should be written directly as utf8 bytes without copying or decoding.
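The fix amounts to passing the stored bytes straight through to the sink. A minimal sketch of that idea (the RawOut class and emit helper are illustrative, not the ribose out[] effector): values held as UTF-8 bytes are written without decoding to chars and without transient ByteBuffer allocations per write.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class RawOut {
  // Write each value's bytes untouched, UTF-8 or binary; no decode/encode round trip.
  static byte[] emit(byte[]... values) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    for (byte[] v : values)
      sink.writeBytes(v); // appends raw bytes, no charset involved
    return sink.toByteArray();
  }

  public static void main(String[] args) {
    byte[] out = emit("field\t".getBytes(StandardCharsets.UTF_8),
        "⅀".getBytes(StandardCharsets.UTF_8)); // multibyte glyph survives intact
    System.out.println(new String(out, StandardCharsets.UTF_8));
  }
}
```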

The test script runs the LinuxKernel transducer on a 20MB byte array preloaded in the heap, starting with a 22MB heap and java pinned to CPUs 3&4. The first run is with the out effector disabled. It is enabled for the second run and the heap just keeps on growing:

$ cat local/LinuxKernel-gc 
#! /bin/bash
gclog="-Xlog:gc=trace:file=/tmp/LinuxKernel-gc.log"
start="-Dregex.out.enabled=false -cp jars/ribose-0.0.0.jar -Dfile.encoding=UTF-8 -classpath jars/ribose-0.0.0.jar com.characterforming.ribose"
input="test-patterns/inputs/kern-10.log"
model="build/patterns/BaseTarget.model"
java="/usr/bin/time numactl --physcpubind 3,4 java -Xms22m"

echo "lines words characters = $(wc $input)"
echo "LinuxKernel with output disabled"
$java $gclog -Djrte.out.enabled=false $start.RiboseRuntime \
  --nil LinuxKernel  "$input" "$model" 2>/tmp/LinuxKernel-gc.time
cat /tmp/LinuxKernel-gc.time
cat /tmp/LinuxKernel-gc.log

rm /tmp/LinuxKernel-gc.log
echo "LinuxKernel with output enabled"
$java $gclog -Djrte.out.enabled=true $start.RiboseRuntime \
  --nil LinuxKernel "$input" "$model" 1>/tmp/LinuxKernel.out 2>/tmp/LinuxKernel-gc.time
echo "lines words characters = $(wc /tmp/LinuxKernel.out)"
cat /tmp/LinuxKernel-gc.time
cat /tmp/LinuxKernel-gc.log

Output for each run shows time metrics and gc log trace:

lines words characters =   115080  2199300 20420730 test-patterns/inputs/kern-10.log
LinuxKernel with output disabled
0.34user 0.06system 0:00.30elapsed 133%CPU (0avgtext+0avgdata 80276maxresident)k
0inputs+72outputs (8major+15283minor)pagefaults 0swaps
[0.002s][trace][gc] MarkStackSize: 4096k  MarkStackSizeMax: 16384k
[0.024s][debug][gc] ConcGCThreads: 1 offset 4
[0.024s][debug][gc] ParallelGCThreads: 2
[0.025s][debug][gc] Initialize mark stack with 4096 chunks, maximum 16384
[0.026s][info ][gc] Using G1
[0.106s][info ][gc] GC(0) Pause Young (Concurrent Start) (G1 Humongous Allocation) 1M->1M(24M) 3.168ms
[0.106s][info ][gc] GC(1) Concurrent Cycle
[0.111s][info ][gc] GC(1) Pause Remark 21M->21M(44M) 0.273ms
[0.114s][info ][gc] GC(1) Pause Cleanup 21M->21M(44M) 0.040ms
[0.114s][info ][gc] GC(1) Concurrent Cycle 8.494ms
LinuxKernel with output enabled
lines words characters =   70279  210837 7683656 /tmp/LinuxKernel.out
1.15user 1.96system 0:02.84elapsed 109%CPU (0avgtext+0avgdata 151680maxresident)k
0inputs+15080outputs (8major+37002minor)pagefaults 0swaps
[0.002s][trace][gc] MarkStackSize: 4096k  MarkStackSizeMax: 16384k
[0.006s][debug][gc] ConcGCThreads: 1 offset 4
[0.006s][debug][gc] ParallelGCThreads: 2
[0.006s][debug][gc] Initialize mark stack with 4096 chunks, maximum 16384
[0.006s][info ][gc] Using G1
[0.076s][info ][gc] GC(0) Pause Young (Concurrent Start) (G1 Humongous Allocation) 1M->1M(24M) 3.144ms
[0.076s][info ][gc] GC(1) Concurrent Cycle
[0.085s][info ][gc] GC(1) Pause Remark 21M->21M(44M) 1.679ms
[0.085s][info ][gc] GC(1) Pause Cleanup 21M->21M(44M) 0.047ms
[0.085s][info ][gc] GC(1) Concurrent Cycle 8.977ms
[0.280s][info ][gc] GC(2) Pause Young (Normal) (G1 Evacuation Pause) 33M->21M(44M) 4.602ms
[0.366s][info ][gc] GC(3) Pause Young (Concurrent Start) (G1 Evacuation Pause) 33M->21M(44M) 3.377ms
[0.366s][info ][gc] GC(4) Concurrent Cycle
[0.368s][info ][gc] GC(4) Pause Remark 21M->21M(44M) 1.098ms
[0.369s][info ][gc] GC(4) Pause Cleanup 21M->21M(44M) 0.059ms
[0.369s][info ][gc] GC(4) Concurrent Cycle 3.194ms
[0.448s][info ][gc] GC(5) Pause Young (Normal) (G1 Evacuation Pause) 33M->21M(44M) 3.622ms
[0.526s][info ][gc] GC(6) Pause Young (Concurrent Start) (G1 Evacuation Pause) 33M->21M(69M) 3.219ms
[0.526s][info ][gc] GC(7) Concurrent Cycle
[0.528s][info ][gc] GC(7) Pause Remark 21M->21M(69M) 0.416ms
[0.529s][info ][gc] GC(7) Pause Cleanup 21M->21M(69M) 0.063ms
[0.529s][info ][gc] GC(7) Concurrent Cycle 2.456ms
[0.669s][info ][gc] GC(8) Pause Young (Normal) (G1 Evacuation Pause) 42M->21M(69M) 3.897ms
[0.836s][info ][gc] GC(9) Pause Young (Normal) (G1 Evacuation Pause) 46M->21M(69M) 3.236ms
[1.011s][info ][gc] GC(10) Pause Young (Normal) (G1 Evacuation Pause) 48M->21M(69M) 3.038ms
[1.193s][info ][gc] GC(11) Pause Young (Normal) (G1 Evacuation Pause) 49M->21M(90M) 3.268ms
[1.427s][info ][gc] GC(12) Pause Young (Normal) (G1 Evacuation Pause) 56M->21M(90M) 5.914ms
[1.686s][info ][gc] GC(13) Pause Young (Normal) (G1 Evacuation Pause) 61M->21M(90M) 3.315ms
[1.960s][info ][gc] GC(14) Pause Young (Normal) (G1 Evacuation Pause) 64M->21M(90M) 3.178ms
[2.254s][info ][gc] GC(15) Pause Young (Normal) (G1 Evacuation Pause) 66M->21M(110M) 10.231ms
[2.599s][info ][gc] GC(16) Pause Young (Normal) (G1 Evacuation Pause) 73M->21M(110M) 5.875ms
