
gaulois-pipe's Introduction

gaulois-pipe

An XSLT pipelining solution

gaulois-pipe is a tool that lets you define XSLT pipelines and apply them to source files.

gaulois-pipe pipelines are made of XSLT, tees and outputs. The source file is transformed by the first XSL, the result is transformed by the second, and so on. The final step of each branch of a pipe is an output.

gaulois-pipe supports multi-threaded execution. You can define the thread-pool size and the maximum size of a source file that may be transformed by the multi-threaded engine.

gaulois-pipe supports parameters. Parameters may be assigned to the whole pipe, to a single XSLT, or to a source file.

gaulois-pipe is driven by a config file. The schema for this config file is in src/main/resources/fr/efl/chaine/xslt/schemas/saxon-pipe_config.xsd. There is also an older way to define pipelines on the command line, but it is kept for backward compatibility only and should not be used.

Parameters can be defined on the command line and used in the config file, for example to define an output.

Here is a very simple config file:

<config xmlns="http://efl.fr/chaine/saxon-pipe/config"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://efl.fr/chaine/saxon-pipe/config ../../../src/main/resources/fr/efl/chaine/xslt/schemas/saxon-pipe_config.xsd"
    documentCacheSize="2">
    <pipe mutiThreadMaxSourceSize="24349456" nbThreads="4">
        <xslt href="$[xslDir]/parallel.xsl">
            <param name="p-xsl" value="xsl-value"/>
        </xslt>
        <output id="main">
            <folder absolute="${user.dir}/$[destDir]"/>
            <fileName name="${basename}-$[p-file]-result.xml"/>
        </output>
    </pipe>
    <params>
        <param name="p-general" value="GENERAL"/>
        <param name="xslDir" value="./src/test/resources"/>
        <param name="destDir" value="./target/generated-test-files"/>
    </params>
    <sources>
        <file href="./src/test/resources/source.xml">
            <param name="p-file" value="substitution"/>
        </file>	       
    </sources>
</config>

gaulois-pipe's People

Contributors

arkamy, cmarchand, dependabot[bot], jimetevenard


gaulois-pipe's Issues

The convert() method for an atomic Datatype should not (always) split the input value

Hi,

The convert() method for an atomic Datatype (see DatatypeFactory.java line 124) "conveniently" splits the input value as if it were always an XPath sequence. That is, the leading and trailing parentheses "(" and ")" are removed, and commas "," are used as sequence value delimiters.

But there are cases where this behaviour is wrong. For instance, if a parameter is declared as "xs:string" in a gaulois-pipe configuration file, the input value should be taken "as is", and commas or parentheses should not be treated as special characters.
I actually have several string parameters that contain commas, and I know I do not want them to be split!

Is there a way to deactivate this functionality when it is not wanted?
Maybe if the parameter type is not a multiple one?

Thanks!
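One way to picture the request is a conversion that splits only when the declared type carries an occurrence indicator. The sketch below is illustrative only: the class AtomicValueSplitSketch and its methods are hypothetical names, not part of gaulois-pipe, and the rule "split only for types ending in * or +" is an assumption about how the suggestion could be implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not gaulois-pipe code.
public class AtomicValueSplitSketch {

    /** True when the declared type ends with an occurrence indicator that allows several items. */
    static boolean allowsMultipleItems(String declaredType) {
        String t = declaredType.trim();
        return t.endsWith("*") || t.endsWith("+");
    }

    /** Returns the raw value untouched for single-item types; splits it only for sequence types. */
    static List<String> toItems(String declaredType, String rawValue) {
        if (!allowsMultipleItems(declaredType)) {
            // e.g. xs:string: commas and parentheses are ordinary characters
            return Collections.singletonList(rawValue);
        }
        String v = rawValue.trim();
        if (v.length() >= 2 && v.startsWith("(") && v.endsWith(")")) {
            v = v.substring(1, v.length() - 1);
        }
        List<String> items = new ArrayList<>();
        for (String item : v.split(",")) {
            items.add(item.trim());
        }
        return items;
    }
}
```

With this rule, a parameter typed xs:string keeps "(a, b)" as one value, while xs:string* yields the two items "a" and "b".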

Error when calling external xslt file

Hi,

I get an error when I call an external XSLT with an absolute path or a URI.
href is treated as a path here:

private File getFile() throws URISyntaxException {
 if(file==null) {
   if(href.startsWith("jar:")) {
     file = new File(new URI(href));
   } else if(href.startsWith("cp:")) {
     // nope
   } else {
     file = new File(href);
   }
 }
 return file;
}

and as a URI here:

private XsltTransformer getXsltTransformer(String href, HashMap<QName,ParameterValue> parameters)
    throws MalformedURLException, SaxonApiException, URISyntaxException, FileNotFoundException, IOException {
  String __href = (String)ParametersMerger.processParametersReplacement(href, parameters);
  LOGGER.debug("loading "+__href);
  XsltExecutable xsl = xslCache.get(__href);
  if(xsl==null) {
    LOGGER.trace(__href+" not in cache");
    try {
      Source xslSource = getUriResolver().resolve(href, getCurrentDirUri());
      if(xslSource==null) {
        throw new FileNotFoundException("Unable to resolve "+href);
      }

The best way, I think, is to treat href as a URI everywhere.

thanks :)

<choose> as first element of pipe

When a "choose" is the first element of a pipe it is not executed.
I have to put an "xslt" with an identity xsl before it.
When a choose is the last element of a pipe, there is an error.
release 1.01.11

@href on folder : bad uri construction

hi,

When I pass a parameter (as a URI) to the config file to designate the source files to process, I get an exception on the folder URI. I pass a parameter like "file:/toto", and the exception message says: uri = 'c:\titi\project\file\toto : invalid uri.
gp.zip

Running GP in a JavaEE container fails

Try to create a GP config, mono-threaded, processing only 1 input file. Launch it from command line, it's successful.
Try to run exactly the same config from a tomcat servlet, it may fail, because GP will use tomcat's thread pool.
We should have a 'synchronous' processing mode available for gaulois-pipe.

Create many pipes and reuse them

It can be useful to create a partial pipe and to reuse it elsewhere:

<pipe id="reuse-me">
  <xslt href="1.xsl">
     <param name="p-xsl" value="${p1}"/>
  </xslt>
  <xslt href="2.xsl"/>
  <parameters>
     <param name="p1"/>
     <param name="p2"/>
  </parameters>
</pipe>
<pipe main="true">
  <pipe ref-id="reuse-me"/>
</pipe>

But this breaks the pipe definition :

  • a pipe may have no output, as it will be inserted as a step into another pipe
  • a pipe should have parameters, and when we re-use the pipe, we re-use it with parameters

Multiple <xslt> in <when>

Hello,

I tried to run 2 <xslt> inside a <when>:

But it seems that only the first one is executed.

I got my pipe to work by using 2 with only 1 inside

Can you check this point, to see whether you have the same issue?
I'm using version 1.01.11.

Thanks

Config checks enhancements

No output at the end of a pipe.

The error message is sufficiently explicit.
But this could be constrained in the schema.

Moreover, Exceptions that are thrown at runtime are caught and added into the pipe's errors stack (which is a List<String> ).

This introduces a difference of behavior between problems that are pretty much identical (Mistake in config)

choose /tee as first child of main pipe

This problem is already known, cf. Issue 23 (#23).

But this is allowed by the schema.
It fails only at runtime, with an exception message which does not make sense without rifling through the code.

java.lang.IllegalArgumentException: assignee must not be null
	at fr.efl.chaine.xslt.GauloisPipe.assignStepToDestination(GauloisPipe.java:772)
	at fr.efl.chaine.xslt.GauloisPipe.buildTransformer(GauloisPipe.java:764)
	at fr.efl.chaine.xslt.GauloisPipe.buildTransformer(GauloisPipe.java:578)
	at fr.efl.chaine.xslt.GauloisPipe.executesPipeOnMultiThread(GauloisPipe.java:364)
	at fr.efl.chaine.xslt.GauloisPipe.launch(GauloisPipe.java:293)

input-name parameter

I am trying to use the input-name parameter, so that I do not have to call the base-uri() function in the XSLT steps of the pipe.
<xsl:param name="input-name" select="tokenize(base-uri(.),'/')[position()=last()]"/>
<xsl:param name="input-basename" select="string-join(tokenize($input-name,'\.')[position() &lt; last()],'.')"/>
I don't put anything in the gaulois-pipe configuration.
But the parameter has no value in the second XSLT of the pipe.
Is this the right way to do it?

Gaulois raises InvalidSyntaxException when folder does not exist

Hi,

When I run a gaulois-pipe on a non-existing source folder, I get the following exception:

Caused by: fr.efl.chaine.xslt.InvalidSyntaxException: D:\data\output\prepare is not a valid directory
at fr.efl.chaine.xslt.config.ConfigUtil.buildFolderContent(ConfigUtil.java:503)
at fr.efl.chaine.xslt.config.ConfigUtil.buildSources(ConfigUtil.java:282)
at fr.efl.chaine.xslt.config.ConfigUtil.buildConfig(ConfigUtil.java:186)
at com.processing.step.GauloisPipeStep.run(GauloisPipeStep.java:93)
... 8 more
The syntax is correct. I think the best way is simply to treat this as 0 source files, with no exception thrown.

For example, I run 2 gaulois-pipes: the first one filters files and the second one does something else with the filtered files. If all files are filtered out by the first gaulois-pipe, my second gaulois-pipe raises an exception.

thanks
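The behaviour requested here, treating a missing folder as an empty source list, could look like the following sketch. FolderSourcesSketch and listSourceFiles are hypothetical names used for illustration; the real ConfigUtil.buildFolderContent may take different arguments and apply filename patterns that are not shown here.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not gaulois-pipe code.
public class FolderSourcesSketch {

    /** Lists the regular files of a folder; a missing or unreadable folder yields an empty list. */
    static List<File> listSourceFiles(File folder) {
        if (!folder.isDirectory()) {
            // Non-existing folder: nothing to process, but not a configuration syntax error
            return Collections.emptyList();
        }
        File[] children = folder.listFiles(File::isFile);
        if (children == null) {
            // listFiles returns null on I/O error
            return Collections.emptyList();
        }
        return new ArrayList<>(Arrays.asList(children));
    }
}
```

A caller can then run the pipe over zero files instead of failing before any processing starts.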

open a jar:file:

I have a problem opening an XSLT file located in a jar file.
I use the jar:file: URI syntax.
Perhaps it is because my XSLT file imports other XSLT files?

Marc

Vulnerabilities introduced by Jetty

But Jetty 9.4.17 is a Java 11 library, and gaulois-pipe should continue to work with Java 8.

Replace Jetty with a lightweight HTTP server, as there are only 2 services, with no authentication.

bug when inputFile has no extension

Hi,

I found a bug when the input file has no extension (ix is then -1):

private String getFileName(File sourceFile, HashMap<QName,ParameterValue> parameters) {
    String filename = (prefix!=null?prefix:"") + name + (suffix!=null?suffix:"");
    String sourceName = sourceFile.getName();
    int ix = sourceName.lastIndexOf(".");
    // FIXME: this shouldn't be supported, it has been replaced by input-* pseudo-variables
    String extension = sourceName.substring(ix);
    String basename = sourceName.substring(0, ix);
    String ret = filename.replaceAll("\\$\\{name\\}", sourceName).replaceAll("\\$\\{basename\\}", basename).replaceAll("\\$\\{extension\\}", extension);
    for(ParameterValue pv:parameters.values()) {
        if(pv.getValue() instanceof String || pv.getValue() instanceof XdmAtomicValue) {
            ret = ret.replaceAll("\\$\\["+pv.getKey()+"\\]", pv.getValue().toString());
        }
    }
    return ret;
}
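The failure comes from String.substring being called with the -1 that lastIndexOf returns when the name contains no dot. A minimal sketch of a guard follows; SourceNameSketch and its methods are hypothetical names, and using String.replace (literal replacement) instead of replaceAll is a choice made for this sketch, not something taken from the project.

```java
// Hypothetical helper, not gaulois-pipe code.
public class SourceNameSketch {

    /** Name without its extension; the whole name when there is no dot. */
    static String basename(String sourceName) {
        int ix = sourceName.lastIndexOf('.');
        return ix < 0 ? sourceName : sourceName.substring(0, ix);
    }

    /** Extension including the leading dot, as in the quoted code; empty when there is no dot. */
    static String extension(String sourceName) {
        int ix = sourceName.lastIndexOf('.');
        return ix < 0 ? "" : sourceName.substring(ix);
    }

    /** Expands ${name}, ${basename} and ${extension} literally, without regex semantics. */
    static String expand(String template, String sourceName) {
        return template
                .replace("${name}", sourceName)
                .replace("${basename}", basename(sourceName))
                .replace("${extension}", extension(sourceName));
    }
}
```

Literal replacement also avoids surprises when a substituted value contains "$" or "\", which replaceAll interprets specially in its replacement argument.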

Accent in the name of source file

Hello,

I found that gaulois-pipe doesn't accept accented characters in the name of a source file.
For example: ED Procédures collectives.xml
I have to rename it to ED Procedures collectives.xml

Qian

Typing parameters

All parameter values defined from command-line or from configuration file are always treated as xs:string.

If an XSL defines a parameter:

  <xsl:param name="foe" as="xs:boolean"/>

it is impossible to give a correct value to this parameter.

We need a way to type a parameter, at least from the configuration file:

  <param name="foe" as="xs:boolean" select="true()"/>

I use @select here to say that the expression must be evaluated, whereas @value is only a literal value that is not evaluated. This keeps backward compatibility.

Defining a typed param from the command line will be much more difficult. One possibility is to declare an abstract typed parameter in the configuration file, without any @value or @select, and to evaluate the command-line parameter according to this abstract definition:

  <param name="foe" as="xs:boolean" abstract="yes"/>

@matthieuRicaudDussarget : I'd like your opinion on this, please...

<choose> inside <tee>

Hello,

Is it possible to use a <choose> inside a <tee>? The schema doesn't seem to allow this case:

  <pipe mutiThreadMaxSourceSize="24349456" nbThreads="4" traceOutput="#logger">
    <xslt href="cp:/common/xsl/identity.xsl"/>
    <tee>
      <pipe>
        <xslt href="cp:/common/xsl/identity.xsl"/>
        <choose></choose> <!-- Not allowed -->
      </pipe>
      <pipe></pipe>
    </tee>
    <xslt href="cp:/common/xsl/identity.xsl"/>
    <output id="main" method="xml" encoding="UTF-8" indent="no">
      <folder absolute="$[CONTEXT.URI]/$[resStep]"/>
      <fileName name="$[input-basename].xml"/>
    </output>
  </pipe>

Can you check this point, to see whether you have the same issue?
I'm using version 1.01.11.

Thanks

Add a choose step

The idea is to have a step that can choose between various XSLT, on a per-input-file basis, according to an XPath test.
Something like:

<pipe>
   <xslt href="first.xsl"/>
   <choose>
      <when test="/*/local-name() eq 'root1'"><xslt href="1.xsl"/></when>
      <when test="/*/local-name() eq 'root2'"><xslt href="2.xsl"/></when>
      <otherwise><xslt href="3.xsl"/></otherwise>
   </choose>
   <xslt href="last.xsl"/>
</pipe>

For each input file, when constructing the pipeline, and based on the name of root element, 1.xsl, 2.xsl or 3.xsl is inserted in the pipeline, between first.xsl and last.xsl

Parameter names cannot be QNames

There must be a way to define a parameter whose name is a qualified name, as XSLT parameter names are QNames.
Moreover, parameter names are currently resolved against other parameter definitions.
I cannot imagine an XSLT or a Java Step where the parameter name is not known at compilation time; so parameter names shouldn't go through ConfigUtil.resolveEscapes().

Replace

resolveEscapes(parameterName, ...)

with

resolveQName(parameterName)

EntityResolver is not set

gaulois-pipe uses org.xmlresolver as its URI resolver. But org.xmlresolver is also an EntityResolver, and gaulois-pipe's Saxon is not configured to use it for that purpose.
So, when parsing a document that has a doctype with a public or a system identifier, xmlresolver is not used to resolve entities.

Passing a ThreadFactory to Gaulois-pipe

Hello Christophe,
Would you mind adding the possibility to pass a ThreadFactory to gaulois-pipe, rather than having it use its own ThreadFactory to instantiate threads?
Use case: a customer uses its own specific ThreadFactory, compatible with a specific protocol.

Thank you
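The standard library already accepts a caller-supplied ThreadFactory when building a fixed pool, so the request could be met along the lines of the sketch below. ThreadFactorySketch, newPool and threadNameUsing are hypothetical names; how gaulois-pipe would actually expose such a setting is not specified in this issue.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

// Hypothetical helper, not gaulois-pipe code.
public class ThreadFactorySketch {

    /** Builds a fixed pool with the caller's factory, falling back to the JDK default. */
    static ExecutorService newPool(int nbThreads, ThreadFactory factory) {
        ThreadFactory effective = factory != null ? factory : Executors.defaultThreadFactory();
        return Executors.newFixedThreadPool(nbThreads, effective);
    }

    /** Demonstration: runs one task and reports the name of the thread that executed it. */
    static String threadNameUsing(ThreadFactory factory) {
        ExecutorService pool = newPool(1, factory);
        try {
            Callable<String> task = () -> Thread.currentThread().getName();
            return pool.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

For instance, threadNameUsing(r -> new Thread(r, "custom-worker")) runs the task on a thread created by the supplied factory.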

Looking for source in non-existing directory

This throws an InvalidSyntaxException, and it shouldn't. Sometimes we look in the error directory to check whether there are errors, and if an exception is thrown, error processing becomes impossible.

Files bigger than @mutiThreadMaxSourceSize are not processed

Given this pipe:

<pipe mutiThreadMaxSourceSize="24349456" nbThreads="4">
   <xslt href="identity.xsl"/>
</pipe>

With a file whose size is 29246436, so bigger than @mutiThreadMaxSourceSize, the file is not processed.
If the file is the only source, it is processed. If @nbThreads="1", it is processed.

Add control to config to check that schema URI are valid and point to an available schema

Hi,

While running the latest version of Gaulois Pipe within my project, I get this NullPointerException:

java.lang.NullPointerException: null
	at com.saxonica.ee.schema.sdoc.SchemaReader.read(SchemaReader.java:140) ~[saxonee-9.8.0.14.jar!/:na]
	at com.saxonica.config.EnterpriseConfiguration.addSchemaSource(EnterpriseConfiguration.java:630) ~[saxonee-9.8.0.14.jar!/:na]
	at net.sf.saxon.Configuration.addSchemaSource(Configuration.java:3025) ~[saxonee-9.8.0.14.jar!/:na]
	at fr.efl.chaine.xslt.GauloisPipe.launch(GauloisPipe.java:251) ~[gaulois-pipe-1.03.05.jar!/:na]

Just before this error, I get this log, which shows an invalid URL path:

Loading schema cp:/gc\schemas\query.xsd

Thanks in advance

URI resolver not used to resolve //xslt/@href

The URI resolver is not used to resolve XSL URIs.
So, if you define a catalog, the catalog is not used to resolve the XSL itself, only to resolve imported/included XSLT and data.

The problem is in GauloisPipe.getXsltTransformer(String,HashMap<QName, ParameterValue>), line 594

static-base-uri is lost when using compiled XSL

When using compiled XSL (.sef) that has not been compiled with the -relocate option, static-base-uri() may return a wrong value.
Would it be possible to have a parameter that is always pushed to the XSL and that denotes the real static base URI?

[HttpListener] This feature has nothing to do with GauloisPipe

Starting GauloisPipe as a HttpListener was a request for a project at Francis Lefebvre / Lefebvre-Sarrut.

This feature is an integration feature, and has nothing to do with pipeline processing. If a client needs to start GauloisPipe as a service, then GauloisPipe has to be startable as a service, needs to provide an API to add new files to the queue, and needs to provide an API to stop, but it must not provide an HTTP server.

This feature requires a dependency on jetty:jetty-server, and this library quite often has vulnerabilities. Removing the HTTP service will make gaulois-pipe more secure, and will let each integrator integrate gaulois-pipe as they like, removing the constraint to use Jetty.

As this is a breaking change, it should be done in a major release.

Map parameters are not accepted. They should be.

As more and more XSL developments use XSLT 3.0, map parameters are becoming common, and Gaulois Pipe should accept them.

At this point, it's impossible:

Caused by: net.sf.saxon.type.ValidationException: Only document() and element() are supported for node types
   at top.marchand.xml.gaulois.config.typing.DatatypeFactory.constructNodeDatatype(DatatypeFactory.java:176) ~[classes/:?]

Base-uri() in a Gaulois-pipe context

Hello,

When an XSL stylesheet is processed as the 2nd or Nth step of a pipe, base-uri() or base-uri(/) always returns the empty sequence.
The reference to the initial file URI seems to be lost.

(cf. sample pipe attached)
base-uri-gaulois-pipe.zip

release 1.03.00

Error code not returned if an exception occurs

With the following shell script, 0 is always returned as the exit code, even if an exception occurs:
java -cp ... fr.efl.chaine.xslt.GauloisPipe ...
error=$?
echo "Error code gaulois-pipe : $error"
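For the shell's $? to reflect a failure, the Java entry point has to end with a non-zero status. The sketch below shows one way to map a run's outcome to an exit code; ExitCodeSketch and exitCodeFor are hypothetical names, and the Callable stands in for whatever call actually launches the pipe.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not gaulois-pipe code.
public class ExitCodeSketch {

    /** Returns 0 when the run reports success, 1 when it reports failure or throws. */
    static int exitCodeFor(Callable<Boolean> pipelineRun) {
        try {
            return pipelineRun.call() ? 0 : 1;
        } catch (Exception e) {
            // Report the failure on stderr and signal it to the calling shell
            e.printStackTrace();
            return 1;
        }
    }
}
```

A main method would then finish with System.exit(exitCodeFor(...)), so that the error=$? line in the script above sees 1 after an exception.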

Exception in executesPipeOnMultiThread is not processed

In executesPipeOnMultiThread(), a NullPointerException thrown while a task submitted through service.execute(r) is running is not handled. The function should return false in this case.

Exception in thread "pool-1-thread-1" java.lang.NullPointerException
at java.net.URI$Parser.parse(URI.java:3042)
at java.net.URI.<init>(URI.java:588)
at org.xmlresolver.ResourceConnection.<init>(ResourceConnection.java:42)
at org.xmlresolver.ResourceResolver.cacheStreamSystem(ResourceResolver.java:172)
at org.xmlresolver.ResourceResolver.resolveEntity(ResourceResolver.java:296)
at org.xmlresolver.Resolver.resolveEntity(Resolver.java:186)
at org.apache.xerces.util.EntityResolver2Wrapper.resolveEntity(Unknown Source)
at org.apache.xerces.impl.XMLEntityManager.resolveEntity(Unknown Source)
at org.apache.xerces.impl.XMLDocumentScannerImpl$DTDDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:451)
at net.sf.saxon.event.Sender.send(Sender.java:179)
at net.sf.saxon.Configuration.buildDocumentTree(Configuration.java:3808)
at net.sf.saxon.s9api.DocumentBuilder.build(DocumentBuilder.java:369)
at net.sf.saxon.s9api.DocumentBuilder.build(DocumentBuilder.java:385)
at fr.efl.chaine.xslt.GauloisPipe.execute(GauloisPipe.java:461)
at fr.efl.chaine.xslt.GauloisPipe$1.run(GauloisPipe.java:340)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
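An exception thrown inside a task passed to ExecutorService.execute only reaches the worker thread's uncaught-exception handler, which is why the trace above starts with "Exception in thread". Submitting the task and inspecting its Future lets the caller observe the failure. The sketch below illustrates that general technique; PipeExecutionSketch and runAll are hypothetical names and do not reproduce the actual executesPipeOnMultiThread() code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not gaulois-pipe code.
public class PipeExecutionSketch {

    /** Runs all tasks on a fixed pool; returns false if any task threw or the wait was interrupted. */
    static boolean runAll(List<Runnable> tasks, int nbThreads) {
        ExecutorService service = Executors.newFixedThreadPool(nbThreads);
        List<Future<?>> futures = new ArrayList<>();
        for (Runnable task : tasks) {
            // submit() captures any exception in the Future instead of losing it in the worker
            futures.add(service.submit(task));
        }
        service.shutdown();
        boolean ok = true;
        for (Future<?> future : futures) {
            try {
                future.get();
            } catch (ExecutionException e) {
                // The task threw: e.getCause() holds the original exception
                ok = false;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                ok = false;
            }
        }
        return ok;
    }
}
```

The boolean result can then be returned by the calling method, as the issue asks.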

output attributes (@indent, @cdata-section-elements)

I'm trying to get an indented output including CDATA elements.
My pipeline looks like this:


<output id="main"
method="xml" encoding="UTF-8" indent="yes" cdata-section-elements="elem1 elem2">

I also tried putting these attributes in my last stylesheet.
But neither the configuration file nor the last stylesheet seems to work: the output is not indented and does not contain the CDATA sections.
