
alfresco-webscript-manifold-connector's People

Contributors

maoo, openpj

alfresco-webscript-manifold-connector's Issues

Managing ACLs in the MCF connector?

Hi Maurizio,
I was looking into integrating Alfresco with Apache ManifoldCF (MCF) 1.5.1 using the CMIS connector. I was able to run it and get all content metadata into the Solr index, apart from the ACL permissions.

Then I realized that ACL indexing and storage are not supported by the CMIS MCF connector, so I moved to the MCF Alfresco connector. But here too I could not see how to manage ACL indexing and storage.

Can you confirm whether the Alfresco connector supports ACL storage and indexing? If not, is there any way to achieve the same?

Regards.

Text extraction

Hi,

ManifoldCF uses the Solr extracting update handler to handle binary content: the binary content is sent to Solr, and Tika tries to extract the text content and some metadata (such as the MIME type).

For the Alfresco connector, Alfresco itself should be used to convert binary content to text, as the official Alfresco Solr integration does (by calling the NodeContentGet webscript), because Alfresco already knows how to convert documents to text.

However, the NodeContentGet webscript is protected by certificate authentication, so you have to clone this webscript.
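A minimal sketch of the suggested approach, in which the connector asks Alfresco for already-extracted text instead of sending the binary to Solr. It assumes the cloned webscript is mounted at the hypothetical path `/service/mcf/textContent` and accepts hypothetical `nodeId` and `propertyQName` query parameters; adjust both to the descriptor of the webscript you actually clone. Authentication is left out.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class AlfrescoTextContent {
    private AlfrescoTextContent() {}

    // Hypothetical path of the cloned, certificate-free copy of NodeContentGet.
    static final String CLONED_WEBSCRIPT_PATH = "/service/mcf/textContent";

    // Builds the request URL for one node's content property (parameter names are assumptions).
    static String url(String alfrescoBaseUrl, long nodeId, String propertyQName) {
        String base = alfrescoBaseUrl.endsWith("/")
                ? alfrescoBaseUrl.substring(0, alfrescoBaseUrl.length() - 1)
                : alfrescoBaseUrl;
        return base + CLONED_WEBSCRIPT_PATH
                + "?nodeId=" + nodeId
                + "&propertyQName=" + URLEncoder.encode(propertyQName, StandardCharsets.UTF_8);
    }

    // Fetches the text Alfresco extracted, so the connector can index text instead of binary.
    static String fetchText(HttpClient client, String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Text extraction failed with HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

The returned string could then be added to the document as a plain text field, so Tika on the Solr side is no longer involved.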

Getting "Not authorized , ReasonPhrase:Unauthorized" Error while building.

Hi All,

I have cloned the source locally and I am trying to build this connector, but I am getting the error below about the Alfresco private repository and an authorization failure. Can you help?

Downloading: file://C:\Users\lalit.jangra\Desktop\alfresco-webscript-manifold-connector-master/maven-repo/org/jvnet/staxex/stax-ex/maven-metadata.xml
Downloading: http://repo.fusesource.com/nexus/content/groups/public/org/jvnet/staxex/stax-ex/maven-metadata.xml
Downloading: https://artifacts.alfresco.com/nexus/content/groups/public-snapshots/org/jvnet/staxex/stax-ex/maven-metadata.xml
Downloading: https://artifacts.alfresco.com/nexus/content/groups/private/org/jvnet/staxex/stax-ex/maven-metadata.xml
Downloading: https://artifacts.alfresco.com/nexus/content/groups/public/org/jvnet/staxex/stax-ex/maven-metadata.xml
Downloading: https://artifacts.alfresco.com/nexus/content/groups/public-snapshots/org/jvnet/staxex/stax-ex/maven-metadata.xml
Downloading: http://repo.maven.apache.org/maven2/org/jvnet/staxex/stax-ex/maven-metadata.xml
Downloading: https://artifacts.alfresco.com/nexus/content/groups/public/org/jvnet/staxex/stax-ex/maven-metadata.xml
Downloading: http://repository.jboss.org/nexus/content/groups/public/org/jvnet/staxex/stax-ex/maven-metadata.xml
Downloading: http://download.java.net/maven/2/org/jvnet/staxex/stax-ex/maven-metadata.xml
Downloaded: http://repo.maven.apache.org/maven2/org/jvnet/staxex/stax-ex/maven-metadata.xml (707 B at 0.3 KB/sec)
Downloaded: http://download.java.net/maven/2/org/jvnet/staxex/stax-ex/maven-metadata.xml (411 B at 0.2 KB/sec)
Downloaded: http://repository.jboss.org/nexus/content/groups/public/org/jvnet/staxex/stax-ex/maven-metadata.xml (583 B at 0.2 KB/sec)
[WARNING] Could not transfer metadata org.jvnet.staxex:stax-ex/maven-metadata.xml from/to alfresco-private-repository (https://artifacts.alfresco.com/nexus/content/groups/private):
Not authorized , ReasonPhrase:Unauthorized.
Downloading: http://repo.fusesource.com/nexus/content/groups/public/woodstox/wstx-asl/3.2.7/wstx-asl-3.2.7.pom
Downloading: https://artifacts.alfresco.com/nexus/content/groups/public/woodstox/wstx-asl/3.2.7/wstx-asl-3.2.7.pom
Downloading: https://artifacts.alfresco.com/nexus/content/groups/public-snapshots/woodstox/wstx-asl/3.2.7/wstx-asl-3.2.7.pom
Downloading: file://C:\Users\lalit.jangra\Desktop\alfresco-webscript-manifold-connector-master/maven-repo/woodstox/wstx-asl/3.2.7/wstx-asl-3.2.7.pom
Downloading: https://artifacts.alfresco.com/nexus/content/groups/private/woodstox/wstx-asl/3.2.7/wstx-asl-3.2.7.pom
Downloading: http://repo.maven.apache.org/maven2/woodstox/wstx-asl/3.2.7/wstx-asl-3.2.7.pom
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Alfresco Webscript Connector Parent POM ........... FAILURE [45.609s]
[INFO] Alfresco Webscript Connector ...................... SKIPPED
[INFO] Alfresco Instance ................................. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 49.890s
[INFO] Finished at: Wed Jan 15 10:52:44 IST 2014
[INFO] Final Memory: 9M/182M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project mcf-alfresco-webscript-connector-parent: Could not resolve dependencies for project org.apache.manifoldcf:mcf-alfresco-webscript-connector-parent:pom:1.1.1: Failed to collect dependencies for [org.alfresco:alfresco-solr-integration:jar:4.2.c (compile), org.alfresco:alfresco-core:jar:4.2.c (compile), org.apache.manifoldcf:mcf-core:jar:1.1.1 (provided), org.apache.manifoldcf:mcf-agents:jar:1.1.1 (provided), org.apache.manifoldcf:mcf-pull-agent:jar:1.1.1 (provided), org.apache.manifoldcf:mcf-ui-core:jar:1.1.1 (provided)]: Failed to read artifact descriptor for woodstox:wstx-asl:jar:3.2.7: Could not transfer artifact woodstox:wstx-asl:pom:3.2.7 from/to alfresco-private-repository (https://artifacts.alfresco.com/nexus/content/groups/private): Not authorized , ReasonPhrase:Unauthorized. -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException

AuthorityConnector

In the 1.7 branch, I've seen that you started to manage ACLs by saving them in the ManifoldCF repository document (in AlfrescoConnector: rd.setSecurityACL()).

But since you don't provide an AuthorityConnector, the Solr integration will not work. I'd like to implement this AuthorityConnector. But maybe you have already started implementing it, or maybe the 1.7 branch is experimental?
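The two halves that have to meet can be sketched in isolation: the repository connector turns Alfresco ACEs into allow/deny tokens on each document, and an authority connector returns the tokens of the user running a search. Everything below (`Ace`, `AclTokens`, `AclSketch`) is hypothetical and is not the ManifoldCF API. It treats only entries whose permission is literally `Read` as granting read access, which is a simplification (Alfresco roles such as Consumer also imply read), and the exact allow/deny rule applied at search time depends on the Solr-side plugin.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical shape of one Alfresco access control entry.
final class Ace {
    final String authority;   // e.g. "jdoe", "GROUP_finance", "GROUP_EVERYONE"
    final String permission;  // e.g. "Read", "Write"
    final boolean allowed;    // true = ALLOWED, false = DENIED

    Ace(String authority, String permission, boolean allowed) {
        this.authority = authority;
        this.permission = permission;
        this.allowed = allowed;
    }
}

// Allow/deny token lists a connector would attach to the indexed document.
final class AclTokens {
    final List<String> allow = new ArrayList<>();
    final List<String> deny = new ArrayList<>();
}

final class AclSketch {
    private AclSketch() {}

    // Document side: keep only entries about the (assumed) "Read" permission.
    static AclTokens readTokens(List<Ace> aces) {
        AclTokens tokens = new AclTokens();
        for (Ace ace : aces) {
            if (!"Read".equals(ace.permission)) {
                continue;
            }
            if (ace.allowed) {
                tokens.allow.add(ace.authority);
            } else {
                tokens.deny.add(ace.authority);
            }
        }
        return tokens;
    }

    // Authority-connector side: the user, the user's groups, and GROUP_EVERYONE.
    static Set<String> userTokens(String user, Map<String, Set<String>> groupsByUser) {
        Set<String> tokens = new LinkedHashSet<>();
        tokens.add(user);
        tokens.addAll(groupsByUser.getOrDefault(user, Set.of()));
        tokens.add("GROUP_EVERYONE");
        return tokens;
    }

    // Search-time filter: any matching deny wins; otherwise one matching allow is enough.
    static boolean canRead(Set<String> userTokens, AclTokens doc) {
        for (String t : doc.deny) {
            if (userTokens.contains(t)) return false;
        }
        for (String t : doc.allow) {
            if (userTokens.contains(t)) return true;
        }
        return false;
    }
}
```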

Error during retrieval of node properties

There is an issue that I think may be related to the Alfresco models: for some reason, one of these models does not import a model it depends on (cm:contentModel).

When trying to get node properties, the Alfresco SOLRAPIClient throws the following exception:

ERROR 2013-03-14 20:37:42,374 (Worker thread '37') - Error on getNodesMetaData fromId 14 toId 14
org.alfresco.service.namespace.NamespaceException: Namespace prefix cm is not mapped to a namespace URI
at org.alfresco.service.namespace.QName.createQName(QName.java:99)
at org.alfresco.service.namespace.QName.createQName(QName.java:121)
at org.alfresco.solr.client.SOLRAPIClient$SOLRTypeConverter$10.convert(SOLRAPIClient.java:1193)
at org.alfresco.solr.client.SOLRAPIClient$SOLRTypeConverter$10.convert(SOLRAPIClient.java:1190)
at org.alfresco.service.cmr.repository.datatype.TypeConverter.convert(TypeConverter.java:112)
at org.alfresco.solr.client.SOLRAPIClient$SOLRTypeConverter.convert(SOLRAPIClient.java:1208)
at org.alfresco.solr.client.SOLRAPIClient$SOLRDeserializer.deserializeValue(SOLRAPIClient.java:1226)
at org.alfresco.solr.client.SOLRAPIClient.getNodesMetaData(SOLRAPIClient.java:801)
at org.apache.manifoldcf.crawler.connectors.alfrescowebscripts.AlfrescoIndexTracker.processMetaData(AlfrescoIndexTracker.java:230)
at org.apache.manifoldcf.crawler.connectors.alfrescowebscripts.AlfrescoWebScriptsRepositoryConnector.processDocuments(AlfrescoWebScriptsRepositoryConnector.java:192)
at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:551)
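The exception comes from expanding a prefixed name such as `cm:name` into a full QName, which only works if the client knows which URI the `cm` prefix maps to. The self-contained sketch below reproduces that mechanism with a hypothetical `PrefixResolver`; it is not Alfresco's resolver class. `http://www.alfresco.org/model/content/1.0` is the URI of the Alfresco content model (`cm`).

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for a namespace prefix resolver (not Alfresco's class).
final class PrefixResolver {
    private final Map<String, String> uriByPrefix = new HashMap<>();

    void register(String prefix, String uri) {
        uriByPrefix.put(prefix, uri);
    }

    // Expands "cm:name" to "{uri}name", failing like the log above when the prefix is unknown.
    String expand(String prefixedName) {
        int colon = prefixedName.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not a prefixed name: " + prefixedName);
        }
        String prefix = prefixedName.substring(0, colon);
        String uri = uriByPrefix.get(prefix);
        if (uri == null) {
            throw new IllegalStateException(
                    "Namespace prefix " + prefix + " is not mapped to a namespace URI");
        }
        return "{" + uri + "}" + prefixedName.substring(colon + 1);
    }
}
```

In this model, registering the `cm` namespace before any property is deserialized is what makes the lookup succeed, which is consistent with the missing import suspected above.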

IllegalAccessError

When running ManifoldCF in Tomcat, with the Alfresco Webscripts connector as a repository connector, the following exception repeats within the log upon job execution:

java.lang.IllegalAccessError: tried to access field org.apache.manifoldcf.crawler.system.SeedingActivity.jobID from class org.apache.manifoldcf.crawler.system.JobIdStealer
at org.apache.manifoldcf.crawler.system.JobIdStealer.stealId(JobIdStealer.java:18)
at org.apache.manifoldcf.crawler.system.JobIdStealer.stealId(JobIdStealer.java:14)
at org.alfresco.consulting.manifold.AlfrescoConnector.addSeedDocuments(AlfrescoConnector.java:120)
at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.addSeedDocuments(BaseRepositoryConnector.java:156)
at org.apache.manifoldcf.crawler.system.StartupThread.run(StartupThread.java:153)

Build broken with Alfresco Enterprise 4.1.3

Trying to build the connector with Alfresco Enterprise 4.1.3 fails with the following error:

[ERROR] Failed to execute goal on project mcf-alfresco-webscript-connector: Could not resolve dependencies for project org.apache.manifoldcf:mcf-alfresco-webscript-connector:jar:1.1.1: Failed to collect dependencies for [org.alfresco.enterprise:alfresco-solr-integration:jar:4.1.3 (compile), org.alfresco.enterprise:alfresco-core:jar:4.1.3 (compile), org.apache.manifoldcf:mcf-core:jar:1.1.1 (provided), org.apache.manifoldcf:mcf-agents:jar:1.1.1 (provided), org.apache.manifoldcf:mcf-pull-agent:jar:1.1.1 (provided), org.apache.manifoldcf:mcf-ui-core:jar:1.1.1 (provided)]: Failed to read artifact descriptor for woodstox:wstx-asl:jar:3.2.7: Could not transfer artifact woodstox:wstx-asl:pom:3.2.7 from/to alfresco-internal (https://artifacts.alfresco.com/nexus/content/groups/internal): Not authorized , ReasonPhrase:Unauthorized. -> [Help 1]

Getting rid of JobIdStealer.java

As documented in the class itself:

 * This is a hack to get the jobId given a {@link org.apache.manifoldcf.crawler.system.SeedingActivity}.
 * TODO: If a way to get the job id from within a connector in ManifoldCF is found, delete this.

This class is used to access the Job ID from the Alfresco ManifoldCF connector.
We use the Job ID as the identifier (primary key) of the entries that the connector logs (using CrawlLogger.java) to keep the state of the crawl (last transaction ID, last ACL changeset ID).
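The state described above can be pictured as a small table keyed by the job identifier. The sketch below is a hypothetical in-memory stand-in for what CrawlLogger persists, not the connector's code; the key is a plain `String` so that it works with a Job ID or with any replacement key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for the rows CrawlLogger keeps.
final class CrawlStateStore {
    static final class State {
        final long lastTxnId;
        final long lastAclChangesetId;

        State(long lastTxnId, long lastAclChangesetId) {
            this.lastTxnId = lastTxnId;
            this.lastAclChangesetId = lastAclChangesetId;
        }
    }

    private static final State INITIAL = new State(0L, 0L);
    private final Map<String, State> byKey = new ConcurrentHashMap<>();

    // Where the next crawl for this key should resume; 0/0 means "start from scratch".
    State get(String key) {
        return byKey.getOrDefault(key, INITIAL);
    }

    // Records progress atomically; stored IDs never move backwards.
    void record(String key, long txnId, long aclChangesetId) {
        byKey.merge(key, new State(txnId, aclChangesetId),
                (old, latest) -> new State(
                        Math.max(old.lastTxnId, latest.lastTxnId),
                        Math.max(old.lastAclChangesetId, latest.lastAclChangesetId)));
    }
}
```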

To solve this issue, it is possible to:

  • Find an alternative way to access the Job ID from a ManifoldCF connector
  • Use something other than the Job ID as the primary key, something that is guaranteed to be unique across different job executions
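For the second option, one possibility (an assumption, not something the connector does today) is to derive the key from inputs the connector is assumed to see without the Job ID, for example the repository connection name and the serialized job specification. The hypothetical `CrawlStateKey` below hashes them with SHA-256, so the key is short, distinct for jobs whose inputs differ, and stable across runs of the same job.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class CrawlStateKey {
    private CrawlStateKey() {}

    // Hypothetical key: SHA-256 of the connection name and the serialized job specification.
    static String of(String connectionName, String jobSpecification) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            sha.update(connectionName.getBytes(StandardCharsets.UTF_8));
            sha.update((byte) 0); // separator, so ("ab", "c") and ("a", "bc") hash differently
            sha.update(jobSpecification.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : sha.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

The trade-off: two jobs with the same connection and an identical specification would share a key, and editing a job's specification would start from a fresh state. Whether that is acceptable depends on how the crawl state is meant to behave.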

Incorrect Document Id while Seeding Documents

Hi guys,

While adapting the connector to the latest version of ManifoldCF (ManifoldCF 1.7-SNAPSHOT), we faced the following problem:

In the connector's addSeedDocuments method, you seed the Alfresco documents using, as the document ID, the JSON obtained from the Alfresco client's fetchNodes method instead of just the UUID. Besides the UUID, this JSON contains more information that is used later in the processDocuments method.

Later, in the processDocuments method, documents are ingested using only the UUID as the document ID. This is no longer valid in this version of ManifoldCF, because documents are now processed in a pipeline that checks the original document ID. So the JSON is probably stored in the database when the documents are seeded, and when the documents are ingested with a different ID, an exception is raised.

I understand that you use the whole JSON for seeding for performance reasons, to avoid calling Alfresco once more per document in the processDocuments method. It would be nice if the seeding method allowed additional metadata to be included, but currently this is not possible.

So I'm afraid the workflow for this version needs to change: a first call to Alfresco fetches only the UUIDs, and the rest of the node information is fetched later (one call per document). This is, of course, less efficient, but it is cleaner than keeping node information in memory in the connector from seeding until the documents are ingested.
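The two-phase workflow proposed above can be sketched as follows. `NodeSource` and its two methods are hypothetical stand-ins for the Alfresco client, not its real API; the point is that the ID emitted during seeding (the UUID) is exactly the ID used again during processing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical minimal view of the Alfresco client used by the connector.
interface NodeSource {
    List<String> fetchNodeUuids(long fromTxnId, long toTxnId); // phase 1: seeding
    Map<String, String> fetchNodeMetadata(String uuid);        // phase 2: one call per document
}

final class TwoPhaseCrawl {
    private TwoPhaseCrawl() {}

    // Seeding: emit only UUIDs, so the seeded ID equals the ID ingested later.
    static List<String> seed(NodeSource source, long fromTxnId, long toTxnId) {
        return new ArrayList<>(source.fetchNodeUuids(fromTxnId, toTxnId));
    }

    // Processing: re-fetch node information by UUID and keep the same ID.
    static List<Map.Entry<String, Map<String, String>>> process(NodeSource source, List<String> documentIds) {
        List<Map.Entry<String, Map<String, String>>> out = new ArrayList<>();
        for (String uuid : documentIds) {
            out.add(Map.entry(uuid, source.fetchNodeMetadata(uuid)));
        }
        return out;
    }
}
```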

Because this is a major change, I would suggest opening a branch for version 1.7. I will be happy to contribute the changes :-)

Add aspects (and more) to document processing

At the moment, AlfrescoConnector.processDocuments() only adds properties to the ManifoldCF RepositoryDocument.

The following elements should be added in order to extend search functionalities:

  • Aspects
  • Path
  • Primary Type

The field format (e.g. namespaces, date formatting) should be consistent with the conventions used by the Alfresco Solr distribution.
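A sketch of how aspects, path and primary type could be collected alongside properties. The field names `TYPE`, `ASPECT`, `PATH` and the `@{uri}localName` property convention are assumptions modelled on the Alfresco Solr schema and should be checked against the schema.xml shipped with the Alfresco Solr distribution you target; dates are rendered as ISO-8601 UTC strings.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AlfrescoFieldMapper {
    private AlfrescoFieldMapper() {}

    // Assumed field names, modelled on the Alfresco Solr schema; verify against your schema.xml.
    static final String TYPE = "TYPE";
    static final String ASPECT = "ASPECT";
    static final String PATH = "PATH";

    private static final DateTimeFormatter SOLR_DATE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'").withZone(ZoneOffset.UTC);

    // "{uri}localName" -> "@{uri}localName" (assumed property-field convention).
    static String propertyField(String fullQName) {
        return "@" + fullQName;
    }

    // ISO-8601 UTC timestamp, a format Solr date fields accept.
    static String solrDate(Instant instant) {
        return SOLR_DATE.format(instant);
    }

    // Collects primary type, aspects, path and properties into multi-valued fields.
    static Map<String, List<String>> fields(String typeQName, List<String> aspectQNames,
                                            String path, Map<String, String> properties) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put(TYPE, List.of(typeQName));
        fields.put(ASPECT, new ArrayList<>(aspectQNames));
        fields.put(PATH, List.of(path));
        for (Map.Entry<String, String> p : properties.entrySet()) {
            fields.put(propertyField(p.getKey()), List.of(p.getValue()));
        }
        return fields;
    }
}
```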
