
Introduction


Overview

Replication is the process of copying a subset of data from one system and storing it on another DDF- or ION-based system. Data can be pulled from a remote DDF and saved to another DDF or ION system. Metacards produced by replication are marked with a "Replication Origins" attribute and a "replicated" tag. Replication starts transferring data automatically once an admin creates a replication configuration.

Known Issues, Limitations, and Assumptions

Replication is still at an early stage in its lifecycle, so there are a few details that the user should be aware of.

Fanout Proxies

Replicating from a DDF system that is configured as a Fanout Proxy will result in the replication of records from sources configured in that system.

Replicating to a DDF system that is configured as a Fanout Proxy will result in the replication of records only to the fanout and not its sources.

Connected Sources

Replicating from a DDF system that is configured with Connected Sources will result in the replication of records from the Connected Sources in addition to any local records.

Derived Resources

Derived resources, from products such as NITFs, will not be replicated.

Docker Compose Deployment

Prerequisites

Replication is deployed as a Docker stack in a Docker swarm, so before deploying replication you need a running Docker instance with an initialized swarm. Once the swarm is running, you can configure it for replication with the following steps.
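As a quick sanity check before deploying, the swarm state can be read from `docker info`. The helper below is our own illustration, not part of replication:

```shell
# Sketch: decide whether the local Docker node can host the stack.
# Pass in the output of: docker info --format '{{.Swarm.LocalNodeState}}'
swarm_ready() {
  [ "$1" = "active" ]
}

# Example (assumes docker is installed):
#   swarm_ready "$(docker info --format '{{.Swarm.LocalNodeState}}')" || docker swarm init
```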

Configuration

The configuration that replication uses must be stored in a Docker config.

Config Name                Description
replication-spring-config  The Spring Boot application.yml for replication. Example below.

Example replication-spring-config

logging:
  #You can adjust the log levels of a package or class in this section
  level:
    root: INFO
    org.apache.cxf.interceptor.LoggingOutInterceptor: WARN
    org.apache.cxf.interceptor.LoggingInInterceptor: WARN
    javax.xml.soap: ERROR
spring:
  data:
    solr:
      #This is the URL the replication service will use to communicate with solr.
      #As long as you use the docker compose file you won't need to change this.
      host: http://replication-solr:8983/solr
  profiles.active: Classic
replication:
  #This is the number of seconds between each replication, lower it if you're going to be testing.
  period: 300
  #Timeouts for calls to sites
  connectionTimeout: 30
  receiveTimeout: 60
  #The ID of the local site. All replications will go to/from this site. Direction will be determined 
  #by the type and kind of site being replicated with. This field needs to be set, and a site with
  #this ID needs to be saved before any replication will take place. 
  localSite: some-unique-id-1234
  #The remote sites to handle replication for, remove this to handle replication for all sites.
  sites:
  - site1
  - site2
  
# Exposes metrics
management:
  endpoint:
    metrics:
      enabled: true
    prometheus:
      enabled: true
  endpoints:
    web:
      exposure:
        include: 'prometheus,metrics,health,info'
  metrics:
    export:
      prometheus:
        enabled: true

To create a docker config use the config create command, which uses this syntax: docker config create <CONFIG_NAME> <FILE_PATH>

Example: docker config create replication-spring-config replication/configs/application.yml

Profiles

Replication can be run with one of two profiles. You can specify which profile to use in the 'spring.profiles.active' property, as demonstrated in the example above. "Classic" uses the classic monolithic implementation; "Ion" uses the new scalable, cloud-oriented implementation.
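For instance, switching the example configuration above to the Ion implementation is a one-line change, shown here as a minimal fragment of application.yml:

```yaml
spring:
  profiles.active: Ion
```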

Metrics

Replication supports reporting metrics through Micrometer. Prometheus is used as the metrics collection platform. The replication-spring-config provides example configuration for exposing metrics from within the application.
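On the Prometheus side, a scrape job pointing at the exposed endpoint might look like the following. The job name, target host, and port are assumptions for a typical Spring Boot deployment alongside the stack, not values taken from this repository:

```yaml
# Hypothetical Prometheus scrape config; adjust the target to your deployment.
scrape_configs:
  - job_name: 'replication'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['replication:8080']
```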

Grafana

A Grafana dashboard, grafana-dashboard.json, is provided and can be imported into Grafana.

Secrets

Replication requires certs and SSL configuration in order to talk to remote DDF-based systems. This information is stored in Docker secrets.

Secret Name             Description
replication-truststore  A truststore to use for TLS
replication-keystore    A keystore for this system to use for TLS
replication-ssl         SSL properties for TLS, including passwords for the truststore and keystore

Example replication-ssl

javax.net.ssl.trustStorePassword=changeit
javax.net.ssl.trustStoreType=jks
javax.net.ssl.keyStorePassword=changeit
javax.net.ssl.keyStoreType=jks
javax.net.ssl.certAlias=localhost

Only the properties that differ from the defaults above need to be specified in replication-ssl.
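For example, if only the certificate alias differs from the defaults, replication-ssl can contain a single line (the alias value here is illustrative):

```properties
javax.net.ssl.certAlias=replication-host1
```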

To add a docker secret use the secret create command, which uses this syntax: docker secret create <SECRET_NAME> <FILE_PATH>

Example: docker secret create replication-truststore replication/secrets/truststore.jks

Running

Running the stack will start a Solr service and the replication service.

docker stack deploy -c docker-compose.yml repsync

Adding Replication Configuration

Replication can be configured using the Solr REST endpoint.

curl -H "Content-Type: application/json" \
-d @/path/to/json/config/file.json \
http://localhost:8983/solr/<target-core>/update?commitWithin=1000
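Each configuration type lives in its own Solr core, and the update URL is always composed the same way. This hypothetical helper (ours, not part of replication) makes the pattern explicit:

```shell
# Builds the Solr update URL for a given core; commitWithin makes the
# change visible without issuing an explicit commit.
solr_update_url() {
  echo "http://localhost:8983/solr/$1/update?commitWithin=1000"
}

# e.g.:
#   curl -H "Content-Type: application/json" -d @sites.json "$(solr_update_url replication_site)"
```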

Types and Kinds of sites

Sites can be of different types and kinds. The type and kind of the remote site (the one that is not the local site) determine whether the replication is a push, pull, harvest, or both a push and a pull.

Site Type  Site Kind  Replication Direction
DDF        TACTICAL   BIDIRECTIONAL
DDF        REGIONAL   HARVEST
ION        TACTICAL   BIDIRECTIONAL
ION        REGIONAL   PUSH

Directions

Here's how the various replication directions are defined:

  • PUSH - Send information to the remote site to be stored.
  • PULL - Retrieve information from the remote site and store it locally.
  • BIDIRECTIONAL - Perform both a push and a pull.
  • HARVEST - Similar to a pull, but harvesting ignores updates and deletes on replicated information.

Adding Site Example

Create a JSON file with site descriptions like the example below, then use the following curl command to save those sites for replication to use.

Example sites.json

  [
      {
        "version": 1,
        "id": "some-unique-id-1234",
        "name": "RepSync-Node1",
        "description": "Replication Site 1",
        "url": "https://host1:8993/services/",
        "type": "DDF",
        "kind": "TACTICAL",
        "polling_period": 600000,
        "parallelism_factor": 1
       },
      {
        "version":1,
        "id": "another-unique-id-5678",
        "name": "RepSync-Node2",
        "description": "Replication Site 2",
        "url": "https://host2:8993/services",
        "type": "DDF",
        "kind": "TACTICAL",
        "polling_period": 600000,
        "parallelism_factor": 1
      }
   ]

curl -H "Content-Type: application/json" \
-d @sites.json \
http://localhost:8983/solr/replication_site/update?commitWithin=1000

Adding Replication Filters Example

Create a JSON file with filter descriptions like the example below, then use the following curl command to save those filters for replication to use.

Example filters.json

   [
      {
        "name":"pdf-harvest",
        "site_id":"remote-site-id",
        "filter":"\"media.type\" like 'application/pdf'",
        "suspended":false,
        "priority": 0,
        "id":"unique-filter-id-98765",
        "version":1
      }
    ]

curl -H "Content-Type: application/json" \
-d @filters.json \
http://localhost:8983/solr/replication_filter/update?commitWithin=1000

Removing Replication Configuration

Example removing all sites with the name 'Test'

curl -X POST \
  'http://localhost:8983/solr/replication_site/update?commit=true' \
  -H 'Content-Type: application/xml' \
  -d '<delete><query>name_txt:Test</query></delete>'

Example removing all sites

curl -X POST \
  'http://localhost:8983/solr/replication_site/update?commit=true' \
  -H 'Content-Type: application/xml' \
  -d '<delete><query>*:*</query></delete>'
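The same delete-by-query pattern applies to the other replication cores (e.g. replication_filter). This small helper (ours, for illustration) builds the XML payload passed with -d:

```shell
# Builds the Solr delete-by-query payload used in the examples above.
solr_delete_payload() {
  echo "<delete><query>$1</query></delete>"
}

# e.g. remove a single filter by id:
#   curl -X POST 'http://localhost:8983/solr/replication_filter/update?commit=true' \
#     -H 'Content-Type: application/xml' -d "$(solr_delete_payload 'id:unique-filter-id-98765')"
```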

Try it out

You can try replication for yourself using the steps below as a high-level overview. Details on how to complete the setup steps can be found above, starting with "Docker Compose Deployment".

  1. Make sure Docker is up and running. Start up a Docker swarm if you haven't already.
  2. Create the Docker config.
  3. Create the Docker secrets.
  4. Deploy the stack.
  5. Create two sites, both as REGIONAL DDFs, with URLs pointing to running DDF instances.
  6. The remote site (the site that isn't the local site) will be your source of data. Upload test data to the source site if it has none.
  7. Create a filter. The "site_id" should match the ID of the remote site. The filter can be changed to something like ""title" like 'test'" or ""title" like '*'" to replicate everything.
  8. Execute docker service logs -f repsync_ion-replication to view the logs and wait for replication to occur. Once you start seeing logs, check the local site to see the data come in.
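Steps 1-4 above can be sketched as a single script. The file names under replication/ are assumptions matching the earlier examples, and DRY_RUN defaults to printing the commands rather than executing them:

```shell
#!/bin/sh
# Prints each deployment command unless DRY_RUN=0 is set, in which case
# the commands actually run (requires docker and the referenced files).
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$@"; else "$@"; fi; }

run docker swarm init
run docker config create replication-spring-config replication/configs/application.yml
run docker secret create replication-truststore replication/secrets/truststore.jks
run docker secret create replication-keystore replication/secrets/keystore.jks
run docker secret create replication-ssl replication/secrets/ssl.properties
run docker stack deploy -c docker-compose.yml repsync
```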

Contributors

clockard, dependabot-preview[bot], kcover, paouelle, peterhuffer, snyk-bot

Issues

Add NodeAdapter for ION endpoint

Is your feature request related to a problem? Please describe.
Add an adapter that will allow replication to push products to the new ION endpoint

Describe the solution you'd like
Add an adapter that will support the ION ingest endpoint. The endpoint will only support create so update/delete/query operations will not be supported.

[DepShield] (CVSS 5.9) Vulnerability due to usage of com.google.guava:guava:20.0

Vulnerabilities

DepShield reports that this application's usage of com.google.guava:guava:20.0 results in the following vulnerability(s):


Occurrences

com.google.guava:guava:20.0 is a transitive dependency introduced by the following direct dependency(s):

com.google.guava:guava:20.0

ddf.security.core:security-core-api:2.13.9
        └─ com.google.guava:guava:20.0

replication:replication-api-impl:0.3.0-SNAPSHOT
        └─ com.google.guava:guava:20.0

This is an automated GitHub Issue created by Sonatype DepShield. Details on managing GitHub Apps, including DepShield, are available for personal and organization accounts. Please submit questions or feedback about DepShield to the Sonatype DepShield Community.

[DepShield] (CVSS 5.9) Vulnerability due to usage of org.apache.zookeeper:zookeeper:3.4.13

Vulnerabilities

DepShield reports that this application's usage of org.apache.zookeeper:zookeeper:3.4.13 results in the following vulnerability(s):


Occurrences

org.apache.zookeeper:zookeeper:3.4.13 is a transitive dependency introduced by the following direct dependency(s):

org.springframework.data:spring-data-solr:4.0.8.RELEASE
        └─ org.apache.solr:solr-solrj:7.7.1
              └─ org.apache.zookeeper:zookeeper:3.4.13

replication:replication-api-impl:0.3.0-SNAPSHOT
        └─ org.springframework.data:spring-data-solr:4.0.8.RELEASE
              └─ org.apache.solr:solr-solrj:7.7.1
                    └─ org.apache.zookeeper:zookeeper:3.4.13

This is an automated GitHub Issue created by Sonatype DepShield. Details on managing GitHub Apps, including DepShield, are available for personal and organization accounts. Please submit questions or feedback about DepShield to the Sonatype DepShield Community.

Set Owasp threshold to 4

Describe the task
Change the owasp failure threshold to 4 and fix any findings that it produces

Additional context
Add any other context about the task here.

When failure retries are included in csw query no new items are returned

Describe the bug
No new items are returned when querying over CSW if the query contains search parameters for previously failed items. The matching failed items are, however, returned.

The expected behavior is that new items would also be returned when failed items are included in the query.

Affects version
0.2.x, 0.3.0

To Reproduce
Steps to reproduce the behavior:

  1. Attempt to replicate a product that will fail
  2. Ingest a new item that matches the filter criteria
  3. Attempt to replicate again
  4. Notice that the new item has not been replicated as it should be

Expected behavior
Products matching the given filter will be replicated even when there are items in the replication that previously failed.

Internalize replication scheduling

Describe the task
Instead of relying on a scheduling command to run replication, an internal mechanism should be added to automatically run the replication jobs. This could result in many more replication status entries than before, so a way to condense the status objects should also be included.

Add docker distribution for replication

Is your feature request related to a problem? Please describe.
Add docker distribution and deployment options for replication

Describe the solution you'd like
Add a Dockerfile module and a docker-compose.yml to allow docker deployments

Add static analysis for UI code

Describe the task
Static analysis helps increase the quality and readability of the code, as well as reducing bugs. Setup a static analysis tool that runs as part of the build.

Additional context
N/A

Add functionality to convert metacard replication configs to the latest version

Describe the task
Add functionality to the MetacardConfigLoader to convert any version-less and version 2 configs to the current version and save them in the ReplicatorConfigManager. This should all be done on startup. This task should also remove any remaining code in our data model that reflects the old version of configs; for example, code mentioning replication types or direction should be refactored or removed.

Remove deprecated features

Remove code that is no longer needed. This includes:

  • the CLI code
  • code generating a local node
  • any code referring to a local catalog store
  • legacy data conversion logic

Last run and last success do not auto update in the UI

Describe the bug
The last run and last success dates do not properly update in the UI. This is because the actual date is received as UTC, and it is not changing, but the way we display it changes. React does not re-render components if the data does not change.

Affects version
0.2.2

To Reproduce

  1. Create a replication, let it run, then disable it.
  2. Observe the last run and last success do not update without refreshing the page.

Expected behavior
The last run and last success relative time displays to update automatically.

Desktop (please complete the following information):

  • Browser: chrome

Additional context
One solution is to store the relative time for those 2 dates in the local state of a table row and use a timer to update that state every minute.

Add support for Node metrics

Describe the task
Currently, information like latency and up-time are not captured for a Node. Add support for:

  • Connectivity (up or down)
  • Latency
  • Up-time percentage
  • Version of node
  • Machine name
  • Bandwidth monitoring (transfer rate)

First attempt at replicating an item with Ion fails

Describe the bug
The first attempt to replicate an item to Ion always fails.

Affects version
0.3.0

To Reproduce
Steps to reproduce the behavior:

  • Setup replication with a ddf based system as the source and ion as the destination
  • Observe as the ingest fails and an error with a 500 status code is returned in the logs

Expected behavior
The item should replicate on the first attempt.

MB Transferred displays 0 even though resources were transferred

Describe the bug
Even if there are bytes transferred, if the total is below 1 MB the front-end will display 0 MB transferred. This is because the back-end stores this value in a long.

Affects version
0.2.2

To Reproduce

  1. Create a replication that will transfer less than 1 MB.
  2. Observe MB transferred says 0.

Expected behavior
A display of 0.5 MB transferred if 512 KB were transferred

Node deletion error does not disappear and cannot be dismissed

Describe the bug
When deleting a Node that is being used in a Replication, an error notification is presented but it cannot be dismissed.
The notifications also stack on each other.

Affects version
0.2.0

To Reproduce

  1. Create a Node (2 if the local Node is not available)
  2. Create a Replication between the 2 Nodes
  3. Attempt to delete the node
  4. An error notification should appear but it cannot be dismissed.

Expected behavior
The error message can be dismissed.

Screenshots
N/A

Desktop (please complete the following information):

  • Chrome

Additional context
Switching pages causes the notification to go away.

Add a UI code formatter

Describe the task
A code formatter helps keep consistency across all the code in the project, and reduce conflicts when committing code. Include a formatter as part of the build and fail the build if committed code does not comply.

Additional context
N/A

Implement Create Operation For RepSyncs

Describe the task
RepSyncs are a configuration that define what, when, and where data will be replicated to and from. Currently the create operation is mocked out in the API. The mock should be replaced with a functioning method that will persist RepSyncs to the back-end for later retrieval.

Set up API response tests

Describe the task
Create tests to validate the functionality of the api so far. That is, create tests that confirm the api gives the proper responses when given specific input. The aim of this ticket is not to create all of the necessary tests, but to create enough tests to cover the current functionality of the api so that we can do TDD going forward.

Update DDF and Admin Console versions

Describe the task
Upgrade DDF and admin console dependencies

Additional context
The build will fail without this upgrade because we were depending on a snapshot that no longer exists.

Set up UI tests

Describe the task

Setup testing tools for UI code.

This includes setting up a test runner and a coverage checker, which will fail the build if defined coverage criteria are not met.

Additional context
N/A

[DepShield] (CVSS 7.5) Vulnerability due to usage of com.fasterxml.jackson.core:jackson-databind:2.9.8

Vulnerabilities

DepShield reports that this application's usage of com.fasterxml.jackson.core:jackson-databind:2.9.8 results in the following vulnerability(s):


Occurrences

com.fasterxml.jackson.core:jackson-databind:2.9.8 is a transitive dependency introduced by the following direct dependency(s):

org.springframework.data:spring-data-solr:4.0.8.RELEASE
        └─ com.fasterxml.jackson.core:jackson-databind:2.9.8

replication:replication-api-impl:0.3.0-SNAPSHOT
        └─ org.springframework.data:spring-data-solr:4.0.8.RELEASE
              └─ com.fasterxml.jackson.core:jackson-databind:2.9.8

This is an automated GitHub Issue created by Sonatype DepShield. Details on managing GitHub Apps, including DepShield, are available for personal and organization accounts. Please submit questions or feedback about DepShield to the Sonatype DepShield Community.

Configurations can pick up failed items that aren't theirs

Describe the bug
If 2 configurations exist with the same source and destination nodes, it is possible for them to pick up each other's failed items. This is due to the lookup of failed items only taking into account the failure count, source name, and destination name.

See: https://github.com/connexta/replication/blob/master/replication-api-impl/src/main/java/org/codice/ditto/replication/api/impl/SyncHelper.java#L264

Affects version
0.2.2

To Reproduce

  1. Create 2 replications with the same source and destinations, but with different filters.
  2. Replicate on one filter but cause the item to fail.
  3. Run the other configuration. It should attempt to replicate the failed item but fail.

Expected behavior
A configuration should only pick up on its own failed items.

Screenshots
N/A

Desktop (please complete the following information):
N/A

Additional context
N/A

Pull updates from demo development and address owasp issues

A few items were found when working on a replication demo. This ticket is for addressing those items

  • Update the delete command to be more flexible
  • Add suppressions from ddf
  • Rework kar file to include the replication jars and not just the feature file

[DepShield] (CVSS 9.8) Vulnerability due to usage of org.codehaus.groovy:groovy-all:2.4.7

Vulnerabilities

DepShield reports that this application's usage of org.codehaus.groovy:groovy-all:2.4.7 results in the following vulnerability(s):


Occurrences

org.codehaus.groovy:groovy-all:2.4.7 is a transitive dependency introduced by the following direct dependency(s):

org.codice.test:junit-extensions:0.3
        └─ org.codehaus.groovy:groovy-all:2.4.7

This is an automated GitHub Issue created by Sonatype DepShield. Details on managing GitHub Apps, including DepShield, are available for personal and organization accounts. Please submit questions or feedback about DepShield to the Sonatype DepShield Community.

Missing dependency error when deploying replication on older versions of DDF

Describe the bug
When deploying the current replication 0.2.1-SNAPSHOT on DDF v2.13.6 or earlier, a missing dependency error is thrown for ddf.security.encryption. This happened when replication was updated to DDF 2.13.7-SNAPSHOT, where that dependency was fixed to export itself as the project version rather than 1.0.0.
Breaking change is in codice/ddf@2247de6

Affects version
0.2.1-SNAPSHOT

To Reproduce
Build current replication master and deploy the resulting kar to a ddf-2.13.6

Expected behavior
Current replication master should deploy to ddf-2.13.6 without any missing dependencies

Add support for more clean up options when deleting Replications

Describe the task
Add support for better clean up options when deleting Replications from the UI. Currently the CLI allows deleting a Replication's history and replicated items in addition to deleting the configuration itself. This should be possible from the UI as well.

Additional context
N/A

Create a library for testing the API

Describe the task
Create a library that will make it easy to make requests to the replication api and confirm data going through the API is saved properly. This will help us to confirm that data is properly saved and retrieved through API operations.

Replicator thread can stop due to runtime exception

Describe the bug
Replication stops working. When Solr is not initially available and the history attempts to save, a runtime exception is thrown that is not caught and interrupts the replicator thread.

See: https://github.com/connexta/replication/blob/0.2.x/replication-api-impl/src/main/java/org/codice/ditto/replication/api/impl/ReplicatorImpl.java#L203

Affects version
0.2.2

To Reproduce
This is a timing issue that only occurs sometimes after startup and a replication configuration runs before solr is available.

Expected behavior
The replicator thread will never stop running.

Remove DDF CatalogStore from internal API

Describe the task
Remove DDF interfaces (CatalogStore, CreateRequests, etc) that are currently being exposed through the internal API.

Additional context
This is part of the move to deploying as a separate application outside of DDF.

As an administrator, I'd like to be able to cancel/suspend/enable replications

Is your feature request related to a problem? Please describe.
There is no control over what replication runs. Adding support for cancel/enable/suspend will provide more control.

Describe the solution you'd like
Ability to suspend, enable, and cancel replications.

Describe alternatives you've considered
N/A

Additional context
N/A

Cleanup replication configurations better

Is your feature request related to a problem? Please describe.
When re-creating a configuration between the same source and destination while testing, it is annoying when old replication items stick around because the metacard ids are still the same. This forces re-ingest to trigger new metacard ids.

Describe the solution you'd like
Cleanup orphaned (without corresponding config or metadata) replication items.

Describe alternatives you've considered
N/A

Additional context
This work was done on 0.2.x. See https://github.com/connexta/replication/blob/0.2.x/replication-api-impl/src/main/java/org/codice/ditto/replication/api/impl/ScheduledReplicatorDeleter.java#L137

Update replication status as items are processed

Describe the task
Right now the status of a replication job is only reported once it is finished. This task is to make it so the status gets updated as each individual piece of the job is completed so anyone looking at the status can see the progress.

Update documentation

Describe the task
Update the documentation to mention that confluence metadata cannot be replicated, and to specify what scenarios require replication to be installed on both the source and destination.
