azure-cosmos-db-mongo-utils

Public repo for Cosmos DB Mongo Utilities


Functionality

pyapp subproject core functionality

This functionality has no dependency on customer source/target Excel mapping files.

  • Delete/Define the MMA output directory
    • See pyapp/recreate_mma_output_directory.xml
  • Collect/aggregate verbose MMA outputs
    • See pyapp/collect_mma_outputs.ps1 and pyapp/collect_mma_outputs.sh
  • Generate a single Excel Report from many MMA executions
    • See pyapp/migration_wave_report.ps1 and pyapp/migration_wave_report.sh
  • pyapp/verify.py - verify document counts and indices of target vs source DBs and collections
  • pyapp/indexes.py - extract and compare source vs target indexes

pyapp subproject extended functionality

This functionality depends on customer-specific source/target Excel mapping files.

  • Parse Excel file into JSON data files
    • See pyapp/read_parse_clusters_info_excel_file.ps1 and pyapp/read_parse_clusters_info_excel_file.sh
  • Generate MMA execution scripts from this Excel data
    • See pyapp/generate_mma_execution_scripts.ps1 and pyapp/generate_mma_execution_scripts.sh
    • Some customers have hundreds of clusters, so automation enables efficiency and accuracy
  • Normalize the contents of a customer-created MMA zip file
    • See pyapp/normalize_mma_zip_example.xml
    • Then unzip the normalized zip file into your MMA output directory
  • Excel reporting on the captured MMA outputs
    • See pyapp/migration_wave_report.ps1 and pyapp/migration_wave_report.sh
    • Calculates migration and post-migration RU settings for each collection/container
    • Integrates the various MMA assessments into the report (indexing, shards, etc)
    • Integrates mongodb-docscan (see Related GitHub Repositories) into the report (work in progress)
    • Also creates PostgreSQL sql/csv files for the captured MMA data
  • throughput.py - display and reduce Cosmos DB Request Unit (RU) settings post-migration

Related GitHub Repositories

MongoMigrationAssessment.exe (i.e. - MMA)

Docscan

MongoDB Data Generator

Original Batch/Transforming Migration Process


Directory Structure of this Repo

├── changestream_consumer     <-- coming soon; CosmosDB Mongo API or MongoDB change-stream consumer, implemented in Java,
│                                 code is currently in a private repo
│
├── changestream_producer     <-- coming soon; producer of DB activity for the above changestream_consumer, implemented in Java,
│                                 code is currently in a private repo
│
├── mongodb_docscan           <-- work-in-progress; a MongoDB large document scanner, implemented in Java
│
└── pyapp                     <-- Most Python and Ant scripts you'll execute are here; Python app root directory
    ├── artifact_examples
    │   └── bicep_examples
    ├── artifacts             <-- generated code artifacts; this is not fully implemented
    │   ├── bicep
    │   └── spark
    ├── current               <-- application state files, git-ignored
    │   ├── docscan           <-- unzipped output of mongodb-docscan program
    │   ├── mmaout            <-- redirected output of the MMA program
    │   └── psql              <-- generated PostgreSQL csv and scripts
    ├── pysrc                 <-- python application source code
    ├── templates
    ├── tests
    ├── tmp                   <-- temporary files, git-ignored
    └── venv                  <-- python virtual environment directory

Required Software

Required Software - for MMA execution

Note that the MongoMigrationAssessment.exe program must be executed on Windows, but the other functionality in this repo does not require Windows. A common workflow is for the customer to execute the MMA in their own environment and then share the zipped MMA output directory with Microsoft for subsequent analysis.

Required Software - for Ant/xml script execution


Getting Started

Clone this repo

> git clone https://github.com/cjoakim/azure-cosmos-db-mongo-utils
> cd azure-cosmos-db-mongo-utils

Navigate to the Python Application directory, and create the Python Virtual Environment

On Windows

> cd pyapp
> .\create_venv_setup.ps1

On Linux or macOS

$ cd pyapp
$ ./create_venv_setup.sh

Edit your verify.json configuration file

In the pyapp directory, copy file verify-example.json to verify.json. File verify.json is intentionally git-ignored.

The format of this file is self-explanatory. You can have multiple keys in the file, and the key names are used as command-line arguments for some Python scripts.

For example, verify.json can look like this:

{
  "migration1": {
    "cluster": "1-US-DEV (SOMETHING)",
    "source": "mongodb+srv://mongodb-source1...",
    "target": "mongodb://cosmosdb-target1...",
    "databases": [],
    "collections": []
  },
  "migration2": {
    "cluster": "1-US-UAT (SOMETHING)",
    "source": "mongodb+srv://mongodb-source2...",
    "target": "mongodb://cosmosdb-target2...",
    "databases": [
      "customers",
      "sales"
    ],
    "collections": []
  }
}

And these keys and configuration values are used like this:

python verify.py migration1
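
How a script might resolve a key such as migration1 into its configuration entry can be sketched as follows. This is a minimal illustration rather than the code in verify.py: the function name `load_migration_config` and its error handling are assumptions, and only the file layout shown above is taken from the README.

```python
import json


def load_migration_config(path, key):
    """Load one named entry (for example 'migration1') from a verify.json-style file."""
    with open(path, encoding="utf-8") as f:
        all_configs = json.load(f)
    if key not in all_configs:
        # Hypothetical error handling: list the valid keys to help the caller.
        raise KeyError(f"key {key!r} not found; available keys: {sorted(all_configs)}")
    return all_configs[key]
```

A caller would pass the first command-line argument as the key, for example `load_migration_config("verify.json", sys.argv[1])`, and then read fields such as `source`, `target`, `databases`, and `collections` from the returned dict.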

Edit your mdb-cred.txt file

This is optional; a default value will be used if necessary.

Create a one-line file mdb-cred.txt in directory pyapp/current/cred. File mdb-cred.txt is intentionally git-ignored.

Format is username:password

Example:

chris:superSecr3T
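
Reading such a file could look like the sketch below, assuming the file holds a single username:password line as the example above suggests. The function name `read_credentials` and the `default` parameter are hypothetical; the README says only that a default value is used when necessary and does not state what it is.

```python
from pathlib import Path


def read_credentials(path, default=("", "")):
    """Return (username, password) from a one-line 'username:password' file.

    If the file does not exist, the caller-supplied default is returned.
    """
    cred_file = Path(path)
    if not cred_file.is_file():
        return default
    line = cred_file.read_text(encoding="utf-8").strip()
    # Split on the first colon only, so the password may itself contain colons.
    username, sep, password = line.partition(":")
    if not sep:
        raise ValueError("expected 'username:password' on a single line")
    return username, password
```

Splitting on the first colon only is a design choice for this sketch; it keeps passwords that contain colons intact.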

Additional Documentation

See the header comments in each script, for example in verify.py, indexes.py, and throughput.py.
