aws-samples / amazon-keyspaces-toolkit
Docker image/tools for working with Amazon Keyspaces.
License: MIT No Attribution
Describe the bug
I'm getting this error when importing a CSV into Amazon Keyspaces, after exporting the data and re-creating the table.
To Reproduce
Steps to reproduce the behavior:
Connected to Amazon Keyspaces at cassandra.eu-west-1.amazonaws.com:9142.
[cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
COPY projects
... ( project_id, create_date, bucket_name, duration, duration_millis, file_hash, file_size, frame_rate, height, original_file_name, presets, project_name, project_status, public_url, reference_bucket, reference_hash, reference_lang, reference_type, remote_ip, segment_duration, source_lang, source_language_direction, source_language_name, subject, target_languages, total_raw_words, user_id, width )
... TO '/tmp/projects.csv';
Reading options from /root/.cassandra/cqlshrc:[copy]: {'maxattempts': '25', 'numprocesses': '16'}
Using 16 child processes
Starting copy of projects with columns [ project_id, create_date, bucket_name, duration, duration_millis, file_hash, file_size, frame_rate, height, original_file_name, presets, project_name, project_status, public_url, reference_bucket, reference_hash, reference_lang, reference_type, remote_ip, segment_duration, source_lang, source_language_direction, source_language_name, subject, target_languages, total_raw_words, user_id, width ].
Processed: 9 rows; Rate: 4 rows/s; Avg. rate: 7 rows/s
9 rows exported to 1 files in 1.266 seconds.
So I dropped the table from the Keyspaces console and re-created it:
CREATE TABLE matesub.projects
... (
... project_id uuid,
... create_date timestamp,
... bucket_name ascii,
... duration int,
... duration_millis int,
... file_hash ascii,
... file_size int,
... frame_rate ascii,
... height int,
... original_file_name text,
... presets map<ascii, text>,
... project_name text,
... project_status text,
... public_url text,
... reference_bucket text,
... reference_hash ascii,
... reference_lang ascii,
... reference_type ascii,
... remote_ip inet,
... segment_duration ascii,
... source_lang ascii,
... source_language_direction ascii,
... source_language_name text,
... subject ascii,
... target_languages set<ascii>,
... total_raw_words int,
... user_id uuid,
... width int,
... PRIMARY KEY (project_id)
... ) ;
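One thing worth checking after a drop-and-recreate: Amazon Keyspaces performs CREATE TABLE asynchronously, so a COPY started immediately afterwards can target a table that is not yet active. The developer guide exposes the creation status through the `system_schema_mcs` tables; the query below reuses this report's keyspace/table names:

```cql
-- status reads ACTIVE once the table is ready for reads and writes
SELECT keyspace_name, table_name, status
FROM system_schema_mcs.tables
WHERE keyspace_name = 'matesub' AND table_name = 'projects';
```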
Then I tried to re-import the data:
COPY projects
... ( project_id, create_date, bucket_name, duration, duration_millis, file_hash, file_size, frame_rate, height, original_file_name, presets, project_name, project_status, public_url, reference_bucket, reference_hash, reference_lang, reference_type, remote_ip, segment_duration, source_lang, source_language_direction, source_language_name, subject, target_languages, total_raw_words, user_id, width )
... FROM '/tmp/projects.csv';
Reading options from /root/.cassandra/cqlshrc:[copy]: {'maxattempts': '25', 'numprocesses': '16'}
Reading options from /root/.cassandra/cqlshrc:[copy-from]: {'minbatchsize': '1', 'chunksize': '30', 'maxparseerrors': '-1', 'maxinserterrors': '-1', 'ingestrate': '1500', 'maxbatchsize': '10'}
Using 16 child processes
Starting copy of projects with columns [ project_id, create_date, bucket_name, duration, duration_millis, file_hash, file_size, frame_rate, height, original_file_name, presets, project_name, project_status, public_url, reference_bucket, reference_hash, reference_lang, reference_type, remote_ip, segment_duration, source_lang, source_language_direction, source_language_name, subject, target_languages, total_raw_words, user_id, width ].
PicklingError: Can't pickle <class 'cqlshlib.copyutil.ImmutableDict'>: attribute lookup cqlshlib.copyutil.ImmutableDict failed
PicklingError: Can't pickle <class 'cqlshlib.copyutil.ImmutableDict'>: attribute lookup cqlshlib.copyutil.ImmutableDict failed
PicklingError: Can't pickle <class 'cqlshlib.copyutil.ImmutableDict'>: attribute lookup cqlshlib.copyutil.ImmutableDict failed
PicklingError: Can't pickle <class 'cqlshlib.copyutil.ImmutableDict'>: attribute lookup cqlshlib.copyutil.ImmutableDict failed
PicklingError: Can't pickle <class 'cqlshlib.copyutil.ImmutableDict'>: attribute lookup cqlshlib.copyutil.ImmutableDict failed
PicklingError: Can't pickle <class 'cqlshlib.copyutil.ImmutableDict'>: attribute lookup cqlshlib.copyutil.ImmutableDict failed
PicklingError: Can't pickle <class 'cqlshlib.copyutil.ImmutableDict'>: attribute lookup cqlshlib.copyutil.ImmutableDict failed
PicklingError: Can't pickle <class 'cqlshlib.copyutil.ImmutableDict'>: attribute lookup cqlshlib.copyutil.ImmutableDict failed
PicklingError: Can't pickle <class 'cqlshlib.copyutil.ImmutableDict'>: attribute lookup cqlshlib.copyutil.ImmutableDict failed
......
......
Processed: 9 rows; Rate: 1 rows/s; Avg. rate: 1 rows/s
9 rows imported from 1 files in 6.917 seconds (0 skipped).
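For context, a PicklingError like this is CPython's multiprocessing failing to serialize work for the child processes: pickle stores classes by importable name, and the lookup of `cqlshlib.copyutil.ImmutableDict` fails in the worker. A minimal sketch of the same failure mode (the local class here is just an easy way to make the name lookup fail; it is not cqlsh's actual definition):

```python
import pickle

def make_immutable_dict():
    # A class defined inside a function has a "<locals>" qualname, so
    # pickle cannot re-import it by name -- the same attribute-lookup
    # failure the cqlsh COPY workers report for copyutil.ImmutableDict.
    class ImmutableDict(dict):
        pass
    return ImmutableDict(a=1)

try:
    pickle.dumps(make_immutable_dict())
    outcome = "pickled ok"
except (pickle.PicklingError, AttributeError):
    outcome = "pickling failed"
print(outcome)
```

Note that the run above still reports "9 rows imported" even though every chunk errored, which matches the empty SELECT that follows.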
I perform a SELECT after the import:
select * from projects;
project_id | create_date | bucket_name | duration | duration_millis | file_hash | file_size | frame_rate | height | original_file_name | presets | project_name | project_status | public_url | reference_bucket | reference_hash | reference_lang | reference_type | remote_ip | segment_duration | source_lang | source_language_direction | source_language_name | subject | target_languages | total_raw_words | user_id | width
------------+-------------+-------------+----------+-----------------+-----------+-----------+------------+--------+--------------------+---------+--------------+----------------+------------+------------------+----------------+----------------+----------------+-----------+------------------+-------------+---------------------------+----------------------+---------+------------------+-----------------+---------+-------
(0 rows)
Expected behavior
When performing a SELECT after the data import, I should see the data in the table.
Environment (please complete the following information):
Docker version 20.10.2, build 2291f61
docker build --tag amazon/keyspaces-toolkit --build-arg CLI_VERSION=latest https://github.com/aws-samples/amazon-keyspaces-toolkit.git
Host Linux
Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Codename: bionic
Trying to build using the docker build command fails:
docker build --tag amazon/keyspaces-toolkit --build-arg CLI_VERSION=latest https://github.com/aws-samples/amazon-keyspaces-toolkit.git
[+] Building 0.0s (1/1) FINISHED
=> CACHED [internal] load git source https://github.com/aws-samples/amazon-keyspaces-toolkit.git 0.0s
failed to solve with frontend dockerfile.v0: failed to read dockerfile: open /var/lib/docker/tmp/buildkit-mount2497446633/Dockerfile: no such file or directory
Cloning from master and running the commands in the README yields this error:
Step 9/17 : COPY cassandra/LICENSE.txt $CASSANDRA_HOME
COPY failed: stat /var/lib/docker/tmp/docker-builder831823186/cassandra/LICENSE.txt: no such file or directory
Describe the bug
The COPY FROM command produces the following error on a CSV file created with COPY TO.
<stdin>:1:Failed to import 30 rows: AttributeError - 'NoneType' object has no attribute 'is_up', given up after 1 attempts
To Reproduce
Steps to reproduce the behavior:
Install cqlsh-expansion as described here: https://docs.aws.amazon.com/keyspaces/latest/devguide/programmatic.cqlsh.html
Dump the data to CSV:
cqlsh-expansion cassandra.us-west-2.amazonaws.com 9142 --ssl -e "COPY keyspace1.table1 TO './dump.csv' WITH HEADER='true';"
Create the destination keyspace and table:
CREATE KEYSPACE IF NOT EXISTS "keyspace2"
WITH REPLICATION = {'class':'SingleRegionStrategy'};
CREATE TABLE IF NOT EXISTS keyspace2.table2 (
col1 text,
col2 text,
col3 text,
created_at timestamp,
my_data blob,
PRIMARY KEY (col1, col2, col3)
) WITH CLUSTERING ORDER BY (col2 ASC, col3 ASC)
AND bloom_filter_fp_chance = 0.01
AND comment = ''
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 7776000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 3600000
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
cqlsh-expansion cassandra.us-west-2.amazonaws.com 9142 --ssl -e "CONSISTENCY LOCAL_QUORUM; COPY keyspace2.table2 FROM './dump.csv' WITH HEADER='true';"
Consistency level set to LOCAL_QUORUM.
cqlsh current consistency level is LOCAL_QUORUM.
Reading options from /home/ubuntu/.cassandra/cqlshrc:[copy]: {'numprocesses': '16', 'maxattempts': '1000'}
Reading options from /home/ubuntu/.cassandra/cqlshrc:[copy-from]: {'ingestrate': '1500', 'maxparseerrors': '1000', 'maxinserterrors': '-1', 'maxbatchsize': '10', 'minbatchsize': '1', 'chunksize': '30'}
Reading options from the command line: {'header': 'true'}
Using 16 child processes
Starting copy of keyspace2.table2 with columns [col1, col2, col3, created_at, my_data].
<stdin>:1:Failed to import 30 rows: Error - field larger than field limit (999999), given up after 1 attempts
<stdin>:1:Failed to import 30 rows: AttributeError - 'NoneType' object has no attribute 'is_up', given up after 1 attempts
<stdin>:1:Failed to import 30 rows: AttributeError - 'NoneType' object has no attribute 'is_up', given up after 1 attempts
<stdin>:1:Failed to import 30 rows: AttributeError - 'NoneType' object has no attribute 'is_up', given up after 1 attempts
<stdin>:1:Failed to import 30 rows: Error - field larger than field limit (999999), given up after 1 attempts
<stdin>:1:Failed to import 30 rows: Error - field larger than field limit (999999), given up after 1 attempts
<stdin>:1:Failed to import 30 rows: AttributeError - 'NoneType' object has no attribute 'is_up', given up after 1 attempts
<stdin>:1:Failed to import 30 rows: AttributeError - 'NoneType' object has no attribute 'is_up', given up after 1 attempts
...
Processed: 14105 rows; Rate: 212 rows/s; Avg. rate: 185 rows/s
0 rows imported from 1 files in 0 day, 0 hour, 1 minutes, and 16.140 seconds (0 skipped).
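The "field larger than field limit (999999)" lines come from Python's csv module, which caps the size of a single CSV field; the `my_data` blob column can easily exceed it. A minimal reproduction (the 200,000-character field and the raised limit are illustrative values):

```python
import csv
import io

big_field = "x" * 200_000  # exceeds csv's default 131072-character field cap

try:
    next(csv.reader(io.StringIO("key," + big_field + "\n")))
    outcome = "read ok"
except csv.Error:
    outcome = "hit default limit"
print(outcome)

# Raising the cap lets the same row parse; this is the knob cqlsh's
# field-size setting ultimately controls.
csv.field_size_limit(1_000_000)
row = next(csv.reader(io.StringIO("key," + big_field + "\n")))
print(len(row[1]))
```

Recent cqlsh versions read this cap from the cqlshrc `[csv]` section's `field_size_limit` option; treat the exact option name as something to verify against your cqlsh-expansion build, since the 999999 in the log suggests a near-1 MB limit is already configured here.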
Expected behavior
I expect the import to complete successfully without errors.
Screenshots
n/a
Environment (please complete the following information):
Additional context
I'm just trying to do a simple export/import.
(P.S. Apologies if this is the wrong repo to report cqlsh-expansion bugs.)
We are in the middle of a migration to Amazon Keyspaces. What is the best way to manage future schema migrations with Amazon Keyspaces?
It would be great if Amazon could provide a tool that mounts a set of migration scripts and applies them incrementally.
Something like what is described in https://medium.com/cobli/the-best-way-to-manage-schema-migrations-in-cassandra-92a34c834824
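Until such a tool exists, the core of that article's pattern is small enough to script: keep versioned .cql files, record which ones have been applied (for example in a bookkeeping table), and apply only the pending ones in order. A sketch of the selection step, where the `V<number>__name.cql` naming convention and the applied set are assumptions:

```python
from pathlib import Path
import tempfile

def pending_migrations(script_dir, applied):
    """Return versioned .cql scripts not yet applied, in lexical (version) order."""
    scripts = sorted(p.name for p in Path(script_dir).glob("V*.cql"))
    return [name for name in scripts if name not in applied]

# Demo with throwaway files; the file names are illustrative
with tempfile.TemporaryDirectory() as d:
    for name in ("V002__add_index.cql", "V001__create_table.cql"):
        (Path(d) / name).write_text("-- cql statements here")
    todo = pending_migrations(d, applied={"V001__create_table.cql"})
print(todo)
```

Each pending script would then be fed to cqlsh (or the toolkit container) and recorded on success; since Keyspaces applies DDL asynchronously, checking table status between scripts is prudent.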
Hi, could you please add some notes to the README about how to use the SigV4 authentication plugin?
Thanks