morpheus-core's People

Contributors

benmccann, dgunning, manoelcampos, zavster

morpheus-core's Issues

DbSource Performance Fix for Oracle Databases - set the statement fetch size

DataFrame.read().db() is very slow against Oracle databases because the default fetch size for Oracle is 10 records. For example, reading 30000 records takes over a minute instead of less than a second.

Set the statement fetch size to at least 1000 before running the SQL query, and allow the fetch size to be set as a DbSourceOption.
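
A minimal sketch of the proposed fix, using only standard JDBC (`PreparedStatement.setFetchSize`). The class name, the helper names and the way the requested size is passed in are assumptions for illustration, not the Morpheus DbSource code; the floor of 1000 comes from the suggestion above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class FetchSizeSketch {

    // Lower bound suggested in this issue; hypothetical constant name.
    static final int MIN_FETCH_SIZE = 1000;

    // Returns the caller's fetch size, but never less than the floor.
    static int effectiveFetchSize(int requested) {
        return Math.max(requested, MIN_FETCH_SIZE);
    }

    // Prepares the query and applies the fetch size before it is executed.
    static PreparedStatement prepare(Connection conn, String sql, int requestedFetchSize)
            throws SQLException {
        PreparedStatement stmt = conn.prepareStatement(sql);
        stmt.setFetchSize(effectiveFetchSize(requestedFetchSize));
        return stmt;
    }
}
```

The fetch size is only a hint to the driver, so the actual speed-up would still need to be measured against the target database.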

Remove ZoneInfo usage

It's used here:

src/main/java/com/zavtech/morpheus/array/ArrayType.java:            typeMap.put(sun.util.calendar.ZoneInfo.class, TIME_ZONE);

It's generally considered a best practice to avoid using anything from the sun.* package. E.g. see https://stackoverflow.com/a/1835670

I'm not quite sure whether it can be removed. ArrayBuilderTests.testWithTimeZones tests this functionality and requires the line to be present, but perhaps it could be implemented some other way?
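
One possible direction, sketched under assumptions: register only the public `java.util.TimeZone` class and fall back to an assignability check, so that any JDK-internal subclass resolves to the same type code. The `Kind` enum, the map and the `kindOf` method are hypothetical stand-ins, not the actual ArrayType code, and whether this fits the real lookup path is untested.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TimeZone;

final class TypeLookupSketch {

    // Hypothetical stand-in for the type codes used by ArrayType.
    enum Kind { TIME_ZONE, OBJECT }

    private static final Map<Class<?>, Kind> TYPE_MAP = new LinkedHashMap<>();

    static {
        // Only the public API class is registered; no sun.* reference is needed.
        TYPE_MAP.put(TimeZone.class, Kind.TIME_ZONE);
    }

    static Kind kindOf(Class<?> type) {
        // Exact match first, as a plain map lookup would do.
        Kind exact = TYPE_MAP.get(type);
        if (exact != null) {
            return exact;
        }
        // Fall back to registered supertypes, which covers TimeZone subclasses.
        for (Map.Entry<Class<?>, Kind> entry : TYPE_MAP.entrySet()) {
            if (entry.getKey().isAssignableFrom(type)) {
                return entry.getValue();
            }
        }
        return Kind.OBJECT;
    }
}
```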

The `concat` method doesn't insert all elements when it's called multiple times in sequence

When the concat method is called multiple times, it resizes the destination Array. However, it just sets the content of the last source Array into the next available position in the destination one.

For instance, if it's called as array1.concat(array2).concat(array3), only the contents of array3 are actually inserted into the resulting Array.

The issue can be confirmed by the test cases introduced in PR #91.
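
The expected behaviour can be illustrated with plain `int[]` arrays. This is only a sketch of the semantics a chained concat should have; the `concat` helper is hypothetical and is not the Morpheus Array implementation.

```java
import java.util.Arrays;

final class ConcatSketch {

    // Returns a new array holding all of dest followed by all of src.
    static int[] concat(int[] dest, int[] src) {
        // Grow the destination, keeping its existing contents.
        int[] result = Arrays.copyOf(dest, dest.length + src.length);
        // Append the source after the last existing element.
        System.arraycopy(src, 0, result, dest.length, src.length);
        return result;
    }
}
```

Chaining two calls, for example `concat(concat(a1, a2), a3)`, should then yield every element of a1, a2 and a3 in order.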

Add support for creating sliding window of frames

There should be a better way of doing this than the following:

var windowSize = 5;
var rowKeys = Range.ofLocalDates("2014-01-01", "2014-01-11");
var colKeys = Range.of(0, 5).map(i -> "Column-" + i);
var frame = DataFrame.ofDoubles(rowKeys, colKeys, value -> Math.random() * 10d);
IntStream.range(windowSize-1, frame.rowCount()).mapToObj(lastRow -> {
    var startRow = lastRow - windowSize;
    return frame.rows().select(row -> row.ordinal() <= lastRow && row.ordinal() > startRow);
}).forEach(window -> {
    ((DataFrame) window).out().print();
});
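
The index arithmetic in the snippet above can be isolated from the frame API. The sketch below computes the inclusive first and last row ordinal of each window; the class and method names are assumptions for illustration and nothing here is part of Morpheus.

```java
import java.util.ArrayList;
import java.util.List;

final class WindowBoundsSketch {

    // Returns {firstRow, lastRow} pairs (both inclusive) for each full window.
    static List<int[]> windows(int rowCount, int windowSize) {
        List<int[]> bounds = new ArrayList<>();
        // The first full window ends at ordinal windowSize - 1.
        for (int last = windowSize - 1; last < rowCount; last++) {
            bounds.add(new int[] { last - windowSize + 1, last });
        }
        return bounds;
    }
}
```

A built-in sliding-window method could expose the same bounds and select each row range directly instead of filtering every row by ordinal.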

Adds isEmpty method to DataFrame and DataFrameAxis interfaces

Such interfaces have methods to count the number of elements, but to check if the object is empty one has to always use something like rowCount() == 0, colCount() == 0 or count() == 0.

An isEmpty() method would provide a standard and convenient way to check whether the object is empty.
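
A minimal sketch of how this could look with a default interface method. The `Countable` interface is a hypothetical stand-in for DataFrame and DataFrameAxis; only the `count() == 0` check is taken from the description above.

```java
final class IsEmptySketch {

    // Hypothetical stand-in for an interface that already exposes a count.
    interface Countable {

        int count();

        // Default method, so existing implementations need no changes.
        default boolean isEmpty() {
            return count() == 0;
        }
    }
}
```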

Add setting for Columns size

Hello Morpheus team.
I tried to load a CSV file with 30_000x2000 features using a Morpheus DataFrame, but ran into the univocity CsvParser maxColumns setting, which is hard-coded to 10_000.
I think it should be possible to configure this via options.

Update Univocity Parsers version to 2.X.X

Morpheus uses Univocity Parsers library version 1.5.9, which was released in 2015. This causes an issue when using both Morpheus and the latest version of Univocity in the same project.

The latest univocity version also has some changes to the CharAppender that might be useful for helping to implement Excel Parsing.

The Morpheus tests pass for me with univocity version bumped to 2.5.9.

Consider changing version from 0.9.5 to 0.9.5-SNAPSHOT

A snapshot version in Maven is one that has not been released. Before a 0.9.5 release (or any other release) is done, there exists a 0.9.5-SNAPSHOT. That version is what might become 0.9.5. It's basically "0.9.5 under development".

The difference between a "real" version and a snapshot version is that snapshots might get updates. That means that downloading 0.9.5-SNAPSHOT today might give a different file than downloading it yesterday or tomorrow.

The Maven release plugin will automatically change the version to 0.9.5 during the release process.

Is this library maintained?

Hello all,

Very curious to know if this library is actively maintained. I do not see any activity recently and the last commit I see is from Feb.

Let me know if you are looking for contributors or maintainers.

Printer feedback

Why does the method below take a Supplier<String>? It seems like a String would be sufficient.

public Printer<T> withNullValue(Supplier<String> nullValue) {

Also, Formats.registerPrinters calls .withNullValue(nullValue) on every single printer, which doesn't seem like it should be necessary, since DEFAULT_NULL has the same value.
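
For illustration, a String overload could simply delegate to the Supplier-based method. This is a sketch under assumptions: `PrinterSketch` is a hypothetical class, not the Morpheus Printer, and the default null text used here is made up.

```java
import java.util.function.Supplier;

final class PrinterSketch<T> {

    // Hypothetical default; the real DEFAULT_NULL value is not shown in this issue.
    private Supplier<String> nullValue = () -> "null";

    // Same shape as the method quoted above.
    PrinterSketch<T> withNullValue(Supplier<String> nullValue) {
        this.nullValue = nullValue;
        return this;
    }

    // Convenience overload for the common case of a fixed string.
    PrinterSketch<T> withNullValue(String nullValue) {
        return withNullValue(() -> nullValue);
    }

    // Formats a value, substituting the configured text for null.
    String apply(T value) {
        return value == null ? nullValue.get() : String.valueOf(value);
    }
}
```

A Supplier only pays off if the null text has to be computed lazily or can change after the printer is created.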

How to exclude specific row?

I want to exclude the first and last rows of my CSV file before I start processing it. Is there an option for this? There are methods to exclude and include columns, but I could not find anything for rows. Any ideas?
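
As a workaround sketch in plain Java, the lines can be trimmed before they reach the parser. This is not a Morpheus option, and the helper name is hypothetical. Note that if the first line is the header, dropping it also drops the column names.

```java
import java.util.ArrayList;
import java.util.List;

final class RowTrimSketch {

    // Returns a copy of the lines without the first and the last one.
    static List<String> withoutFirstAndLast(List<String> lines) {
        // With two lines or fewer, nothing remains after trimming.
        if (lines.size() <= 2) {
            return new ArrayList<>();
        }
        return new ArrayList<>(lines.subList(1, lines.size() - 1));
    }
}
```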

Consider fastutil instead of gnu trove

Trove is licensed under the LGPL, which isn't compatible with the Apache 2 license that Morpheus uses. Might not be a problem for all users, but some may wish to exclude Trove from their projects due to the license.

Trove has been around for a while, but it isn't as actively developed anymore. FastUtil is much more recent, and actively developed.

Fastutil also performs much better according to http://java-performance.info/hashmap-overview-jdk-fastutil-goldman-sachs-hppc-koloboke-trove-january-2015/

Is there a way for a missing numeric value to exist?

In columns that contain numbers, cells that hold "NULL" or are empty are replaced with zeroes. Is there a way to replace those missing values with a more distinctive value?

What if a missing value means something different than a zero value?

That already happens in columns that contain strings, where "NULL" or empty cells are replaced by null (which, I might add, I am very grateful for!).
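
One common convention for this is to use `Double.NaN` as the marker, since it cannot be confused with a real zero. The parser below is a hypothetical sketch in plain Java and does not reflect how Morpheus parses CSV cells.

```java
final class MissingValueSketch {

    // Parses a cell as a double, mapping null, blank or "NULL" cells to NaN.
    static double parseOrNaN(String cell) {
        if (cell == null) {
            return Double.NaN;
        }
        String text = cell.trim();
        if (text.isEmpty() || text.equalsIgnoreCase("NULL")) {
            return Double.NaN;
        }
        return Double.parseDouble(text);
    }
}
```

Missing cells can then be detected with `Double.isNaN(value)`, while a genuine 0 stays 0. This only works for floating-point columns; integer columns have no equivalent marker.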

Create DataFrames From Java Collections

Java developers need an easy way to create DataFrames from in-memory Java Collections. This would make Morpheus much more suitable for general Java development.

I am proposing a class called ListSource or CollectionSource that would be able to create a DataFrame from a List of Lists. For example, let's say you read a table from a Word document

XWPFDocument document = new XWPFDocument(stream);
XWPFTable table = document.getTables().get(0);

and you convert the table to a list of Iterables (or lists)

 List<Iterable<XWPFTableCell>> tableData =
                    table.getRows().stream()
                    .map( XWPFTableRow::getTableCells).collect(Collectors.toList());

you could then create a dataframe as follows

  DataFrame<Integer,String> data = new ListSource<XWPFTableCell>()
           .read(options ->{
                options.setData( tableData );
                options.setConverter( XWPFTableCell::getText );
            });

Generally a lot of data in Java can be converted to Lists of Lists and this feature would make Morpheus much more applicable.

Note that the current Morpheus API allows the following

        final Array<String> columns = Array.ofIterable( rows.get(0).getTableCells().stream()
          .map( XWPFTableCell::getText ).collect(toList()));

        return DataFrame.ofObjects(
                Range.of(1, rows.size()).toArray(),
                columns,
                value -> rows.get( value.rowOrdinal()+1).getTableCells().get(value.colOrdinal()).getText());

but that was trickier to get right because of the long method chains and the +1 in the method calls.
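
The core of the proposed ListSource is the conversion step, which can be sketched in plain Java without any Morpheus or POI types. The class and method names here are assumptions for illustration; the result is a list of string rows that a source class could then turn into a DataFrame.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class ListSourceSketch {

    // Applies the converter to every cell, keeping the row and column order.
    static <T> List<List<String>> toTable(List<? extends Iterable<T>> rows,
                                          Function<T, String> converter) {
        List<List<String>> table = new ArrayList<>();
        for (Iterable<T> row : rows) {
            List<String> cells = new ArrayList<>();
            for (T cell : row) {
                cells.add(converter.apply(cell));
            }
            table.add(cells);
        }
        return table;
    }
}
```

With the Word example above, the call would look like `toTable(tableData, XWPFTableCell::getText)`, which avoids the ordinal arithmetic and the +1 offset.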