refine-client-py's People

Contributors

armisael, paulmakepeace, vad

refine-client-py's Issues

Facet's "Invert" button should always be visible

A facet's invert button is only visible if items from that facet are selected (i.e. if the facet is currently active). invert should be visible, and clickable, at all times, so that one can:

  • see more easily whether the facet is in "include" or "exclude" mode (i.e. what clicking on one of its items will do)
  • toggle that state without having to select an item first

Create New Project in Open Refine using JSON or XML

I'm wondering if there are any examples of python code that uses Open Refine to create a new project using JSON or XML files. We are currently running the python code from:
https://github.com/maxogden/refine-python

This works perfectly for creating projects from csv or tsv files. Only afterwards did we discover https://github.com/PaulMakepeace/refine-client-py/.

Reviewing that code, it looks like the project format plays a big part here. However there are 2 points that we can't get past:

  1. Line 151 in https://github.com/PaulMakepeace/refine-client-py/blob/master/google/refine/refine.py
    lists the different project formats, but I didn't see a JSON option. Is there one?

  2. Line 190 is where we started creating a new project from XML, but we didn't see a way to select a specific XML element to start with, similar to what you can do in the desktop version.

Love what y'all are doing here, and we want to expand to other formats. Are there any examples out there, or any advice? Thanks!

export to stdout does not support unicode chars

create project from clipboard:

🔣	code	meaning
🍇	1F347	GRAPES
🍉	1F349	WATERMELON
🍒	1F352	CHERRIES
🍓	1F353	STRAWBERRY
🍍	1F34D	PINEAPPLE

export:

$ python2 refine.py --export 2165167439768
������	code	meaning
������	1F347	GRAPES
������	1F349	WATERMELON
������	1F352	CHERRIES
������	1F353	STRAWBERRY
������	1F34D	PINEAPPLE
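
The report does not identify where the characters get mangled. As a minimal sketch of one common workaround, assuming the exported rows are available as text lines, the lines can be encoded to UTF-8 explicitly and written to a byte stream instead of relying on the console's default codec. The helper name write_utf8 is hypothetical and is not part of refine-client-py.

```python
import sys


def write_utf8(lines, stream=None):
    """Write text lines to a byte stream as UTF-8 (hypothetical helper)."""
    if stream is None:
        # Python 3 exposes the underlying byte stream as sys.stdout.buffer;
        # Python 2's sys.stdout accepts byte strings directly.
        stream = getattr(sys.stdout, 'buffer', sys.stdout)
    for line in lines:
        # Encode only text; pass through lines that are already bytes.
        if not isinstance(line, bytes):
            line = line.encode('utf-8')
        stream.write(line)
```

Passing an explicit stream (a file opened in binary mode, for example) keeps the helper independent of the terminal's locale settings.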

new_project fails for xls, xlsx and ods with OpenRefine >=2.8

OpenRefine 2.8 introduced a new feature for selecting sheets in the importer.

OpenRefine 2.7:

{
  "sheets": [
    0
  ]
}

OpenRefine 2.8:

{
  "sheets": [
    {
      "name": "duplicates.xls#duplicates",
      "fileNameAndSheetIndex": "duplicates.xls#0",
      "rows": 11,
      "selected": true
    }
  ]
}

Calling the function new_project() with the new sheet option fails. The project is created but contains 0 rows, so a KeyError: 'keyColumnName' exception is raised.

In:

from google.refine import refine
server1 = refine.Refine('http://localhost:3333')
project1 = server1.new_project(
    project_file='data/cli/duplicates.xls',
    project_format='binary/text/xml/xls/xlsx',
    sheets=[{
        'name': 'duplicates.xls#duplicates',
        'fileNameAndSheetIndex': 'duplicates.xls#0',
        'rows': 11,
        'selected': True,
    }]
)

Out:

KeyError                                  Traceback (most recent call last)
<ipython-input-16-4ce682cb870d> in <module>()
----> 1 project1 = server1.new_project(project_file='data/cli/duplicates.xls', project_format='binary/text/xml/xls/xlsx', sheets=[{"name":"duplicates.xls#duplicates","fileNameAndSheetIndex":"duplicates.xls#0","rows":11,"selected":True}])

/home/felix/.local/lib/python2.7/site-packages/google/refine/refine.pyc in new_project(self, project_file, project_url, project_name, project_format, encoding, separator, ignore_lines, header_lines, skip_data_lines, limit, store_blank_rows, guess_cell_value_types, process_quotes, store_blank_cells_as_nulls, include_file_sources, **opts)
    277         if 'project' in url_params:
    278             project_id = url_params['project'][0]
--> 279             return RefineProject(self.server, project_id)
    280         else:
    281             raise Exception('Project not created')

/home/felix/.local/lib/python2.7/site-packages/google/refine/refine.pyc in __init__(self, server, project_id)
    354         self.column_order = {}  # map of column names to order in UI
    355         self.rows_response_factory = None   # for parsing get_rows()
--> 356         self.get_models()
    357         # following filled in by get_reconciliation_services
    358         self.recon_services = None

/home/felix/.local/lib/python2.7/site-packages/google/refine/refine.pyc in get_models(self)
    400             self.column_order[name] = i
    401             column_index[name] = column['cellIndex']
--> 402         self.key_column = column_model['keyColumnName']
    403         self.has_records = response['recordModel'].get('hasRecords', False)
    404         self.rows_response_factory = RowsResponseFactory(column_index)

KeyError: 'keyColumnName'
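
To make the difference between the two "sheets" formats concrete, here is a minimal sketch that builds the option in either shape depending on the server version. The helper name build_sheets_option and its input shape (a list of index, sheet name, row count tuples) are assumptions made for illustration. It only mirrors the two JSON snippets quoted in this issue and does not by itself resolve the failure, since the report does not show how the client serializes the option.

```python
import re


def build_sheets_option(server_version, file_name, sheets):
    """Build the "sheets" import option (hypothetical helper).

    sheets is a list of (index, sheet_name, row_count) tuples.
    """
    match = re.match(r'(\d+)\.(\d+)', server_version)
    version = (int(match.group(1)), int(match.group(2)))
    if version < (2, 8):
        # OpenRefine 2.7 format as quoted in the issue: a list of indexes.
        return [index for index, _name, _rows in sheets]
    # OpenRefine 2.8 format as quoted in the issue: one object per sheet.
    return [{
        'name': '%s#%s' % (file_name, name),
        'fileNameAndSheetIndex': '%s#%d' % (file_name, index),
        'rows': rows,
        'selected': True,
    } for index, name, rows in sheets]
```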

Problems with add_column

Hi,

I am writing here even though I am not completely sure where the issue lies. The problem is this:

I am importing several rows from a TSV file and then applying the following to create a new column:

# turn Created and Resolved toDate()
p.text_transform('Created', expression='value.toDate()')
p.text_transform('Resolved', expression='value.toDate()')
# so they can be subtracted
p.add_column('Resolved', 'LeadTime', expression='diff(cells["Resolved"].value, cells["Created"].value, "hours")/24.0')

In the web interface I can see the LeadTime column, but it is empty.
If I try to perform the same operation in the web interface, I get the same result (a null result in the preview). But, surprise: if I restart the Refine server, the preview behaves correctly and the operation works.
This looks more like a Refine bug, or maybe earlier calls leave the server in a bad state.

Can anyone help?

commands list and export fail for projectnames containing unicode chars

project name: unicodé

export:

$ python2 ./refine.py --export 1774315430083
/usr/lib64/python2.7/urllib.py:1298: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  return ''.join(map(quoter, s))
Traceback (most recent call last):
  File "./refine.py", line 108, in <module>
    refine_project = main()
  File "./refine.py", line 102, in main
    export_project(project, options)
  File "./refine.py", line 76, in export_project
    output.writelines(project.export(export_format=export_format))
  File "/home/felix/git/refine-client-py/google/refine/refine.py", line 432, in export
    url = ('export-rows/' + urllib.quote(self.project_name()) + '.' +
  File "/usr/lib64/python2.7/urllib.py", line 1298, in quote
    return ''.join(map(quoter, s))
KeyError: u'\xe9'

list:

$ python2 ./refine.py --list
Traceback (most recent call last):
  File "./refine.py", line 108, in <module>
    refine_project = main()
  File "./refine.py", line 93, in main
    list_projects()
  File "./refine.py", line 63, in list_projects
    print('{0:>14}: {1}'.format(project_id, project_info['name']))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 6: ordinal not in range(128)

The OpenRefine GUI suggests project names without unicode chars but people are free to override the project name input field.
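
Both tracebacks point at text being handled with the ASCII codec. A minimal sketch of how the two spots could be made Unicode-safe follows, assuming the project name arrives as a text string: encode the name to UTF-8 before percent-encoding it, and use a Unicode format string for the listing. The helper names quote_project_name and format_project_line are hypothetical. On Python 2, printing the formatted line to an ASCII terminal may still need an explicit encode.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2


def quote_project_name(name):
    """Percent-encode a project name for a URL path (hypothetical helper)."""
    # Encoding to UTF-8 bytes first avoids the KeyError raised by
    # Python 2's urllib.quote on non-ASCII unicode input.
    if not isinstance(name, bytes):
        name = name.encode('utf-8')
    return quote(name)


def format_project_line(project_id, name):
    """Format one line of the project listing (hypothetical helper)."""
    # A unicode format string keeps Python 2 from encoding the name as ASCII.
    return u'{0:>14}: {1}'.format(project_id, name)
```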

Exception: HTTP 502 "badgateway"

Hello,

I'm using this API to compute clusters of products, and it worked as well as I expected with the fingerprint binning method. But when I try to use knn with levenshtein, it works only with a small number of records in a CSV file (about 2000). In another case I have more records (10000) and receive Exception: HTTP 502 "badgateway" when I try to cluster them.

On the other hand, I was able to compute the clusters using the web interface.

How can I handle this in the API? Is there a limit?

Thanks!

Failing tests for OpenRefine 2.7 to 3.2

As @paulmakepeace commented in #15 our first goal should be

"a working python 3 version that's passing tests and runs correctly in OpenRefine 3.2 with the least amount of shenanigans"

A first step could be a systematic test with all OpenRefine versions. So let's get started...

Test environment

I wrote a bash script to test all different versions in one run: tests.sh

Tested with refine-client-py:master snapshot 2019-08-04

OpenRefine server started with docker images from openjdk (cf. Docker Hub felixlohmeier/openrefine)

extended assertions for newer versions in tests/test_refine.py, line 40

- self.assertTrue(self.server.version in ('2.0', '2.1', '2.5'))
+ self.assertTrue(self.server.version in ('2.0', '2.1', '2.5', '2.7', '2.8', '3.0', '3.1', '3.2'))

Results

- means that OpenRefine does not support this java version

        2.0  2.1  2.5  2.7       2.8       3.0                   3.1                   3.2
java6   OK   OK   OK   -         -         -                     -                     -
java7   -    -    OK   FAIL (1)  FAIL (1)  -                     -                     -
java8   -    -    -    FAIL (1)  FAIL (1)  FAIL (1) / ERROR (4)  FAIL (1) / ERROR (4)  FAIL (1) / ERROR (3)
java9   -    -    -    -         FAIL (1)  FAIL (1) / ERROR (4)  FAIL (1) / ERROR (4)  FAIL (1) / ERROR (3)
java10  -    -    -    -         -         -                     -                     FAIL (1) / ERROR (3)
java11  -    -    -    -         -         -                     -                     FAIL (1) / ERROR (3)
java12  -    -    -    -         -         -                     -                     FAIL (1) / ERROR (3)

FAILs and ERRORs in detail

OpenRefine 2.7 + 2.8: FAIL (1)

same results for 2.7 with java 7 or 8 and for 2.8 with java 7, 8 or 9

FAIL: test_editing (tests.test_tutorial.TutorialTestEditing)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/felix/git/refine-client-py/tests/test_tutorial.py", line 141, in test_editing
    self.assertInResponse('transform on 6067 cells in column Zip Code 2')
  File "/home/felix/git/refine-client-py/tests/refinetest.py", line 52, in assertInResponse
    raise AssertionError('Expecting "%s" in "%s"' % (expect, desc))
AssertionError: Expecting "transform on 6067 cells in column Zip Code 2" in "Text transform on 6958 cells in column Zip Code 2: value.toString()[0, 5]"

If I change the assertions to these values and re-run then other FAILs pop up (one after another):

FAIL: test_editing (tests.test_tutorial.TutorialTestEditing)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/felix/git/refine-client-py/tests/test_tutorial.py", line 165, in test_editing
    self.assertEqual(first_cluster[0]['value'], 'RSCC Member')
AssertionError: u'DPEC Member at Large' != 'RSCC Member'
FAIL: test_editing (tests.test_tutorial.TutorialTestEditing)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/felix/git/refine-client-py/tests/test_tutorial.py", line 166, in test_editing
    self.assertEqual(first_cluster[0]['count'], 233)
AssertionError: 6 != 233
FAIL: test_editing (tests.test_tutorial.TutorialTestEditing)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/felix/git/refine-client-py/tests/test_tutorial.py", line 197, in test_editing
    self.assertEqual(response.facets[0].choices[True].count, 3)
AssertionError: 2 != 3
FAIL: test_editing (tests.test_tutorial.TutorialTestEditing)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/felix/git/refine-client-py/tests/test_tutorial.py", line 199, in test_editing
    self.assertInResponse('3 rows')
  File "/home/felix/git/refine-client-py/tests/refinetest.py", line 52, in assertInResponse
    raise AssertionError('Expecting "%s" in "%s"' % (expect, desc))
AssertionError: Expecting "3 rows" in "Remove 2 rows"

If I change all these assertions to these values then OpenRefine 2.7 and 2.8 would be OK. I have not checked yet whether the new results are plausible.

diff for tests/test_tutorial.py

-        self.assertInResponse('transform on 6067 cells in column Zip Code 2')
+        self.assertInResponse('transform on 6958 cells in column Zip Code 2')
(...)
-        self.assertEqual(first_cluster[0]['value'], 'RSCC Member')
-        self.assertEqual(first_cluster[0]['count'], 233)
+        self.assertEqual(first_cluster[0]['value'], 'DPEC Member at Large')
+        self.assertEqual(first_cluster[0]['count'], 6)
(...)
-        self.assertEqual(response.facets[0].choices[True].count, 3)
+        self.assertEqual(response.facets[0].choices[True].count, 2)
         self.project.remove_rows()
-        self.assertInResponse('3 rows')
+        self.assertInResponse('2 rows')
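
Instead of overwriting the old assertions, the expected values could be keyed on the server version so that the tests keep passing on 2.0 to 2.5 as well. The following is only a sketch: the dictionary names and the expected_values helper are hypothetical, the 2.7 cutoff is taken from the results table above, and the plausibility of the newer values has not been checked.

```python
import re

# Values asserted by the existing tests (reported OK on 2.0, 2.1 and 2.5).
ORIGINAL_EXPECTATIONS = {
    'zip_code_cells': 6067,
    'first_cluster_value': 'RSCC Member',
    'first_cluster_count': 233,
    'starred_rows': 3,
}

# Values observed on 2.7 and 2.8 according to the diff above (unverified).
UPDATED_EXPECTATIONS = {
    'zip_code_cells': 6958,
    'first_cluster_value': 'DPEC Member at Large',
    'first_cluster_count': 6,
    'starred_rows': 2,
}


def expected_values(server_version):
    """Pick the expectation set for a version string such as '2.8'."""
    match = re.match(r'(\d+)\.(\d+)', server_version)
    version = (int(match.group(1)), int(match.group(2)))
    return UPDATED_EXPECTATIONS if version >= (2, 7) else ORIGINAL_EXPECTATIONS
```

A test could then assert, for example, that the first cluster count equals expected_values(self.server.version)['first_cluster_count'].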

OpenRefine 3.0: FAIL (1) / ERROR (4)

same results with java 8 or 9

FAIL: see OpenRefine 2.7
With updated assertions there is another Exception (like the ones below)

======================================================================
ERROR: test_editing (tests.test_tutorial.TutorialTestEditing)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/felix/git/refine-client-py/tests/test_tutorial.py", line 195, in test_editing
    response = self.project.compute_facets(facet.StarredFacet(True))
  File "/home/felix/git/refine-client-py/google/refine/refine.py", line 458, in compute_facets
    response = self.do_json('compute-facets')
  File "/home/felix/git/refine-client-py/google/refine/refine.py", line 380, in do_json
    data=data)
  File "/home/felix/git/refine-client-py/google/refine/refine.py", line 103, in urlopen_json
    raise Exception(error_message)
Exception: server error: JSONObject["l"] not a string.

ERRORS: 4 (3x server error: JSONObject["l"] not a string., 1x java.lang.NullPointerException)

ERROR: test_duplicate_detection (tests.test_tutorial.TutorialTestDuplicateDetection)
(...)
File "/home/felix/git/refine-client-py/google/refine/refine.py", line 103, in urlopen_json
    raise Exception(error_message)
Exception: server error: JSONObject["l"] not a string.

ERROR: test_transpose_variable_number_of_rows_into_columns (tests.test_tutorial.TutorialTestTransposeVariableNumberOfRowsIntoColumns)
(...)
File "/home/felix/git/refine-client-py/google/refine/refine.py", line 103, in urlopen_json
    raise Exception(error_message)
Exception: server error: JSONObject["l"] not a string.

ERROR: test_web_scraping (tests.test_tutorial.TutorialTestWebScraping)
(...)
File "/home/felix/git/refine-client-py/google/refine/refine.py", line 103, in urlopen_json
    raise Exception(error_message)
Exception: server error: JSONObject["l"] not a string.

ERROR: test_delete_project (tests.test_refine.RefineTest)
  File "/home/felix/git/refine-client-py/google/refine/refine.py", line 103, in urlopen_json
    raise Exception(error_message)
Exception: server error: java.lang.NullPointerException

OpenRefine 3.1: FAIL (1) / ERROR (4)

same results with java 8 or 9

FAIL: see OpenRefine 2.7
With updated assertions there is another Exception (a new one...)

======================================================================
ERROR: test_editing (tests.test_tutorial.TutorialTestEditing)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/felix/git/refine-client-py/tests/test_tutorial.py", line 196, in test_editing
    self.assertEqual(len(response.facets[0].choices), 2)    # true & false
  File "/home/felix/git/refine-client-py/google/refine/facet.py", line 207, in __getitem__
    assert self.facets[index].name == engine.facets[index].name
IndexError: list index out of range

ERRORS: 4 (4x java.lang.NullPointerException)

ERROR: test_duplicate_detection (tests.test_tutorial.TutorialTestDuplicateDetection)
(...)
  File "/home/felix/git/refine-client-py/google/refine/refine.py", line 103, in urlopen_json
    raise Exception(error_message)
Exception: server error: java.lang.NullPointerException

ERROR: test_transpose_variable_number_of_rows_into_columns (tests.test_tutorial.TutorialTestTransposeVariableNumberOfRowsIntoColumns)
(...)
  File "/home/felix/git/refine-client-py/google/refine/refine.py", line 103, in urlopen_json
    raise Exception(error_message)
Exception: server error: java.lang.NullPointerException

ERROR: test_web_scraping (tests.test_tutorial.TutorialTestWebScraping)
(...)
  File "/home/felix/git/refine-client-py/google/refine/refine.py", line 103, in urlopen_json
    raise Exception(error_message)
Exception: server error: java.lang.NullPointerException

ERROR: test_delete_project (tests.test_refine.RefineTest)
(...)
  File "/home/felix/git/refine-client-py/google/refine/refine.py", line 103, in urlopen_json
    raise Exception(error_message)
Exception: server error: java.lang.NullPointerException

OpenRefine 3.2: FAIL (1) / ERROR (3)

same results with java 8, 9, 10, 11 or 12

FAIL: see OpenRefine 2.7
With updated assertions there is another Exception (like the ones below)

ERROR: test_editing (tests.test_tutorial.TutorialTestEditing)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/felix/git/refine-client-py/tests/test_tutorial.py", line 146, in test_editing
    response = self.project.compute_facets()
  File "/home/felix/git/refine-client-py/google/refine/refine.py", line 459, in compute_facets
    return self.engine.facets_response(response)
  File "/home/felix/git/refine-client-py/google/refine/facet.py", line 231, in facets_response
    return FacetsResponse(self, response)
  File "/home/felix/git/refine-client-py/google/refine/facet.py", line 211, in __init__
    self.mode = facets['mode']
KeyError: 'mode'

ERRORS: 3 (different ones than above! 2x KeyError: 'mode', 1x TypeError: coercing to Unicode: need string or buffer, NoneType found)

ERROR: test_facet (tests.test_tutorial.TutorialTestFacets)
(...)
  File "/home/felix/git/refine-client-py/google/refine/facet.py", line 211, in __init__
    self.mode = facets['mode']
KeyError: 'mode'

ERROR: test_transpose_fixed_number_of_rows_into_columns (tests.test_tutorial.TutorialTestTransposeFixedNumberOfRowsIntoColumns)
(...)
  File "/home/felix/git/refine-client-py/google/refine/facet.py", line 211, in __init__
    self.mode = facets['mode']
KeyError: 'mode'

ERROR: test_delete_project (tests.test_refine.RefineTest)
(...)
  File "/home/felix/git/refine-client-py/google/refine/refine.py", line 102, in urlopen_json
    response.get('message', response.get('stack', response)))
TypeError: coercing to Unicode: need string or buffer, NoneType found
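
Only one line of urlopen_json is visible in the traceback above, so the following rests on an assumption: that the server response contains a "message" key whose value is null, in which case dict.get returns None rather than the fallback, and building the error string fails. A minimal sketch of a more tolerant message builder (describe_server_error is a hypothetical name, not the library's function):

```python
def describe_server_error(response):
    """Return a printable message for an error response dict (hypothetical)."""
    # `or` falls through on None and empty strings, whereas the default of
    # dict.get only applies when the key is missing entirely.
    detail = response.get('message') or response.get('stack') or response
    # %s formatting converts any remaining non-string value safely.
    return 'server %s: %s' % (response.get('code', 'error'), detail)
```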

Next steps

These different errors need debugging, and possibly version-specific code (and/or tests) for different versions of OpenRefine if we want to ensure backwards compatibility. Not sure where to begin...

keyError for Windows with OR 2.5

I just wanted to double-check whether this is happening to anyone else. I'm getting a KeyError on the columns after a new_project call with a project_url parameter. I've tried all the combinations I could think of, including OR 2.5 (latest) and OR 2.6 with refine-client-py master, back to master 1a4f00b, and the PyPI install, on Windows 2008 and Ubuntu 12.04. All give the same result, KeyError: 'keyColumnName', with any dataset I throw at it. The error is shown below, followed by the output from OR. When I open the OR client in the browser, the project has been created but there is no data in it. If other people can replicate this error, then I wonder whether create-import-job using the importing-controller would be a better approach than create-project-from-upload, as shown here (https://groups.google.com/forum/#!topic/openrefine-dev/oh5Cic1XVcI). Any thoughts would be appreciated.

>>> refiner.new_project(project_url="https://raw.githubusercontent.com/PaulMakepeace/refine-client-py/master/tests/data/duplicates.csv")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Administrator\pythonex\python27_64\lib\site-packages\google\refine\refine.py", line 279, in new_project
    return RefineProject(self.server, project_id)
  File "C:\Users\Administrator\pythonex\python27_64\lib\site-packages\google\refine\refine.py", line 356, in __init__
    self.get_models()
  File "C:\Users\Administrator\pythonex\python27_64\lib\site-packages\google\refine\refine.py", line 407, in get_models
    self.key_column = column_model['keyColumnName']
KeyError: 'keyColumnName'
13:38:31.753 [                   refine] POST /command/core/create-project-from-upload (508ms)
13:38:31.815 [                   refine] GET /command/core/get-models (62ms)

folder name google

The folder name google causes an error with pip3. Please rename the folder to something more specific, such as refine_api.
