
pylodstorage's Introduction

pyLoDStorage

Python List of Dicts (Table) Storage library

What it is

pyLoDStorage stores table-like data (a list of dicts) via

  • Sqlite3
  • JSON
  • SPARQL

Installation

pip install pylodstorage

Get Sources

git clone https://github.com/WolfgangFahl/pyLoDStorage
cd pyLoDStorage
scripts/install

Testing

scripts/test

Usage

see test cases

Documentation

Wiki

Authors

pylodstorage's People

Contributors

wolfgangfahl, tholzheim, musaabkh

pylodstorage's Issues

need to fix round-trip json behavior

def check(self, manager, manager1, listName, debugLimit):
    self.dumpListOfDicts(manager.__dict__[listName], debugLimit)
    self.dumpListOfDicts(manager1.__dict__[listName], debugLimit)
    self.assertEqual(manager.__dict__, manager1.__dict__)

should work as well as

def testRoyals(self):
    '''
    test Royals example
    '''
    royals1=Royals(load=True)
    self.assertEqual(4,len(royals1.royals))
    json=royals1.toJSON()
    print(json)
    types=Types.forClass(royals1, "royals")
    royals2=Royals()
    royals2.fromJson(json,types=types)
    self.assertEqual(4,len(royals2.royals))
    print(royals1.royals)
    print(royals2.royals)
    self.assertEqual(royals1.royals,royals2.royals)
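
One possible reason such a round trip fails is that JSON has no native representation for some Python types, for example dates. The following is a minimal sketch under that assumption; `to_json`, `from_json` and the `types` mapping are hypothetical stand-ins and not the JSONAble or Types API of the library.

```python
import json
from datetime import date

# Illustrative record; the field names are assumptions for this sketch.
royals = [{"name": "Elizabeth", "born": date(1926, 4, 21), "numberInLine": 0}]

# Hypothetical type information: field name -> converter from the JSON value.
types = {"born": date.fromisoformat}


def to_json(lod):
    """Serialize a list of dicts; dates are written as ISO strings."""
    return json.dumps(lod, default=lambda value: value.isoformat())


def from_json(text, types=None):
    """Parse a list of dicts and re-apply the given type converters."""
    lod = json.loads(text)
    for record in lod:
        for field, convert in (types or {}).items():
            if record.get(field) is not None:
                record[field] = convert(record[field])
    return lod


untyped = from_json(to_json(royals))        # "born" comes back as a string
typed = from_json(to_json(royals), types)   # "born" is a date again
```

Without the type information the restored records differ from the originals, which is why an equality check on the whole object needs the types to be applied on load.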

integrate tabulate

tabulate has nice functions that fit the pyLoDStorage approach, e.g. wikidata and LaTeX table creation.
Refactor to use that library.

add append option to store API

With append=True it should be possible to append data to an existing table. The internal API already has withDrop and withCreate for the SQLDB case.
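
The requested behavior could look roughly like the sketch below, which uses only the standard `sqlite3` module. The `store` function and its `append` parameter are hypothetical and only illustrate the semantics; they are not the existing store API.

```python
import sqlite3


def store(conn, table, lod, append=False):
    """Store a list of dicts in the given table (illustrative helper)."""
    if not lod:
        return
    columns = list(lod[0].keys())
    column_list = ", ".join(columns)
    if not append:
        # Replace mode: comparable to withDrop=True and withCreate=True.
        conn.execute(f"DROP TABLE IF EXISTS {table}")
    # In append mode an existing table is kept and only created if missing.
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({column_list})")
    placeholders = ", ".join(f":{column}" for column in columns)
    conn.executemany(
        f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})", lod
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
store(conn, "royals", [{"name": "Elizabeth", "born": 1926}])
store(conn, "royals", [{"name": "Charles", "born": 1948}], append=True)
row_count = conn.execute("SELECT COUNT(*) FROM royals").fetchone()[0]
```

The sketch assumes that all records share the keys of the first record and that the table name is trusted, since it is interpolated into the SQL text.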

getLookup fails if value is list

ERROR: testEventCorpusFromWikiUser (tests.test_EventCorpus.TestEventCorpus)
test the event corpus
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/wf/Documents/pyworkspace/OpenResearch/migration/tests/test_EventCorpus.py", line 45, in testEventCorpusFromWikiUser
    eventCorpus=Corpus.getEventCorpusFromWikiAPI(debug=self.debug, force=True)
  File "/Users/wf/Documents/pyworkspace/OpenResearch/migration/tests/corpusfortesting.py", line 53, in getEventCorpusFromWikiAPI
    eventCorpus.fromWikiUser(wikiUser,force=force)
  File "/Users/wf/Documents/pyworkspace/OpenResearch/migration/openresearch/eventcorpus.py", line 93, in fromWikiUser
    self.seriesLookup=self.eventList.getLookup("inEventSeries", withDuplicates=True)
  File "/Users/wf/Library/Python/3.9/lib/python/site-packages/lodstorage/jsonable.py", line 373, in getLookup
    return LOD.getLookup(self.getList(), attrName, withDuplicates)
  File "/Users/wf/Library/Python/3.9/lib/python/site-packages/lodstorage/lod.py", line 119, in getLookup
    if value in lookup:
TypeError: unhashable type: 'list'
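
The traceback shows a list being used as a dictionary key. A minimal sketch of one way to avoid this is to index a record under each element of a list value. The function below is a hypothetical stand-in and not the `LOD.getLookup` implementation; the sample records are made up for illustration.

```python
def get_lookup(lod, attr_name, with_duplicates=False):
    """Map attribute values to records, accepting list values as well."""
    lookup = {}
    for record in lod:
        value = record.get(attr_name)
        if value is None:
            continue
        # A list is unhashable, so use each of its elements as a separate key.
        keys = value if isinstance(value, list) else [value]
        for key in keys:
            if with_duplicates:
                lookup.setdefault(key, []).append(record)
            else:
                lookup[key] = record
    return lookup


events = [
    {"acronym": "A 2020", "inEventSeries": ["A", "B"]},
    {"acronym": "B 2021", "inEventSeries": "B"},
]
series_lookup = get_lookup(events, "inEventSeries", with_duplicates=True)
```

Whether a record with a list value should appear under every element, or be skipped, is a design decision the issue leaves open.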

add try it! button to Query documentation

see also

Examples:

15 Random substances with CAS number
Wikidata SPARQL query showing 15 random chemical substances with their CAS numbers
query

# List of 15 random chemical components with CAS-Number, formula and structure
# see also https://github.com/WolfgangFahl/pyLoDStorage/issues/46
# WF 2021-08-23
SELECT ?substance ?substanceLabel ?formula ?structure ?CAS
WHERE { 
  ?substance wdt:P31 wd:Q11173.
  ?substance wdt:P231 ?CAS.
  ?substance wdt:P274 ?formula.
  ?substance wdt:P117  ?structure.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 15

try it!
result

substance | substanceLabel | formula | structure | CAS
Q32703 | aminomethylpropanol | C₄H₁₁NO | 2-amino-2-methyl-1-propanol.svg | 124-68-5
Q32703 | aminomethylpropanol | C₄H₁₁NO | Isobutanolamine t.png | 124-68-5
Q43656 | cholesterol | C₂₇H₄₆O | Structural formula of cholesterol.svg | 57-88-5
Q45143 | fulminic acid | CHNO | Fulminezuur.png | 506-85-4
Q49546 | acetone | C₃H₆O | Acetone-2D-skeletal.svg | 67-64-1
Q49546 | acetone | C₃H₆O | Acetone-structural.png | 67-64-1
Q52858 | ethane | C₂H₆ | Ethan Keilstrich.svg | 74-84-0
Q58356 | amoxapine | C₁₇H₁₆ClN₃O | Amoxapine.svg | 14028-44-5
Q58713 | clomipramine | C₁₉H₂₃ClN₂ | Clomipramine.svg | 303-49-1
Q68484 | prucalopride | C₁₈H₂₆ClN₃O₃ | Prucalopride.svg | 179474-81-8
Q68566 | mosapride | C₂₁H₂₅ClFN₃O₃ | Mosapride.svg | 112885-41-3
Q80232 | cyclobutane | C₄H₈ | Cyclobutane2.svg | 287-23-0
Q80868 | tolonium chloride | C₁₅H₁₆ClN₃S | Tolonium chloride.svg | 92-31-9
Q83320 | nitric acid | HNO₃ | Nitric-acid.png | 12507-77-6
Q83320 | nitric acid | HNO₃ | Nitric-acid.png | 7697-37-2

Ten largest cities of the world
Wikidata SPARQL query showing the 10 most populated cities of the world using the million city class Q1637706 for selection
query

# Ten Largest cities of the world 
# WF 2021-08-23
# see also http://wiki.bitplan.com/index.php/PyLoDStorage#Examples
SELECT DISTINCT ?city ?cityLabel ?population ?country ?countryLabel 
WHERE {
  VALUES ?cityClass { wd:Q1637706}.
  ?city wdt:P31 ?cityClass .
  ?city wdt:P1082 ?population .
  ?city wdt:P17 ?country .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
ORDER BY DESC(?population)
LIMIT 10

try it!
result

city | cityLabel | population | country | countryLabel
Q1353 | Delhi | 26495000 | Q668 | India
Q8686 | Shanghai | 23390000 | Q148 | People's Republic of China
Q956 | Beijing | 21710000 | Q148 | People's Republic of China
Q1354 | Dhaka | 16800000 | Q902 | Bangladesh
Q1156 | Mumbai | 15414288 | Q668 | India
Q8660 | Karachi | 14910352 | Q843 | Pakistan
Q8673 | Lagos | 14862000 | Q1033 | Nigeria
Q406 | Istanbul | 14657434 | Q43 | Turkey
Q1490 | Tokyo | 13942024 | Q17 | Japan
Q11736 | Tianjin | 13245000 | Q148 | People's Republic of China

count OpenStreetMap place type instances
This SPARQL query determines the number of instances available in OpenStreetMap for the placeTypes city, town and village

query

# count osm place type instances
# WF 2021-08-23
SELECT (count(?instance) as ?count) ?placeType ?placeTypeLabel
WHERE { 
  VALUES ?placeType {
    "city"
    "town"
    "village"
  }
  ?instance osmt:place ?placeType
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?placeType ?placeTypeLabel
ORDER BY ?count

try it!
result

count | placeType | placeTypeLabel
13614 | city | city
23238 | town | town
153380 | village | village
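
For the Wikidata examples, a "try it!" link could be generated by URL-encoding the query into the fragment of the Wikidata Query Service address. This is a minimal sketch; the function name `try_it_url` and its `query_ui` parameter are assumptions for illustration, and other endpoints may need a different address.

```python
from urllib.parse import quote


def try_it_url(query, query_ui="https://query.wikidata.org/"):
    """Build a link that opens the given SPARQL query in the query service UI."""
    # The UI reads the percent-encoded query from the URL fragment.
    return f"{query_ui}#{quote(query)}"


query = "SELECT ?city WHERE { ?city wdt:P31 wd:Q1637706 } LIMIT 10"
link = try_it_url(query)
```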

SQL handling of Lists

A LoD containing a list as a value leads to an error.

An option to exclude lists or convert them to a single string to avoid the failure would be good.
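
Both options can be sketched as a preprocessing step that runs before the records reach SQLite. The helper `flatten_lists`, its parameters and the sample record are hypothetical and only illustrate the idea.

```python
import sqlite3


def flatten_lists(lod, separator=";", exclude=False):
    """Return new records in which list values are joined or left out."""
    result = []
    for record in lod:
        flat = {}
        for key, value in record.items():
            if isinstance(value, list):
                if exclude:
                    continue  # drop the list-valued field entirely
                value = separator.join(str(item) for item in value)
            flat[key] = value
        result.append(flat)
    return result


records = [{"name": "AAAI", "tags": ["AI", "ML"]}]
flat_records = flatten_lists(records)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (name TEXT, tags TEXT)")
conn.executemany("INSERT INTO event VALUES (:name, :tags)", flat_records)
stored = conn.execute("SELECT name, tags FROM event").fetchall()
```

Joining into a string loses the list structure, so reading the data back would need the same separator to split the values again.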

Functionality to convert between LoD and CSV

Example:
CSV input:

pageTitle,name,label
page_1,Test Page 1,1
page_2,Test Page 2,2

LoD output:

[
  {"pageTitle": "page_1", "name": "Test Page 1", "label": "1"},
  {"pageTitle": "page_2", "name": "Test Page 2", "label": "2"}
]

It is currently unclear where the type conversion should happen.
The LoD can be converted to JSONAble, so the type conversion could be done there.
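
The conversion in both directions can be sketched with the standard `csv` module. The function names `csv_to_lod` and `lod_to_csv` are assumptions for this example. As in the expected output above, all values stay strings, so the type conversion question remains open.

```python
import csv
import io


def csv_to_lod(csv_text):
    """Parse CSV text with a header row into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]


def lod_to_csv(lod):
    """Write a list of dicts as CSV text, using the keys of the first record."""
    if not lod:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(lod[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(lod)
    return buffer.getvalue()


csv_text = "pageTitle,name,label\npage_1,Test Page 1,1\npage_2,Test Page 2,2\n"
lod = csv_to_lod(csv_text)
```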

Regression: storeToJsonFile and restoreFromJsonFile missing in JSONAble

Restore the functionality of the 0.0.26 release:

def storeToJsonFile(self,storeFilePrefix,tableName):
    '''
    store me with the given storeFilePrefix

    Args:
        storeFilePrefix(string): the prefix for the JSON file name
        tableName(string): the name of the attribute for which to store the type information
    '''
    JSONAble.storeJsonToFile(self.toJSON(), "%s.json" % storeFilePrefix)
    types=Types.forTable(self, tableName)
    JSONAble.storeJsonToFile(types.toJSON(), "%s-types.json" % storeFilePrefix)

def restoreFromJsonFile(self,storeFilePrefix):
    '''
    restore me from the given storeFilePrefix

    Args:
        storeFilePrefix(string): the prefix for the JSON file name
    '''
    jsonStr=JSONAble.readJsonFromFile("%s.json" % storeFilePrefix)
    typesJson=JSONAble.readJsonFromFile("%s-types.json" % storeFilePrefix)
    types=Types(type(self).__name__)
    types.fromJson(typesJson)
    self.fromJson(jsonStr, types)

take the Server Example:

class Server(JSONAble):

isCached should be True even if count is less than 100

The isCached code currently assumes that a table is not cached if it contains 100 or fewer entries.
In quite a few use cases it should be fine to have a small or even empty table.

sqlQuery="SELECT COUNT(*) AS count FROM %s" % self.tableName
try:
    sqlDB=self.getSQLDB(cacheFile)
    countResult=sqlDB.query(sqlQuery)
    count=countResult[0]['count']
    result=count>100
except Exception as ex:
    # e.g. sqlite3.OperationalError: no such table: Event_crossref
    pass
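
One way to decouple the check from the row count is to ask SQLite whether the table exists at all. The sketch below uses the standard `sqlite3` module and the `sqlite_master` catalog; the function `is_cached` is a hypothetical illustration and not the library's isCached method.

```python
import sqlite3


def is_cached(conn, table_name):
    """Treat a table as cached as soon as it exists, even when it is empty."""
    query = "SELECT COUNT(*) FROM sqlite_master WHERE type='table' AND name=?"
    return conn.execute(query, (table_name,)).fetchone()[0] > 0


conn = sqlite3.connect(":memory:")
before = is_cached(conn, "Event_crossref")
conn.execute("CREATE TABLE Event_crossref (eventId TEXT)")
after = is_cached(conn, "Event_crossref")
```

This also avoids relying on an exception for the "no such table" case.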

offer execute wrapper directly via sqlDB

Currently, issuing a command such as "CREATE VIEW" requires a code line like:

sqlDB.c.execute(viewDDL)

It would be better to offer an execute wrapper that delegates the call, so that execute calls can be traced for debugging and their exceptions handled in one place.

sqlDB.execute(viewDDL)

would then be the recommended way for such calls.
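
Such a wrapper might look like the following sketch. The class `TracingDB`, its `executed` list and the error handling policy are assumptions for illustration; only the attribute name `c` mirrors the snippet above, and it is assumed here to hold a `sqlite3` connection.

```python
import sqlite3


class TracingDB:
    """Illustrative database wrapper with a traceable execute method."""

    def __init__(self, path=":memory:", debug=False):
        self.c = sqlite3.connect(path)
        self.debug = debug
        self.executed = []  # history of all statements passed to execute

    def execute(self, ddl_cmd):
        """Delegate to the connection while tracing and wrapping errors."""
        if self.debug:
            print(ddl_cmd)
        self.executed.append(ddl_cmd)
        try:
            return self.c.execute(ddl_cmd)
        except sqlite3.Error as ex:
            # Add the failing statement to the error for easier debugging.
            raise RuntimeError(f"{ddl_cmd} failed: {ex}") from ex


sql_db = TracingDB()
sql_db.execute("CREATE TABLE event (title TEXT)")
sql_db.execute("CREATE VIEW event_titles AS SELECT title FROM event")
view_names = [
    row[0]
    for row in sql_db.c.execute("SELECT name FROM sqlite_master WHERE type='view'")
]
```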

add initSqlDB

def initSQLDB(self,sqldb,listOfDicts=None,withCreate:bool=True,withDrop:bool=True,sampleRecordCount=-1):

set None value for undefined LoD entries

def setNone4List(self,listOfDicts,fields):
    '''
    set the given fields to None for the records in the given listOfDicts
    if they are not set

    Args:
        listOfDicts(list): the list of records to work on
        fields(list): the list of fields to set to None
    '''
    for record in listOfDicts:
        self.setNone(record, fields)

def setNone(self,record,fields):
    '''
    make sure the given fields in the given record are set to None
    if they are not set

    Args:
        record(dict): the record to work on
        fields(list): the list of fields to set to None
    '''
    for field in fields:
        if field not in record:
            record[field]=None

But make sure that a copy of the records can optionally be used, so that the original list of dicts stays as is.
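
The optional copy could be sketched as follows. The function name `set_none_for_list` and the `use_copy` parameter are hypothetical; the sketch returns the processed records so the caller can use either the copy or the original list.

```python
import copy


def set_none_for_list(list_of_dicts, fields, use_copy=False):
    """Set missing fields to None, optionally working on a deep copy."""
    records = copy.deepcopy(list_of_dicts) if use_copy else list_of_dicts
    for record in records:
        for field in fields:
            # setdefault only adds the key when it is not present yet
            record.setdefault(field, None)
    return records


original = [{"name": "a"}]
padded = set_none_for_list(original, ["name", "url"], use_copy=True)
```

With `use_copy=False` the records are modified in place, which matches the behavior of the methods shown above.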
