Comments (4)
Hi, thanks for trying it!
Yep, the config structure is fine (the cache files are just supplementary files Python creates at runtime, don't worry about them)!
No worries, I can see it's confusing at the moment! I'll clean up the module a bit and generate documentation tomorrow. In the meantime, you can set up and run the exporter for GitHub data, which HPI uses as input.
from hpi.
Ok, I got it to work, sorta.
- Download https://github.com/karlicoss/ghexport
- Use it to create an export json file. Place it in /home/me/data/exports/github
- Use the github export button in settings to make a huge archive, then unzip in /home/me/data/exports/github_initial
- Make a blank cache dir /home/me/.config/my/my/config/cache
- In ~/.config/my/my/config/repos, run git clone https://github.com/karlicoss/ghexport.git
Now set up __init__.py:
class github:
    export_dir = "/home/me/data/exports/github"
    gdpr_dir = "/home/me/data/exports/github_initial"
    cache_dir = "/home/me/.config/my/my/config/cache"
Now I can run python -m orger.modules.github to make a Github.org.
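Before running the module, it can help to confirm that every directory named in the config actually exists. This is a minimal sketch and not part of HPI or orger: missing_dirs is a hypothetical helper, and the paths are the ones from the config above.

```python
from pathlib import Path
from typing import Dict, List


def missing_dirs(paths: Dict[str, str]) -> List[str]:
    """Return the names of config entries whose directory does not exist."""
    return [name for name, path in paths.items() if not Path(path).is_dir()]


if __name__ == "__main__":
    # Paths copied from the config above; adjust them to your own layout.
    config = {
        "export_dir": "/home/me/data/exports/github",
        "gdpr_dir": "/home/me/data/exports/github_initial",
        "cache_dir": "/home/me/.config/my/my/config/cache",
    }
    for name in missing_dirs(config):
        print(f"{name}: directory not found")
```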
The only issue is I ended up having to point gdpr_dir to a blank directory. Apparently something in my export makes the thing crash (I guess this could be another issue if you want).
  File "/home/me/.local/lib/python3.8/site-packages/orger/modules/github.py", line 11, in get_items
    for e in gh.get_events():
  File "/home/me/.local/lib/python3.8/site-packages/my/coding/github.py", line 258, in get_events
    return sorted(iter_events(), key=lambda e: e.dt)
  File "/home/me/.local/lib/python3.8/site-packages/my/coding/github.py", line 258, in <lambda>
    return sorted(iter_events(), key=lambda e: e.dt)
AttributeError: 'RuntimeError' object has no attribute 'dt'
Edit: I found it. All I had to do was get rid of organizations_000001.json. That was breaking it for some reason. Here's what it looks like:
[
  {
    "type": "organization",
    "url": "https://github.com/dependabot",
    "login": "dependabot",
    "name": "Dependabot",
    "description": "Automated dependency updates",
    "website": "https://dependabot.com",
    "location": "London",
    "email": "[email protected]",
    "members": [],
    "owners_team": null,
    "webhooks": [],
    "created_at": "2017-04-12T11:03:37Z"
  }
]
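To find which file in an export contains record types like the one above, one option is to list the "type" values per file. This is only a sketch: record_types is a hypothetical helper, and it assumes each export file is a JSON array of objects with a "type" key, as in the example shown here.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Dict


def record_types(export_dir: str) -> Dict[str, Counter]:
    """Map each JSON file name in export_dir to a count of its "type" values."""
    result: Dict[str, Counter] = {}
    for path in sorted(Path(export_dir).glob("*.json")):
        records = json.loads(path.read_text(encoding="utf-8"))
        # Skip anything that is not a JSON object; count the rest by type.
        result[path.name] = Counter(
            r.get("type", "<missing>") for r in records if isinstance(r, dict)
        )
    return result


if __name__ == "__main__":
    # Path taken from the config earlier in the thread.
    for name, counts in record_types("/home/me/data/exports/github_initial").items():
        print(name, dict(counts))
```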
Right, I'm back!
Thanks for the organizations example, and yeah, good job -- you figured all this out!
But I'll explain anyway:
- The RuntimeError thing
It is actually an error handling technique, but I forgot to handle it properly during the sort. It looks a bit cryptic, but my hope is that once you get used to the pattern ('attribute error on the exception'), it's pretty clear what the problem is (at least for the developer). The idea is that errors are propagated all the way up (instead of immediately crashing, or being silently ignored), and downstream applications can decide what to do with them. For example, orger now displays it as an 'error' heading, so the user is aware when/if errors happen. I guess I'll add some flag to control error display in org-mode files -- naturally I end up with very few as I'm the developer, but I can imagine it might flood files for some people :)
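The pattern described here, where errors travel through the iterator as values and the sort has to account for them, might look roughly like the sketch below. It is an illustration rather than the actual HPI code: Event, Res, sort_key and the sample data are made up, and only iter_events, get_events and the dt attribute come from the traceback.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterator, List, Union


@dataclass
class Event:
    dt: datetime
    summary: str


# A result is either a parsed event or the exception raised while parsing it.
Res = Union[Event, Exception]


def iter_events() -> Iterator[Res]:
    # Hypothetical data source: the middle record fails to parse, and the
    # error is yielded as a value instead of being raised.
    yield Event(datetime(2020, 5, 2, tzinfo=timezone.utc), "opened issue")
    yield RuntimeError("unexpected record: organization")
    yield Event(datetime(2020, 5, 1, tzinfo=timezone.utc), "pushed commit")


def sort_key(res: Res) -> datetime:
    # Errors carry no timestamp, so give them the earliest possible one;
    # they end up first in the sorted output instead of breaking the sort.
    if isinstance(res, Exception):
        return datetime.min.replace(tzinfo=timezone.utc)
    return res.dt


def get_events() -> List[Res]:
    return sorted(iter_events(), key=sort_key)


if __name__ == "__main__":
    for res in get_events():
        if isinstance(res, Exception):
            print("ERROR:", res)
        else:
            print(res.dt.date(), res.summary)
```

With key=lambda e: e.dt instead of sort_key, the same data raises the AttributeError from the traceback, because the RuntimeError value has no dt attribute.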
As for the module itself: I added the docs for github modules, but will elaborate anyway:
- cache_dir: You can actually keep it as None if you prefer, then it would just use /tmp. Although I'm not 100% sure this behaviour is settled -- feels like it makes more sense for None to mean 'caching is disabled', and to use the user cache dir by default.
- gdpr_dir: I added support for an empty string ('') as the path -- that way it will simply be ignored, so there won't be a need for an empty directory!
- export_dir: I renamed it to export_path (it's more consistent with the other modules), but the old name works too.
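As a rough illustration of the three rules above (None falls back to the temp directory, an empty gdpr_dir is skipped, and the old export_dir name is still accepted), a config reader could resolve the values as below. The resolve_* helpers are hypothetical and are not HPI's implementation; only the attribute names come from this thread.

```python
import tempfile
from pathlib import Path
from typing import Optional


class github:
    # Hypothetical user config mirroring the attributes discussed above.
    export_dir = "/home/me/data/exports/github"  # old name, still accepted
    gdpr_dir = ""      # empty string: skip the GDPR export entirely
    cache_dir = None   # None: fall back to the system temp directory


def resolve_export_path(cfg) -> Path:
    # Prefer the new attribute name, fall back to the old one.
    path = getattr(cfg, "export_path", None) or getattr(cfg, "export_dir", None)
    if not path:
        raise ValueError("set export_path (or the older export_dir)")
    return Path(path)


def resolve_gdpr_dir(cfg) -> Optional[Path]:
    # An empty string (or a missing attribute) means "no GDPR data".
    value = getattr(cfg, "gdpr_dir", "")
    return Path(value) if value else None


def resolve_cache_dir(cfg) -> Path:
    # None means "use the temp directory" (/tmp on most Linux systems).
    value = getattr(cfg, "cache_dir", None)
    return Path(value) if value is not None else Path(tempfile.gettempdir())


if __name__ == "__main__":
    print(resolve_export_path(github))
    print(resolve_gdpr_dir(github))
    print(resolve_cache_dir(github))
```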
And in case you're curious: there are quite a few changes in #61 -- what I did is split the module into my.github.gdpr and my.github.ghexport bits. That way it's a bit more modular, and it's easier to use only one of the sources if someone prefers. It's similar to the logic for twitter that I documented before. But the old my.coding.github is still there and should be backwards compatible.
Ok, great. Thanks for the detailed explanation and I'll give the new modules a shot when they're released.