dynamicgenetics / epicosm_legacy
Please ignore this repo. It's just hanging around for quick access to some resources I needed. Will delete soon :) 14/9/21
License: GNU General Public License v3.0
Use argparse to make argument handling a bit more pro.
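A minimal sketch of what argparse-based handling could look like. The flag names (`--user-list`, `--harvest`, `--repeat`) and the description are made up for illustration and are not the project's actual options.

```python
import argparse


def build_parser():
    """Build the command-line parser (all flag names here are hypothetical)."""
    parser = argparse.ArgumentParser(description="Harvest tweets into MongoDB.")
    parser.add_argument(
        "--user-list",
        default="user_list",
        help="path to the file of users to follow, one per line",
    )
    parser.add_argument(
        "--harvest",
        action="store_true",
        help="run a harvest now",
    )
    parser.add_argument(
        "--repeat",
        type=int,
        default=0,
        metavar="DAYS",
        help="repeat the harvest every DAYS days (0 means run once)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

argparse then gives `--help` output, type checking and error messages for free, which is most of the "more pro" part.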
Currently the check goes "is mongodb running?" ok. What we want is "if mongodb is running, where?" and "if the running db is not in this folder, give a warning". Doing crazy stuff like
existing_mongodb_dbpath = subprocess.check_output(["ps", "ax", "|",
"grep", "-v", "awk", "|",
"awk", "'{for(i=1;", "i<=NF;", "i++)",
"if($i~/mongod/)", "print", "$(i+2)}'"])
doesn't work: with an argument list no shell is involved, so each | is passed to ps as a literal argument instead of creating a pipe.
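One way around the pipe problem is to run plain `ps ax` and do the filtering in Python rather than in grep and awk. The sketch below assumes mongod was started with `--dbpath` on its command line (it finds nothing if the path comes from a config file) and that the path contains no spaces. The function names are hypothetical.

```python
import os
import subprocess


def find_mongod_dbpaths(ps_output):
    """Return the --dbpath value of every mongod command line in `ps ax` output."""
    dbpaths = []
    for line in ps_output.splitlines():
        tokens = line.split()
        for i, token in enumerate(tokens):
            # Only act on the token that is the mongod executable itself.
            if os.path.basename(token) != "mongod":
                continue
            args = tokens[i + 1:]
            for j, arg in enumerate(args):
                if arg == "--dbpath" and j + 1 < len(args):
                    dbpaths.append(args[j + 1])
                elif arg.startswith("--dbpath="):
                    dbpaths.append(arg.split("=", 1)[1])
            break
    return dbpaths


def running_mongod_dbpaths():
    """Run `ps ax` without a shell and parse its output in Python."""
    result = subprocess.run(
        ["ps", "ax"], capture_output=True, text=True, check=True
    )
    return find_mongod_dbpaths(result.stdout)
```

The warning could then compare each returned path against the expected folder, for example with `os.path.realpath` on both sides.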
There is lots of repetition in src/modules/twitter_ops > get_tweets()
It kept breaking when I moved the api call line into its own function, and I was too dim to work out why.
Needs cleaning, but for now it works at least. But yes, ugly as hell :/
On some systems the run script thinks that Docker is running when it is not. Fix.
stop_mongodb includes a one minute timeout, in case it gets stuck in an infinite loop. Is there a better way of doing that? Might closing mongod take a long time if the db is very large, and go over this limit?
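One possible shape for this is a small polling helper with the timeout as a parameter, so the limit can be raised for large databases instead of being hard-coded. This is only a sketch: `wait_until` is a hypothetical helper, not existing project code, and it does not answer how long mongod actually needs.

```python
import time


def wait_until(condition, timeout, interval=1.0):
    """Poll condition() until it returns True or `timeout` seconds have passed.

    Returns True if the condition was met, False if the deadline passed first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

stop_mongodb could then call something like `wait_until(lambda: not mongod_is_running(), timeout=300)`, where `mongod_is_running` stands in for whatever check the script already uses, and report a clear error when the result is False.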
How to leave out credentials and user_list?
User list seems fine, but credentials is not. Is this because credentials is a module while user_list is a file? If so, go back to bringing credentials in as a file? Or is there a smarter way to deal with this?
that old retweet field thing - check modules for correct field.
count the real dict? set?
Retweets are recovered truncated. This is designed behaviour, because a retweet's full text is stored in a different field. This is quite well documented, but requires messing with code that I don't fully understand. So, instead we are going to change mongoexport to have a conditional:
if the record does not have the field "retweeted_status", then it is not a retweet, so just get the field "full_text".
if the record DOES have "retweeted_status", the actual full text of the tweet is in the field:
retweeted_status.full_text
Getting a conditional query into mongoexport is proving complicated and is not working.
Just exporting the retweeted_status.full_text field leaves blank the tweets that were not retweets, as expected.
Solutions:
get conditional working
get mongodb to move the rt_fulltext field to fulltext field (feels dangerous, and breaks format with true tweet format)
make two output files and merge them
many other ways too.
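One of the "other ways" is to leave mongoexport alone and apply the conditional in Python on each record, either after export or when reading from the database. A minimal sketch, assuming each record is a dict with the fields described above; `full_text_of` is a hypothetical helper name.

```python
def full_text_of(tweet):
    """Return the untruncated text of a tweet record.

    For a retweet, the full text lives in retweeted_status.full_text;
    for anything else it is the top-level full_text field.
    """
    retweeted = tweet.get("retweeted_status")
    if retweeted:
        # Fall back to the top-level field if the nested one is missing.
        return retweeted.get("full_text", tweet.get("full_text", ""))
    return tweet.get("full_text", "")
```

This keeps the stored records in true tweet format, since nothing is moved or rewritten in the database.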
need
"got groundtruth but user not in db"
and
"groundtruth added: this many users were not in the groundtruth" and make file of those users.
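Both messages come down to set differences between the users in the groundtruth and the users in the db. A sketch under that reading; the function names and the one-user-per-line file format are assumptions.

```python
def compare_users(groundtruth_users, db_users):
    """Return which users appear on only one side, as sorted lists."""
    groundtruth_users = set(groundtruth_users)
    db_users = set(db_users)
    return {
        "in_groundtruth_not_db": sorted(groundtruth_users - db_users),
        "in_db_not_groundtruth": sorted(db_users - groundtruth_users),
    }


def write_user_file(path, users):
    """Write one user per line, so the missing users can be inspected later."""
    with open(path, "w") as handle:
        handle.writelines(f"{user}\n" for user in users)
```

The counts for the two messages are then the lengths of the two lists, and either list can be passed to `write_user_file`.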
With the v2 API, we can now harvest complete timelines. However, there is usually a discrepancy between the total tweet count retrieved and the total that the Twitter website claims someone has posted. The shortfall is usually between 1% and 10% of tweets.
This is discussed in the community and is known about. I think the issue is that some tweets are deleted, or are retweets from accounts that have been made private, or other edge cases. So, for now I am leaving it, because it seems a common issue and is only minor.
Can this be dealt with? Needs researching and attempt at fixing.
also documentation.
need toggle inside and recompile.
Get brew to downgrade openssl?
# or
brew switch openssl 1.0.2r
# or
brew switch openssl 1.0.2s
# or
brew switch openssl 1.0.2t
or try to package up openssl 1.0.09 somehow?