Comments (10)
@xeniaqian94 You will likely want to use Jersey to make the HTTP requests. You can also see trecrts-tools/trecrts-clients/python/dumb-topic-client/topic_client.py for a simple client in Python.
I'd suggest starting with translating the Python client to Java first and then we can make it more complicated by doing time-filtered search on the index.
from anserini.
@aroegies This commit, https://github.com/xeniaqian94/Anserini/commit/a38afb7d5f1790fe69aa0331ee4d4f1f9daf16fd, adds TweetClientAPI.java, a simple client that runs two threads:
Thread 1. starts the Twitter stream indexer.
Thread 2.
2.1. registers with the broker,
2.2. retrieves the client's topics by clientID (would POST be better than GET here, since Jersey's GET API does not seem to accept a request body?),
2.3. wakes up every 1000 ms (to be made configurable later) and sends new tweets to the broker as JSON objects of the form <tweetID, topicID, clientID>.
I might need the broker-side API for further testing.
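For anyone following along, the "wake up and push" part of Thread 2 could look roughly like the sketch below. This is not the code in the commit: the class name, the JSON field names, and the use of a queue to hand tweets over from the indexer thread are all assumptions made for illustration, and the actual HTTP call is replaced by a print statement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of the periodic push loop; names are illustrative only. */
final class PushLoopSketch {

  /** Builds one payload. Field names are assumptions; IDs are assumed to need no JSON escaping. */
  static String toJson(long tweetId, String topicId, String clientId) {
    return "{\"tweetid\":\"" + tweetId + "\",\"topid\":\"" + topicId
        + "\",\"clientid\":\"" + clientId + "\"}";
  }

  /** Removes everything queued since the last wake-up and returns one payload per tweet. */
  static List<String> drain(BlockingQueue<Long> newTweets, String topicId, String clientId) {
    List<Long> batch = new ArrayList<>();
    newTweets.drainTo(batch);
    List<String> payloads = new ArrayList<>();
    for (long tweetId : batch) {
      payloads.add(toJson(tweetId, topicId, clientId));
    }
    return payloads;
  }

  /** Schedules a drain every periodMillis; the caller shuts the returned scheduler down. */
  static ScheduledExecutorService start(BlockingQueue<Long> newTweets, String topicId,
                                        String clientId, long periodMillis) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleAtFixedRate(() -> {
      for (String payload : drain(newTweets, topicId, clientId)) {
        // Placeholder for the real request to the broker.
        System.out.println("would send: " + payload);
      }
    }, 0, periodMillis, TimeUnit.MILLISECONDS);
    return scheduler;
  }
}
```

Calling `PushLoopSketch.start(queue, topicId, clientId, 1000)` would give the 1000 ms wake-up described above; making the period a parameter is what "configurable later" would amount to.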
@xeniaqian94 The Broker API has been updated so that GET requests no longer need a request body. See commit 465dd46ed48b997f113ef097a67f4cee6228c267 on trecrts-tools.
You can either launch the broker yourself or use the one running at lab.roegiest.com:33334.
An example curl call is curl -XPOST -H'Content-Type:application/json' lab.roegiest.com:33334/register/system -d '{"groupid":"uwar"}'
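A rough Java equivalent of that curl call, using only the JDK's HttpURLConnection so it does not depend on a particular Jersey version. The class and method names are hypothetical, the plain `http://` scheme is assumed because the curl call gives none, and the response is returned as a raw string since its format is not shown in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of the registration request; not the client in the commit. */
final class RegisterSketch {

  /** Same body as the curl example. Assumes groupId needs no JSON escaping. */
  static String registrationBody(String groupId) {
    return "{\"groupid\":\"" + groupId + "\"}";
  }

  /** POSTs the body to /register/system and returns the raw response text. */
  static String register(String host, int port, String groupId) throws IOException {
    HttpURLConnection conn = (HttpURLConnection)
        URI.create("http://" + host + ":" + port + "/register/system").toURL().openConnection();
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setDoOutput(true);
    try (OutputStream out = conn.getOutputStream()) {
      out.write(registrationBody(groupId).getBytes(StandardCharsets.UTF_8));
    }
    int status = conn.getResponseCode();
    if (status < 200 || status >= 300) {
      throw new IOException("Registration failed with HTTP status " + status);
    }
    try (InputStream in = conn.getInputStream()) {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      byte[] chunk = new byte[4096];
      int n;
      while ((n = in.read(chunk)) != -1) {
        buffer.write(chunk, 0, n);
      }
      return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
  }
}
```

`RegisterSketch.register("lab.roegiest.com", 33334, "uwar")` would mirror the curl call; whether that host is still reachable is not something this sketch can promise.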
Commit https://github.com/xeniaqian94/Anserini/commit/f9551246846b14d3f4d4c869b48e3ce6789b1a21. Tested with
- lab.roegiest.com:33334/register/system
- lab.roegiest.com:33334/topics/[clientid]
- lab.roegiest.com:33334/tweet/[topid]/[status.id]/[clientid]
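The three endpoints listed above can be kept in one place with a small helper like the one below. It only builds URL strings from the paths in the list; the class name is hypothetical and the `http://` scheme is an assumption.

```java
/** Hypothetical helper that builds the broker URLs listed in this comment. */
final class BrokerEndpoints {
  private final String base;

  BrokerEndpoints(String host, int port) {
    // Scheme is assumed; the thread only gives host and port.
    this.base = "http://" + host + ":" + port;
  }

  /** /register/system */
  String register() {
    return base + "/register/system";
  }

  /** /topics/[clientid] */
  String topics(String clientId) {
    return base + "/topics/" + clientId;
  }

  /** /tweet/[topid]/[status.id]/[clientid] */
  String tweet(String topicId, long statusId, String clientId) {
    return base + "/tweet/" + topicId + "/" + statusId + "/" + clientId;
  }
}
```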
In the next stage, I will extend it with time-filtered search.
For testing, you will likely want to run
sh target/appassembler/bin/TweetClientAPI -host lab.roegiest.com -port 33334 -interval 0.1
where -interval 0.1 means tweets are pushed every 0.1 minutes (6 seconds).
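Since the interval is given in fractional minutes, it has to be converted before it can be handed to a timer or scheduler. A minimal sketch of that conversion, with a hypothetical class and method name (how the actual client parses and converts the flag is not shown in this thread):

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for turning the -interval value into milliseconds. */
final class IntervalSketch {

  /** Converts fractional minutes to whole milliseconds, rejecting non-positive or NaN values. */
  static long intervalMillis(double minutes) {
    if (!(minutes > 0)) {
      throw new IllegalArgumentException("interval must be positive: " + minutes);
    }
    return Math.round(minutes * TimeUnit.MINUTES.toMillis(1));
  }
}
```

With this, `IntervalSketch.intervalMillis(0.1)` gives 6000, i.e. one push every 6 seconds.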
Some general comments that should be addressed before a merge:
- groupid as a command line argument
- daily limit as a command line argument
- resourcePath and mustacheTemplatePath are not needed and should be removed
- depending on how fancy we want to get, might be more useful to define the TweetPusherRunnable class separately (i.e. not a nested class) and just pass in a context (i.e., clientid/topic(s)). Allows for more extensibility down the road (e.g., subclassing)
- either use a logger or remove all of the System.out.println calls
- registration failure should generate an exception
- ideally, more comments would be good if we want people to build upon this (re: subclassing idea)
Other general questions:
- Is it not possible to formulate request bodies (e.g., creating JSON) the way TweetSearchAPI handles responses, where a simple class is defined and Jersey converts JSON into an instance of that class? I'd imagine we should be able to go in the opposite direction.
- We may want to break up the files in nrts/ into subfolders based on the task (e.g., basicsearcher, livedemo, rts-baseline).
Nothing "mission critical", but I think these should be addressed before being merged.
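To make a few of these points concrete (top-level TweetPusherRunnable with a context, an exception on registration failure, a logger instead of System.out.println), here is a rough sketch. Only the name TweetPusherRunnable comes from the comments above; PushContext, RegistrationException, and shouldPush are hypothetical, and treating a missing clientid as "registration failed" is an assumption made to keep the example self-contained.

```java
import java.util.logging.Logger;

/** Hypothetical context handed to a pusher: which client and which topic it serves. */
final class PushContext {
  final String clientId;
  final String topicId;

  PushContext(String clientId, String topicId) {
    this.clientId = clientId;
    this.topicId = topicId;
  }
}

/** Hypothetical checked exception for a failed registration. */
class RegistrationException extends Exception {
  RegistrationException(String message) {
    super(message);
  }
}

/** Top-level (not nested) so that other pushers can subclass it. */
class TweetPusherRunnable implements Runnable {
  private static final Logger LOG = Logger.getLogger(TweetPusherRunnable.class.getName());

  protected final PushContext context;

  TweetPusherRunnable(PushContext context) throws RegistrationException {
    // Assumption: a missing clientid means registration with the broker did not succeed.
    if (context.clientId == null || context.clientId.isEmpty()) {
      throw new RegistrationException("no clientid; registration with the broker failed");
    }
    this.context = context;
  }

  /** Extension point: subclasses decide which tweets are worth pushing. */
  protected boolean shouldPush(long tweetId) {
    return true;
  }

  @Override
  public void run() {
    // A real implementation would search the index and push matching tweets here.
    LOG.info("pusher running for client " + context.clientId + ", topic " + context.topicId);
  }
}
```

A subclass would then only need to override shouldPush (or run) rather than touching the client that wires everything together.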
Thanks for the comments. The new commit is https://github.com/xeniaqian94/Anserini/commit/0727d8c9d409f3a73545ac059ec0c8fd55b0470b
New usage: sh target/appassembler/bin/TweetClientAPI -host lab.roegiest.com -port 33334 -interval 0.1 -groupid uwar -dailylimit 6
The code has been refactored to allow for subclassing.
The "nrts/" folder is now reorganized into "nrts/basicsearcher" and "nrts/livedemo". Tested, and it works fine.
Great. I'll try to take a look tomorrow.
Thanks.
Closing ticket since @xeniaqian94 obviously got it to work as part of the TREC 2016 RTS YoGosling baseline - see Issue #80 for follow-up.