Aigents Java Core Platform
License: MIT License
In extension to #9
Make sure forgetting does not remove authored news without topics (linked by the 'authors' link)
Options:
Real Problem:
Currently, the image supplied for news items along with the title, text, and sources (link) values may or may not be relevant to the text and title. This is because the image is located by ContentLocator based on logic found in Matcher:
https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/self/Matcher.java#L170
The logic relies on the proximity of the image to the located text in terms of raw HTML text, not on spatial proximity in terms of visual appearance in an HTML browser or semantic proximity from a human point of view.
We need to search for a way to improve the current behavior, given that we can't expose the HLAI to virtual pages generated by a virtual browser, pretending the HLAI sees the texts and images the same way humans do.
Possible Solutions:
Need: Make it possible for “public channel owners” to share news with those who “trust” them, without needing to explicitly set “share” for each of the “trusters” individually (based on the presence of the "shares" option)
Task: Make a news item available to trusting peers if the sharing peer either
a) is sharing it to trusters (like it is done now)
b) has shared areas in shares and areas list (TODO)
Goal
There is a need to refactor/extend the existing HTML stripper to make textual and semantic information extraction more reliable than what currently happens in the legacy HtmlStripper https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/cat/HtmlStripper.java
Each of the following sub-tasks may be considered as a separate issue and respective project.
Sub-tasks
Sub-task details
1. There is a need to extract schema.org embeddings in any possible representation (JSON-LD/microdata/RDFa)
Many modern web pages contain lots of semantic information, invisible to the human eye of a web user, encoded according to the specification at https://schema.org/ - the parser should be capable of extracting this information when loading the page and applying the monitoring/extraction policies to the explicit semantic graph data rather than to plain text.
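As a minimal sketch of the JSON-LD flavour of this sub-task, the following illustrative class (not part of the Aigents codebase; a production version would use a real HTML parser rather than a regex) pulls raw JSON-LD payloads out of an HTML page so that extraction policies could be applied to the semantic data:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts raw JSON-LD payloads embedded in HTML via
// <script type="application/ld+json"> blocks, per the schema.org conventions.
public class JsonLdExtractor {
    // Case-insensitive, tolerates extra attributes; DOTALL lets payloads span lines.
    private static final Pattern SCRIPT = Pattern.compile(
        "<script[^>]*type\\s*=\\s*[\"']application/ld\\+json[\"'][^>]*>(.*?)</script>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    public static List<String> extract(String html) {
        List<String> payloads = new ArrayList<>();
        Matcher m = SCRIPT.matcher(html);
        while (m.find())
            payloads.add(m.group(1).trim());
        return payloads;
    }

    public static void main(String[] args) {
        String page = "<html><head><script type=\"application/ld+json\">"
            + "{\"@type\":\"NewsArticle\",\"headline\":\"Hello\"}"
            + "</script></head><body>text</body></html>";
        System.out.println(extract(page)); // one JSON-LD payload string
    }
}
```

The extracted payloads would then be handed to a JSON parser and mapped into the monitored graph; microdata and RDFa need separate, attribute-level parsing.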
2. There is a need to extract structural information from the HTML markup
The existing HTML stripper blindly removes HTML tags, replacing some of them with periods, which makes it possible to account for sentence and paragraph boundaries when doing text pattern matching - in some cases. However, the use of HTML tags is site-specific and developer-specific, so this may not work in other cases. For more precise identification of sentence boundaries, the hierarchical structure of an HTML document should be preserved in the stripped text, so that sentence/paragraph boundaries are detected based on the hierarchical structure of the text and not on the mere presence of tags.
3. There is a need to extract spatial HTML+CSS information from the loaded web page
In some cases, the above may not be enough, because the relevance of particular pieces of text to images, links, and even to each other may be based not on their proximity in the HTML text body, and not even on its hierarchical structure, but rather on 2-dimensional spatial proximity produced by HTML+CSS markup rendered by the browser (taking screen resolution and layout into account). That means the ideal web page analyser would simulate a real web browser, computing pixel coordinates for every element and scraping the screen elements the same way a human eye would.
4. There is a need to extract DOM representation from web pages dynamically created by JavaScript/DHTML
All of the above may not work for web pages generated by DHTML (such as https://aigents.com/ for instance), so there is a need to simulate a browser executing the complete suite of WWW technologies, including CSS and JavaScript, like it is done by Selenium WebDriver and WebKit - the simplest example of how it could be done is provided by https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/util/WebKiter.java
5. There is a need to extract semantic relationships from web pages, same as would be encoded with 1 (above) but using NLP and text mining techniques accompanied with 2, 3, 4 (above)
Since we can extract semantic relationships from the raw web page according to 1 (above), the entire process of Aigents web monitoring may be changed so that the framework expects a web page to be stripped down not to plain text (like the HtmlStripper currently does), but rather to a subgraph of semantic relationships (like the Matcher is expected to do) - involving all of the techniques 2, 3, 4 (above). In that case, we would end up with a design where every web page undergoes semantic parsing, with subgraph monitoring and extraction then applied to the page.
An Aigents Chrome Plugin can be created for seamless integration with Web browsing and transparent reinforcement learning. A substantial part of the existing Aigents client JavaScript code is expected to be re-used: https://github.com/aigents/aigents-web/tree/master/html/ui
Like in #25 , the user should have the ability to change the "home Aigents Server" destination in the plugin settings, so the same plugin can be used to access some public "Aigents Servers" as well as private "Aigents Servers" owned by user or user's company, for example.
An important possibility of such a plugin is that the user would be able to explicitly point the Aigents to particular pieces of content in the browser, while the Aigents would be able to do the same for the user, so both the efficiency of the Aigents' help to the user and the user's ability to train the Aigents would increase enormously.
The following Python code (based on https://github.com/singnet/reputation) shows that the individual ratings by period, as well as the predictiveness values used for blending, are being rounded up to 1.0, which has to be fixed eventually:
from datetime import date

from reputation_service_api import *
from reputation_calculation import *
from reputation_base_api import *
from aigents_reputation_api import AigentsAPIReputationService

# Connect to a local Aigents server; uncomment the next line to compare
# against the pure-Python implementation instead.
rs = AigentsAPIReputationService('http://localtest.com:1180/', '[email protected]', 'q', 'a', False, 'test', True)
#rs = PythonReputationService()
rs.clear_ranks()
rs.clear_ratings()

dt1 = date(2018, 1, 1)
dt2 = date(2018, 1, 2)

rs.set_parameters({'default': 0.5, 'decayed': 0.5, 'conservatism': 0.25,
                   'fullnorm': False, 'logratings': False, 'liquid': True,
                   'rating_bias': False, 'predictiveness': 1,
                   'aggregation': True})

# Day 1 ratings.
rs.put_ratings([{'from': '1', 'type': 'rating', 'to': '4', 'value': 0.5, 'weight': 10, 'time': dt1}])
rs.put_ratings([{'from': '2', 'type': 'rating', 'to': '5', 'value': 1.0, 'weight': 10, 'time': dt1}])
rs.put_ratings([{'from': '3', 'type': 'rating', 'to': '6', 'value': 0, 'weight': 10, 'time': dt1}])
rs.put_ratings([{'from': '2', 'type': 'rating', 'to': '5', 'value': 1.0, 'weight': 10, 'time': dt1}])
rs.update_ranks(dt1)
ranks = rs.get_ranks_dict({'date': dt1})
print("day 1 ranks:", ranks)  # expected roughly {'4': 90.0, '5': 100.0, '6': 14.0}

# Day 2 ratings ('from' ids kept as strings, consistent with day 1).
rs.put_ratings([{'from': '1', 'type': 'rating', 'to': '5', 'value': 0.75, 'weight': 10, 'time': dt2}])
rs.put_ratings([{'from': '2', 'type': 'rating', 'to': '6', 'value': 0.25, 'weight': 10, 'time': dt2}])
rs.put_ratings([{'from': '3', 'type': 'rating', 'to': '4', 'value': 0.75, 'weight': 10, 'time': dt2}])
rs.update_ranks(dt2)
ranks = rs.get_ranks_dict({'date': dt2})
print("my ranks:", ranks)
The existing Aigents Web Demo User Interface, present in https://github.com/aigents/aigents-web and available at https://aigents.com/, is 3 years old and does not seem attractive enough for many. We are considering changing it, which would take a 2-3 person-month project, presumably keeping the jQuery used by the current Web UI so that most of the JavaScript code may get re-used: https://github.com/aigents/aigents-web/tree/master/html/ui
Like #25 for Android, a Lite Client app can be created for iPhone - either re-using the existing JavaScript https://github.com/aigents/aigents-web/tree/master/html/ui or porting it to native Objective-C.
Overall task and design:
Based on #22, we need to provide an extended version of the Question Answering to replace or extend the current placeholder:
https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/peer/Answerer.java
The code may go to org.aigents.nlp.qa or to respective package of the Aigents Platform Core.
There are a few things to be done, written in the following pseudo-code to be refined during the implementation phase:
interface Indexer {
	void clear(); // clears the current index
	void index(String text); // indexes the text in the internal model, where the model can be anything
	Linker retrieve(String query); // retrieves the ranked list of relevant words based on a single query applied to the scope of all texts indexed to date, see https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/data/Linker.java
}
//Candidate implementation of the Indexer relying on the existing code
class GraphIndexer implements Indexer {
	Graph graph; // see https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/data/Graph.java
	int mSkip = 2; // width of the skipping window used to build word pairs
	// will be used to index any number of input texts in a graph object
	@Override
	public void index(String text) {
		// tokenize text with Parser.parse https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/data/Miner.java#L580
		// build word-word links based on per-sentence word pairs co-occurring within a distance of mSkip, using link types "pred" and "succ", and store them in the graph with link weight W = mSkip / distance (so closer words are given larger weight: the closest word is weighted as mSkip and the most distant word is weighted as 1)
	}
	@Override
	public Linker retrieve(String query) {
		// tokenize query with Parser.parse https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/data/Miner.java#L580
		// compute the ranks of nodes in the graph using the algorithm GraphOrder.directed https://github.com/aigents/aigents-java/blob/master/html/ui/aigents-graph.js#L537 (this function needs to be added to the Graph class), initialized with the word nodes found in the query, with every word node weight set to 1 divided by the word frequency from https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/data/LangPack.java#L85
		// retrieve the computed ranks of words from the Graph and return them in a Linker implementation such as https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/data/Counter.java or https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/data/Summator.java
	}
}
class AnswerGenerator extends Answerer { // to be re-used in https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/peer/Answerer.java
	Indexer indexer; // see above
	Generator generator; // see #22
	int maxWords; // configured hard cap on the number of words used to build the reply
	String answer(String query) {
		Linker words = indexer.retrieve(query);
		if (words == null || words.size() == 0)
			return "No.";
		Collection<String> top = getTopWordsFromLinker(words);
		String response = generator.generate(top); // see #22
		return response;
	}
}
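The skip-window weighting described in the index() pseudo-code above (W = Mskip / distance) can be sketched as a self-contained example. The class and method names below are illustrative only, not part of the Aigents codebase:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of skip-window co-occurrence weighting: for each ordered pair
// of words co-occurring within a window of width m, add weight W = m / distance,
// so adjacent words get weight m and the most distant pair in the window gets 1.
public class SkipWindowWeights {
    public static Map<String, Double> weigh(String[] words, int m) {
        Map<String, Double> links = new HashMap<>();
        for (int i = 0; i < words.length; i++) {
            for (int d = 1; d <= m && i + d < words.length; d++) {
                String key = words[i] + "->" + words[i + d]; // "succ" direction
                links.merge(key, (double) m / d, Double::sum);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        // With m = 2: cats->eat gets 2.0, eat->fish gets 2.0, cats->fish gets 1.0.
        System.out.println(weigh(new String[]{"cats", "eat", "fish"}, 2));
    }
}
```

In the real GraphIndexer, these weights would be accumulated into Graph link weights instead of a HashMap, and the mirror "pred" links would be stored as well.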
Task outline:
public static String summarize(java.util.Set words, String text)
function in https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/peer/Answerer.java#L163
Collection searchSTMwords(Session session, final SearchContext sc)
function in https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/peer/Answerer.java#L82
References:
https://blog.singularitynet.io/an-understandable-language-processing-3848f7560271
What is done so far:
What will come next:
3. Siter will be split into Siter and WebCrawler.
4. Siter will hold the overall crawling framework and be configurable at the Body level, so one can create/extend/override it
5. WebCrawler will do the actual web crawling and implement the Crawler interface (this interface will also be implemented by Redditer, Twitterer and Discourser), so you could extend or override the WebCrawler itself or add custom Crawlers
6. The current readChannel method of Redditer, Twitterer and Discourser will be moved to the Crawler interface, renamed to "crawl"
7. RSSer will be created, implementing the Crawler interface as an example of a custom crawler (so one can do Arxiv and PsyArxiv plugins)
Currently, Aigents extracts pattern-based "news items" on a per-topic (is attribute) and per-url (sources attribute) basis for a specific day (times attribute), represented as short excerpts.
Task: Create aggregated content generation based on the "news items" found above
Level 1: Simple aggregation: can be defined as a user-specific property on how to cook the news for every specific user - using "news aggregation" property with 5 values (none, summary, overview, digest, history)
Level 2: Complex formation: In addition to the above, combinations of topics corresponding to each other and clusters of related topics can be used together with a LinkGrammar-based formal grammar (and possibly some underlying ontology) to generate literary content describing novel (salient and "surprising") combinations of topics - based on progress with #22 .
As a simplified version of #24, we may have a lightweight Aigents client with a native Android user interface, exposing the functions present in https://github.com/aigents/aigents-web and exposed at https://aigents.com/
The lightweight client may optionally be done in JavaScript instead of Java.
In this case, the client data will be stored in the cloud (as opposed to #24, where the data is stored on the mobile device), but the user should have the ability to change the "home Aigents Server" destination in the application settings, so the same client application can be used to access public "Aigents Servers" as well as private "Aigents Servers" owned by the user or the user's company, for example.
Need to provide integration with Ethereum, so payments can be conducted in ETH and accounted for by billing - the same way it is done for PayPal:
https://github.com/aigents/aigents-java/tree/master/src/main/java/net/webstructor/comm/paypal
https://github.com/aigents/aigents-java/blob/master/html/ui/aigents-wui.js#L1055
May be integrated with existing Infura-based Ethereum logging and analysis support
https://github.com/aigents/aigents-java/tree/master/src/main/java/net/webstructor/comm/eth
Need to provide integration with Google Pay - the same way it is done for PayPal:
https://github.com/aigents/aigents-java/tree/master/src/main/java/net/webstructor/comm/paypal
https://github.com/aigents/aigents-java/blob/master/html/ui/aigents-wui.js#L1055
Overview:
In the end, ideally, we want the natural language text to be produced at a quality higher than that provided by modern conversational intelligence chatbots (such as https://replika.ai/ ); however, we want the AI to be "explainable" ("interpretable"), like presented in https://blog.singularitynet.io/an-understandable-language-processing-3848f7560271
The language production should be based on an underlying ontology plus a formal grammar, even though we may use ML/DL to create this underlying ontology and formal grammar, and we may use NNs (such as graph networks) to operate with them. This is intended to serve as an extended solution for tasks #34 and #21.
Goals:
Anyhow, as part of the whole NLP pipeline, we should be able, given a finite list of words (or semantic concepts associated with these words) combined with a formal grammar for a natural language (such as English or Russian), to produce a grammatically valid sentence or series of sentences - that is the scope of this particular task.
Tentative TODO items:
<post-nominal-u> - DONE
References:
https://blog.singularitynet.io/an-understandable-language-processing-3848f7560271
http://aigents.com/papers/2019/ExplainableLanguageProcessing2019.pdf
https://www.youtube.com/watch?v=ABvopAfc3jY
https://www.youtube.com/watch?v=cwgtcOfA3KI
https://arxiv.org/abs/1401.3372
https://arxiv.org/abs/2005.09280
http://langlearn.singularitynet.io/data/docs/
In case Link Grammar (LG) is chosen:
On Natural Language Generation with Link Grammar:
https://books.google.ru/books?id=HwW6BQAAQBAJ&pg=PA459&lpg=PA459&dq=link+grammar+language+generation&source=bl&ots=Lnj2CmORKC&sig=ACfU3U3QjcHw-ruEN0hh95hVZ32Mu78yfg&hl=ru&sa=X&ved=2ahUKEwj628PW57zqAhX1wsQBHTIcB7AQ6AEwBHoECAkQAQ#v=onepage&q=link%20grammar%20language%20generation&f=false
https://wiki.opencog.org/w/Natural_language_generation
http://www.frontiersinai.com/turingfiles/December/lian.pdf
On SAT-solver and Grammars:
https://www.hf.uio.no/iln/om/organisasjon/tekstlab/aktuelt/arrangementer/2015/nodalida15_submission_91.pdf
https://books.google.ru/books?id=xBJVDQAAQBAJ&pg=PA67&lpg=PA67&dq=sat+solver+grammar&source=bl&ots=IOSARwDh2b&sig=ACfU3U0IooczXG8sDnK5K2yr9jmY0pRHzQ&hl=ru&sa=X&ved=2ahUKEwjW5IfwlqHqAhUNEJoKHVg1AzQQ6AEwAnoECAUQAQ#v=onepage&q=sat%20solver%20grammar&f=false
https://www.semanticscholar.org/paper/Analyzing-Context-Free-Grammars-Using-an-SAT-Solver-Axelsson-Heljanko/0fd33fd35fc8a8b32287d906cf6d3576d0a294b2
https://books.google.ru/books?id=-jVxBAAAQBAJ&pg=PA35&lpg=PA35&dq=language+generation+sat+solver&source=bl&ots=V1hzzi1xJA&sig=ACfU3U3CL00HJVknvEUADMWvucLkvefMEw&hl=ru&sa=X&ved=2ahUKEwi3_dbll6HqAhWswqYKHY-mB-sQ6AEwDHoECAwQAQ#v=onepage&q=language%20generation%20sat%20solver&f=false
Subtasks:
References:
For content restrictions (5, 6), can use the following datasets and corpora:
a) Obscene lexicon for Russian https://github.com/odaykhovskaya/obscene_words_ru/blob/master/obscene_corpus.txt
b) Bad words in English https://www.freewebheaders.com/full-list-of-bad-words-banned-by-google/
In order to better understand the boundaries of the matching text spots, both HTML and PDF (and DOC, ODT, etc. in the future) rich texts should be stripped not to plain text (like HtmlStripper.convert does now), but to an intermediate hierarchical representation preserving both the structure of the text organization and its links, images and titles (a kind of internal unified DOM representation).
Actions:
Note:
The current HtmlStripper.convert inserts periods "." in the places of structural HTML tags, but this is not done for PDF. Now is the time to do this consistently for any rich text source, without breaking the other working parts.
Based on warnings seen in https://validator.w3.org/feed/
line 13, column 4: Missing enclosure attribute: length (7 occurrences) [help]
<enclosure url="https://www.youtube.com/yts/img/pixel-vfl3z5WfW.gif" typ ...
line 13, column 4: type attribute of enclosure must be a valid MIME type (7 occurrences) [help]
<enclosure url="https://www.youtube.com/yts/img/pixel-vfl3z5WfW.gif" typ ...
Need to EITHER
A) identify size and mime type of the enclosed image properly (see https://stackoverflow.com/questions/705224/how-do-i-add-an-image-to-an-item-in-rss-2-0 saying "The length attribute doesn't need to be completely accurate but it's required for the RSS to be considered valid")
OR
B) include the image as an <img ... /> tag within the description, with the content of the description framed into <![CDATA[...]]>, like <![CDATA[<img src="https://my.site.com/my_image.jpg"/>My text]]> (see https://www.aitrends.com/feed/ for example)
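For option A, the real length and MIME type can be discovered with an HTTP HEAD request before emitting the enclosure. The class below is an illustrative sketch, not the actual Aigents feed writer:

```java
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch for option A: probe the enclosed image's length and MIME type with a
// HEAD request, then emit a valid RSS <enclosure/> element.
public class EnclosureFixer {
    public static String enclosureTag(String url, long length, String mimeType) {
        return "<enclosure url=\"" + url + "\" length=\"" + length
            + "\" type=\"" + mimeType + "\"/>";
    }

    // HEAD request discovers length and type without downloading the body.
    public static String probe(String url) throws Exception {
        HttpURLConnection c = (HttpURLConnection) new URL(url).openConnection();
        c.setRequestMethod("HEAD");
        long length = c.getContentLengthLong(); // -1 if the server omits it
        String type = c.getContentType();       // e.g. "image/gif"
        c.disconnect();
        return enclosureTag(url, Math.max(length, 0), type == null ? "image/gif" : type);
    }

    public static void main(String[] args) {
        System.out.println(enclosureTag(
            "https://www.youtube.com/yts/img/pixel-vfl3z5WfW.gif", 503, "image/gif"));
    }
}
```

Per the Stack Overflow answer cited above, the length does not need to be exact, but it must be present for the feed to validate.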
line 17, column 2: item should contain a guid element (13 occurrences) [help]
Need every news item to have a guid (preferably a permalink) associated with the feed, like:
https://www.aitrends.com/?p=18403
line 132, column 0: Missing atom:link with rel="self" [help]
Need the RSS feed url to be part of the feed channel, like:
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
>
...
<atom:link href="https://www.aitrends.com/feed/" rel="self" type="application/rss+xml" />
See https://www.aitrends.com/feed/ for example.
The Problem:
The PathFinder/PathTracker components responsible for building the "path" navigation across web links from page to page starting from the "root site URL" (rootPath) have two issues:
Fri Jun 05 13:47:30 UTC 2020:Site crawling failed unknown https://blog.wechat.com/category/news/ java.lang.ArrayIndexOutOfBoundsException: 0,:0
java.lang.ArrayIndexOutOfBoundsException: 0
at net.webstructor.al.Set.get(Set.java:35)
at net.webstructor.self.PathTracker.run(PathTracker.java:136)
at net.webstructor.self.PathTracker.run(PathTracker.java:110)
at net.webstructor.self.PathTracker.run(PathTracker.java:96)
at net.webstructor.self.PathTracker.run(PathTracker.java:58)
at net.webstructor.self.WebCrawler.crawl(WebCrawler.java:66)
at net.webstructor.self.Siter.read(Siter.java:171)
at net.webstructor.self.Spider$1.call(Spider.java:191)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
We need to solve both.
Extra:
In addition to that, for each of the "sites" configured for crawling, we may have a "crawl mode" option (SMART|FIND|TRACK) set to something other than the default "SMART", so the "path" is either never modified and always re-used as configured manually ("TRACK" mode), or never used so an exhaustive crawl applies every time ("FIND" mode).
DONE:
TODO:
There is an existing old Aigents Desktop App in Java based on the Aigents Core https://github.com/aigents/aigents-java/tree/master/src/main/java/net/webstructor/gui
which uses the java.awt framework and can work under Linux, Mac OSX and Windows.
However, its functionality is pretty much outdated and does not contain many of the latest features present in the Aigents Web User Interface https://github.com/aigents/aigents-web available at https://aigents.com/
Also, it makes sense to base the Aigents Desktop App on the JavaFX framework instead of java.awt, with a built-in Web browser like in the existing Android App https://github.com/aigents/aigents-android, so tighter integration between Aigents Core functionality and Web browsing operations can be achieved, as with the Chrome browser plugin per #27
Example:
https://www.joom.com/robots.txt
...
Disallow: */q.*
...
not matched for
https://www.joom.com/ru/search/q.xiaomi
in
https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/cat/HttpFileReader.java#L211
so the file is attempted to be read anyway, failing with error 400
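A wildcard-aware matcher along the following lines would fix this case. It converts a Disallow pattern to a regex, treating '*' as "any character sequence" and '$' as an end anchor, per the common robots.txt pattern conventions; the class and method names are illustrative:

```java
import java.util.regex.Pattern;

// Sketch of wildcard-aware robots.txt rule matching: '*' matches any character
// sequence, '$' anchors the end of the URL path, and rules without '$' act as
// prefix matches anchored at the start of the path.
public class RobotsMatcher {
    public static boolean disallowed(String rule, String path) {
        StringBuilder re = new StringBuilder();
        boolean anchored = rule.endsWith("$");
        String body = anchored ? rule.substring(0, rule.length() - 1) : rule;
        for (char ch : body.toCharArray()) {
            if (ch == '*') re.append(".*");
            else re.append(Pattern.quote(String.valueOf(ch))); // literal char
        }
        if (!anchored) re.append(".*"); // prefix-match semantics by default
        return Pattern.matches(re.toString(), path);
    }

    public static void main(String[] args) {
        // The failing case from this issue: */q.* should match the search URL.
        System.out.println(disallowed("*/q.*", "/ru/search/q.xiaomi")); // true
    }
}
```

HttpFileReader.allowedForRobots could delegate to something like this instead of a plain prefix comparison.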
Wanted
Have news items supplied with "title" property, in addition to currently existing "text", "sources", "times" and "image".
One way to solve this is to do the same trick as is done with images and links - provide another container to the HTML stripper so that it collects all the tags you have identified, keeping them with indexes to their original positions; then, when the text is matched, it can look back for the closest title candidate.
Here is where the image indexing happens:
https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/cat/HtmlStripper.java#L202
Here is where it is used:
I guess one can just re-use the Imager class for the purpose. Then one just needs two hacks near the points that I have indicated:
1. Index all "title", "h1", "h2", "h3" tags, plus maybe some others, collecting their interiors in the same collector structure as is used to collect the image urls.
https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/cat/HtmlStripper.java#L202
2. When the news item is created, look up the closest indexed title candidate occurring before it, like it is done when attaching image urls:
3. Put the found title candidate into the "title" property of the news item.
4. Optionally: if no title candidate is found, rather than leaving the title absent or creating a blank "title", we MAY use an alternative strategy, such as using the most salient/interesting words of the text, placed in the title in the same order as they appear in the text.
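The lookup in step 2 can be sketched with a sorted index of title positions; the TreeMap-based structure below is illustrative only, not the actual Imager class:

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of the "closest preceding title" lookup: titles collected by the
// stripper are kept with their character indexes in the original page; when a
// news item is matched at some position, we look up the closest title
// candidate occurring at or before that position.
public class TitleIndex {
    private final TreeMap<Integer, String> titles = new TreeMap<>();

    public void index(int position, String title) {
        titles.put(position, title);
    }

    // Closest "title"/"h1"/"h2"/"h3" content indexed at or before matchPosition.
    public String closestBefore(int matchPosition) {
        Map.Entry<Integer, String> e = titles.floorEntry(matchPosition);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        TitleIndex idx = new TitleIndex();
        idx.index(10, "First headline");
        idx.index(120, "Second headline");
        System.out.println(idx.closestBefore(200)); // Second headline
        System.out.println(idx.closestBefore(50));  // First headline
    }
}
```

floorEntry makes each lookup O(log n), which matters when a page yields many matched news items.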
Need to provide sentiment analysis support for English, Russian and Chinese
Sources:
English:
Russian:
DONE:
TODO:
P.S.:
File merging tips: https://stackoverflow.com/questions/4366533/how-to-remove-the-lines-which-appear-on-file-b-from-another-file-a
At the moment, lexicons are stored in the root:
lexicon_english.txt
lexicon_negative_english.txt
lexicon_negative_russian.txt
lexicon_positive_english.txt
lexicon_positive_russian.txt
lexicon_rude_english.txt
lexicon_rude_russian.txt
lexicon_russian.txt
We want to change it, along with adding support for cognitive distortions, to be like this:
data/
  dict/
    en/
      lexicon.txt
      negative.txt
      positive.txt
      rude.txt
      mentalfiltering.txt
      magnification.txt
      ...
    ru/
      lexicon.txt
      negative.txt
      positive.txt
      rude.txt
    zh/
For the high-performance and high-capacity Aigents Servers supporting thousands and millions of users, we would need to change the existing storage design of the Aigents.
Currently, it involves:
A) in-memory custom graph DB https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/core/Storager.java (stored in al.txt snapshots)
B) "temporal graphs" for indexing source-specific historical graph data https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/data/GraphCacher.java
C) "long-term memory" storage of the object instances https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/core/Archiver.java
D) cache of the web data https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/self/Cacher.java
While the above works fine for single-user Aigents instances and up to few-hundred-user instances, it may not scale well if we get thousands or millions of concurrent Lite Clients (such as per #25 , #26 , #27 and #28 ), so the following would have to get done:
At the current time, topics are generated along with patterns created as topic names via the TextMiner class
https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/data/TextMiner.java
and its underlying clustering implementation
https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/data/Miner.java
The patterns created this way are just disjunctive sets of words, missing a few things.
This should be improved with more complex pattern formation involving symbolic pattern regression producing hierarchical patterns, like discussed here:
https://www.youtube.com/watch?v=FzKMtNILmDk
The complete and self-contained Aigents application with server capabilities, built-in privacy protection and peer-to-peer capabilities already exists in Java:
https://github.com/aigents/aigents-android
with the latest build scripts in Gradle:
https://github.com/aigents/aigents-android-graddle
However, the functionality is pretty outdated and does not include all the latest features present in the Aigents Core.
We are looking forward to having a new version created, with all the bells and whistles of the Aigents Core exposed to the Android user interface.
Need to provide RSS channel support, like it is done for Reddit subreddits and user activity logs, and will be done for Twitter (#4 )
For the entry point, you will need a new class RSSeer - see:
https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/self/Siter.java#L295
https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/comm/reddit/Reddit.java#L99
https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/comm/reddit/Reddit.java#L169
For file reading and content type checking - look up
https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/self/Cacher.java#L118
lines 118-123:
A) call reader.allowedForRobots(path), and if allowed,
B) use reader.canReadDocContext(path,context) or reader.readDocData(path," ",context), or something like that, to read the content
Support both:
https://sawv.org/2019/11/12/rss-vs-atom-vs-json-feed-vs-hfeed-vs-whatever.html
https://www.saksoft.com/rss-vs-atom/
https://problogger.com/rss-vs-atom-whats-the-big-deal/
RSS Feed Example:
https://www.feedforall.com/sample.xml
<?xml version="1.0" encoding="windows-1252"?>
<rss version="2.0">
<channel>
<title>FeedForAll Sample Feed</title>
<description>RSS is a fascinating technology. The uses for RSS are expanding daily. Take a closer look at how various industries are using the benefits of RSS in their businesses.</description>
<link>http://www.feedforall.com/industry-solutions.htm</link>
<category domain="www.dmoz.com">Computers/Software/Internet/Site Management/Content Management</category>
<copyright>Copyright 2004 NotePage, Inc.</copyright>
<docs>http://blogs.law.harvard.edu/tech/rss</docs>
<language>en-us</language>
<lastBuildDate>Tue, 19 Oct 2004 13:39:14 -0400</lastBuildDate>
<managingEditor>[email protected]</managingEditor>
<pubDate>Tue, 19 Oct 2004 13:38:55 -0400</pubDate>
<webMaster>[email protected]</webMaster>
<generator>FeedForAll Beta1 (0.0.1.8)</generator>
<image>
<url>http://www.feedforall.com/ffalogo48x48.gif</url>
<title>FeedForAll Sample Feed</title>
<link>http://www.feedforall.com/industry-solutions.htm</link>
<description>FeedForAll Sample Feed</description>
<width>48</width>
<height>48</height>
</image>
<item>
Atom Feed Example:
https://validator.w3.org/feed/docs/atom.html
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>Example Feed</title>
<link href="http://example.org/"/>
<updated>2003-12-13T18:30:02Z</updated>
<author>
<name>John Doe</name>
</author>
<id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>
<entry>
<title>Atom-Powered Robots Run Amok</title>
<link href="http://example.org/2003/12/13/atom03"/>
<id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
<updated>2003-12-13T18:30:02Z</updated>
<summary>Some text.</summary>
</entry>
</feed>
Use XML:
https://www.viralpatel.net/java-xml-xpath-tutorial-parse-xml/
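Along the lines of that tutorial, RSS 2.0 can be parsed with the JDK's built-in DOM and XPath support, with no external dependencies. The sketch below extracts item titles from a channel; note that Atom feeds carry a namespace, so they would need a namespace-aware XPath instead:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch of RSS 2.0 parsing with JDK DOM + XPath: extract item titles.
public class RssTitles {
    public static List<String> titles(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate("/rss/channel/item/title", doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++)
            out.add(nodes.item(i).getTextContent());
        return out;
    }

    public static void main(String[] args) throws Exception {
        String feed = "<rss version=\"2.0\"><channel><title>Sample</title>"
            + "<item><title>First</title></item>"
            + "<item><title>Second</title></item></channel></rss>";
        System.out.println(titles(feed)); // [First, Second]
    }
}
```

An RSSer implementation would apply the same idea with further XPath expressions for link, description, pubDate and enclosure.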
RSS test feeds:
http://feeds.reuters.com/reuters/businessNews
http://feeds.reuters.com/reuters/technologyNews
http://feeds.reuters.com/reuters/politicsNews
http://feeds.reuters.com/news/wealth
https://blog.feedspot.com/bitcoin_rss_feeds/
https://blog.feedspot.com/reuters_rss_feeds/
https://gist.github.com/hamzamu/5c2fa2907ec507f4aba3ba6fcce2d21b
Like #24 for Android, a complete Self-Server app with peer-to-peer capabilities and secure storage can be created for iPhone - porting the original Aigents Java code https://github.com/aigents/aigents-java/tree/master/src/main/java/net/webstructor to native Objective-C.
Expected to be at minimum a half person-year project.
As it has been suggested by Ibby Benali:
I think if the bot would respond with something like this when you do /start :
Hi! I am the SingularityNET Aigents Reputation Bot. I calculate xyz for you. I can provide you personal reputation reports. In order to start, please tell me your name.
next message
Thanks! Nice to meet you name. Can you please tell me your email, I need that for xxx.
Next message.
Awesome! Now let’s look at your reputation. If you would like to get a reputation report for yourself, please type /reputation @your_username. Let’s try it out!
Provide report. Next message
Isn’t that cool? If you would like to use me in groups, just add me to your chatgroup. If you would like to know the reputation of a user, just reply to their message with /reputation.
For now, it is great to meet you. You can follow my progress here and here. If you would like to opt-in for updates to my software, just type /updates and I will ping you when I learned a new trick.
The above is just an idea, but maybe it will guide the conversation and interaction a bit more smoothly.
and perhaps as a fallback:
Uh oh, I am not sure what you mean. Please type /help to see what I can do, or let’s pick up where we left off: (insert the thing where you left off.. e.g. “I wanted to know your email for xxx”)
TODO:
Need to have EOS and CyberWay/Golos.io integrated similarly to Ethereum, Steemit and Golos.id
Resources:
https://developers.eos.io/manuals/eosjs/latest/index
Need separate Socializer-derived plugin(s) for Arxiv and PsyArchiv PDF parsing
Arxiv - follow one of the options:
1.1. Use Arxiv search API with results returned in Atom Feed format, see: https://arxiv.org/help/api#using , https://arxiv.org/help/api/user-manual, https://arxiv.org/help/api/user-manual#query_details and http://export.arxiv.org/api/query?search_query=all:agi
1.1.1. In a custom version, the query parameters "start" and "max_results" can be used to iterate over the full document collection, e.g. "search_query=anton kolonin&id_list=&start=0&max_results=10" (this can also be done as a hack in RSSer, translating URLs containing "arxiv.org" into API calls like "http://export.arxiv.org/api/query?search_query=agi&start=2&max_results=2")
1.1.2. In a custom version, extra fields of the feed can be used, see https://arxiv.org/help/api/user-manual#query_details
1.2. Implement custom crawling with custom crawler plugin (like RSS) on Aigents side, based on #5
1.3. Implement Aigents-side URL filtering logic per site/user/instance for A) URLs not crawled and B) URLs not used to create news items
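The paging described in 1.1.1 amounts to building query URLs with "start" and "max_results" parameters. A sketch of the URL construction (the `ArxivQuery` class name is hypothetical):

```java
import java.net.URLEncoder;

public class ArxivQuery {
    // Builds an arXiv API query URL with paging, per the API user manual
    // linked above; the search term must be URL-encoded (e.g. ':' -> %3A).
    static String queryUrl(String search, int start, int maxResults) throws Exception {
        return "http://export.arxiv.org/api/query?search_query="
                + URLEncoder.encode(search, "UTF-8")
                + "&start=" + start + "&max_results=" + maxResults;
    }

    public static void main(String[] args) throws Exception {
        // Iterate over the collection in pages of 10.
        for (int start = 0; start < 30; start += 10)
            System.out.println(queryUrl("all:agi", start, 10));
    }
}
```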
PsyArchiv:
2.1. TODO
Random Issues:
3.1. PDFs are not read from the site in the 'agi' channel
3.2. https://arxiv.org/list/cs.AI/recent
3.3. enable scope=web as default ?
3.4. https://arxiv.org/list/cs.AI/recent is missed for 'knowledge representation'
P.S.: Suggestions from Eyob:
Based on the progress with issue #22, we want to use the formal grammar to identify sentence boundaries in token (word) streams. The solution would have at least two applications:
A) Split the stream of tokens/words into sentences for further linguistic processing such as parsing and entity extraction
B) Split the stream of tokens/words into sentences for selecting the "featured" sentences containing some "hot" keywords for summarization purposes.
Initial progress has been reached with
https://github.com/aigents/aigents-java-nlp/blob/master/src/main/java/org/aigents/nlp/gen/Segment.java
in
aigents/aigents-java-nlp#11
Still, there is more work to do to improve the accuracy.
For testing purposes, we can use (for example) the SingularityNET extract from the Gutenberg Children corpus used in the Unsupervised Language Learning project, starting with the "cleaned" corpus: http://langlearn.singularitynet.io/data/cleaned/English/Gutenberg-Children-Books/capital/
Then create an "extra-cleaned" corpus by removing all sentences with quotes and brackets like [ ] ( ) { } ' " and all sentences with inner periods like "CHAPTER I. A NEW DEPARTURE".
Then glue the sentences together on a per-file or per-chapter basis and evaluate accuracy based on the number of correctly identified sentence boundaries.
Any alternative corpora for testing against baseline results achieved by other authors may be considered as well.
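Evaluating accuracy on the glued corpus reduces to comparing predicted sentence-boundary positions against the gold ones. A hedged sketch of such a metric (class and method names are hypothetical; boundaries are represented as token indices):

```java
import java.util.Set;

public class BoundaryEval {
    // Fraction of gold sentence boundaries the segmenter recovered.
    static double recall(Set<Integer> gold, Set<Integer> predicted) {
        if (gold.isEmpty()) return 1.0;
        long hit = gold.stream().filter(predicted::contains).count();
        return (double) hit / gold.size();
    }

    // Fraction of predicted boundaries that are actually gold boundaries.
    static double precision(Set<Integer> gold, Set<Integer> predicted) {
        if (predicted.isEmpty()) return 1.0;
        long hit = predicted.stream().filter(gold::contains).count();
        return (double) hit / predicted.size();
    }

    public static void main(String[] args) {
        Set<Integer> gold = Set.of(5, 12, 20), pred = Set.of(5, 12, 18);
        System.out.println("recall=" + recall(gold, pred)
                + " precision=" + precision(gold, pred));
    }
}
```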
References:
https://www.researchgate.net/publication/321227216_Text_Segmentation_Techniques_A_Critical_Review
https://www.google.com/search?q=natural+language+segmentation%20papers
Make it possible for a single Aigents installation to be configured with one Discourse site (like https://community.singularitynet.io), so the following should be possible:
Data model:
category <- topic <- post (ordered by local numbers within topics)
All categories:
https://community.singularitynet.io/categories.json
Topics in category:
https://community.singularitynet.io/c/66.json
Get topics:
https://community.singularitynet.io/latest.json
Get topic with post stream:
https://community.singularitynet.io/t/2753.json
Get user:
https://community.singularitynet.io/users/akolonin.json
Get users (admin endpoint; without admin API credentials it returns an error):
https://community.singularitynet.io/admin/users/list/active.json
{"errors":["The requested URL or resource could not be found."],"error_type":"not_found"}
Get posts:
https://community.singularitynet.io/posts.json
https://community.singularitynet.io/posts.json?before=8098
Get post:
https://community.singularitynet.io/posts/8099.json
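The read-only JSON endpoints above (e.g. /latest.json) can be queried without authentication. A minimal fetch-and-extract sketch using Java 11's `HttpClient` (the `DiscourseLatest` class name is hypothetical, and the regex is a crude stand-in for a real JSON parser):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DiscourseLatest {
    // Crude topic-title extraction with a regex; a production client
    // should use a proper JSON parser instead.
    static List<String> titles(String json) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("\"title\":\"(.*?)\"").matcher(json);
        while (m.find()) out.add(m.group(1));
        return out;
    }

    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        HttpRequest req = HttpRequest.newBuilder(
                URI.create("https://community.singularitynet.io/latest.json"))
            .header("Accept", "application/json").build();
        String body = http.send(req, HttpResponse.BodyHandlers.ofString()).body();
        titles(body).forEach(System.out::println);
    }
}
```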
Get likes:
https://meta.discourse.org/t/getting-who-liked-a-post-from-the-api/103618/3
curl 'https://community.singularitynet.io/post_action_users?id=8098&post_action_type_id=2' -H 'Accept: application/json'
{"post_action_users":[{"id":118,"username":"Patrik_Gudev","name":null,"avatar_template":"/user_avatar/community.singularitynet.io/patrik_gudev/{size}/430_2.png","post_url":null,"username_lower":"patrik_gudev"},{"id":24,"username":"akolonin","name":null,"avatar_template":"/user_avatar/community.singularitynet.io/akolonin/{size}/146_2.png","post_url":null,"username_lower":"akolonin"}]}
Get user actions:
curl https://community.singularitynet.io/user_actions.json?username=akolonin
https://github.com/discourse/discourse_api/blob/master/lib/discourse_api/api/user_actions.rb
action_type:
1 - liked by me
2 - liked by other
3 - unknown TODO?
4 - my topic posts
5 - my reply posts
6 - reply posts on my reply posts (except reply posts on my topic post) TODO?
7 - mentions of me
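The action_type codes above could be captured in an enum on the Aigents side. A sketch (the enum and constant names are hypothetical; codes 3 and 6 are still marked TODO in the notes above):

```java
public enum DiscourseActionType {
    // Codes observed from /user_actions.json, per the notes above.
    LIKED_BY_ME(1),
    LIKED_BY_OTHER(2),
    UNKNOWN_3(3),          // TODO: semantics not yet confirmed
    MY_TOPIC_POST(4),
    MY_REPLY_POST(5),
    REPLY_TO_MY_REPLY(6),  // TODO: semantics not yet confirmed
    MENTION_OF_ME(7);

    public final int code;

    DiscourseActionType(int code) { this.code = code; }

    // Maps a raw action_type code from the API back to the enum constant.
    public static DiscourseActionType fromCode(int code) {
        for (DiscourseActionType t : values())
            if (t.code == code) return t;
        throw new IllegalArgumentException("Unknown action_type " + code);
    }
}
```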
https://www.discourse.org/plugins/oauth.html
https://meta.discourse.org/t/official-single-sign-on-for-discourse-sso/13045
https://meta.discourse.org/t/using-discourse-as-a-sso-provider/32974
https://www.jokecamp.com/blog/examples-of-creating-base64-hashes-using-hmac-sha256-in-different-languages/#java
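Discourse SSO signs the base64-encoded payload with HMAC-SHA256, sent as a lowercase-hex `sig` parameter (the jokecamp link above shows the same construction in other languages). A minimal JDK-only sketch (the `HmacSha256` class name is hypothetical):

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class HmacSha256 {
    // Lowercase-hex HMAC-SHA256 of the payload under the shared SSO secret.
    static String sign(String secret, String payload) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret.getBytes("UTF-8"), "HmacSHA256"));
        StringBuilder hex = new StringBuilder();
        for (byte b : mac.doFinal(payload.getBytes("UTF-8")))
            hex.append(String.format("%02x", b));
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Standard HMAC-SHA256 test vector.
        System.out.println(sign("key", "The quick brown fox jumps over the lazy dog"));
    }
}
```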
Here are some resources to check out for the Discourse API + some extras:
https://docs.discourse.org/ for the API
https://meta.discourse.org/t/data-explorer-plugin/32566 an official discourse plugin that allows for live database queries
https://meta.discourse.org/t/discourse-voting/40121 voting functionalities
On badges and communities:
https://meta.discourse.org/t/what-are-badges/32540
https://meta.discourse.org/t/how-to-grant-a-custom-badge-through-the-api/103270 (e.g. a badge for creating an aigents feed with x readers perhaps?)
Our own badges (using the standard ones: https://community.singularitynet.io/badges
https://blog.discourse.org/2018/06/understanding-discourse-trust-levels/
https://meta.discourse.org/t/discourse-moderation-guide/63116
https://blog.discourse.org/2014/08/building-a-discourse-community/
On discobot (although our integration needs a bit more work to make the text autogenerated):
https://meta.discourse.org/t/how-to-customize-discobot/103633
https://blog.discourse.org/2017/08/who-is-discobot/
Zapier + Discourse:
https://zapier.com/apps/discourse/integrations
Need to provide Twitter support, including
See:
1)
https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/comm/reddit/Reddit.java#L80
https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/comm/reddit/Redditer.java#L61
2)
https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/comm/reddit/Reddit.java#L74
https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/comm/reddit/Reddit.java#L185
https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/comm/reddit/RedditFeeder.java#L116
3)
https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/comm/reddit/Reddit.java#L99
Pay Attention:
"If you need to share Twitter content you obtained via the Twitter APIs with another party, the best way to do so is by sharing Tweet IDs, Direct Message IDs, and/or User IDs, which the end user of the content can then rehydrate (i.e. request the full Tweet, user, or Direct Message content) using the Twitter APIs. This helps ensure that end users of Twitter content always get the most current information directly from us.
We permit limited redistribution of hydrated Twitter content via non-automated means. If you choose to share hydrated Twitter content with another party in this way, you may only share up to 50,000 hydrated public Tweet Objects and/or User Objects per recipient, per day, and should not make this data publicly available (for example, as an attachment to a blog post or in a public Github repository)."
Source: https://developer.twitter.com/en/developer-terms/more-on-restricted-use-cases
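In practice the quoted terms mean reducing hydrated tweet objects to bare IDs before redistribution, and letting recipients rehydrate them via the Twitter APIs. A sketch under assumed, purely illustrative types:

```java
import java.util.List;
import java.util.stream.Collectors;

public class TweetDehydrate {
    // Hypothetical minimal tweet holder; field names are illustrative only.
    static class Tweet {
        final long id;
        final String text;
        Tweet(long id, String text) { this.id = id; this.text = text; }
    }

    // Reduce hydrated objects to bare IDs before sharing,
    // per the Twitter terms quoted above.
    static List<Long> dehydrate(List<Tweet> tweets) {
        return tweets.stream().map(t -> t.id).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Tweet> tweets = List.of(
            new Tweet(1050118621198921728L, "hello"),
            new Tweet(1050119000000000001L, "world"));
        System.out.println(dehydrate(tweets));
    }
}
```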
Task:
Primary options:
Secondary options:
Antons-MacBook-Pro:aigents-java akolonin$ git commit -m "update version and year 2024"
[master feb5d93] update version and year 2024
2 files changed, 4 insertions(+), 4 deletions(-)
Antons-MacBook-Pro:aigents-java akolonin$ git push
Counting objects: 12, done.
Delta compression using up to 12 threads.
Compressing objects: 100% (7/7), done.
Writing objects: 100% (12/12), 813 bytes | 813.00 KiB/s, done.
Total 12 (delta 6), reused 0 (delta 0)
remote: Resolving deltas: 100% (6/6), completed with 6 local objects.
remote:
remote: GitHub found 6 vulnerabilities on aigents/aigents-java's default branch (6 moderate). To find out more, visit:
remote: https://github.com/aigents/aigents-java/security/dependabot
remote:
To https://github.com/aigents/aigents-java.git
bd4b073..feb5d93 master -> master
We want parsing of any language supported by LinkGrammar, starting with English, to be available both internally in the Aigents framework and via the Aigents Language API.
Specs:
Use the existing LinkGrammar in Java implementation, see https://arxiv.org/pdf/2105.00830.pdf
Subtasks:
Extension for segmentation and punctuation - subtasks:
5. Segmentation by sentence - 4 weeks
6. Adding punctuation - 4 weeks
7. Russian dictionary load - 2 weeks (needed only for Russian)
8. Assembly taking morphology into account - 2 weeks (needed only for Russian)
Requirement:
Subtasks:
We want to have integration with the Cosmos API and ecosystem over the REST API, in the same way as we already have for Steemit, Golos and Ethereum.
REST API spec:
https://docs.cosmos.network/
https://cosmos.network/rpc/v0.37.9
Discourse forum:
https://forum.cosmos.network/t/integrating-aigents-with-cosmos-tendermint-and-this-discourse-forum/4233
Task: We need to have the Aigents Graphs rendering framework
https://blog.singularitynet.io/graphs-part-3-aigents-graph-analysis-for-blockchains-and-social-networks-142fc8182389
present in the Aigents Web version
https://github.com/aigents/aigents-java/blob/master/html/ui/aigents-graph.js
https://github.com/aigents/aigents-java/blob/master/html/ui/aigents-gui.js
https://github.com/aigents/aigents-java/blob/master/html/ui/aigents-map.js
ported to the Aigents Desktop and Server version
https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/gui/App.java
with some UI/UX design decisions discussed and updated along the way.
We may re-use UI/graph rendering Java code from Webstructor project
http://webstructor.net/ (which will have to be open-sourced along the way)
Reason: There is a cap on the number of transactions returned by the server to the web client (because the web client simply hangs when rendering more than a few thousand transactions).
Design: It may be implemented as
A) Server library serving huge graphs rendering to any canvas (based on https://github.com/aigents/aigents-java/blob/master/html/ui/aigents-graph.js and code from http://webstructor.net/)
B) Desktop GUI presenting canvas to the graph renderer and user interaction combining both graph rendering/interaction paradigms currently present in Aigents Web client (https://github.com/aigents/aigents-java/blob/master/html/ui/aigents-gui.js and https://github.com/aigents/aigents-java/blob/master/html/ui/aigents-map.js)
C) A client-server protocol enabling huge graphs to be rendered into PNG/SVG files and displayed in the web client as PNG/SVG images. PNG vs. SVG should likely be a configurable option (because SVG would not work for huge graphs over the web anyway).
Option: Support for 3-dimensional graphs may be implemented based on http://webstructor.net/ code.
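Option C could start from something as simple as server-side rendering with `java.awt` and `ImageIO`, producing a PNG the web client displays as a plain image. A toy sketch (node/edge layout hardcoded, class name hypothetical; real graphs would come from the Aigents graph model):

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

public class GraphPngRenderer {
    // Renders a tiny hardcoded node-edge graph into an image, as a
    // stand-in for server-side rendering of huge graphs (Option C).
    static BufferedImage render() {
        int[][] nodes = {{50, 50}, {200, 80}, {120, 180}}; // x,y positions
        int[][] edges = {{0, 1}, {1, 2}, {2, 0}};          // node index pairs
        BufferedImage img = new BufferedImage(256, 256, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, 256, 256);
        g.setColor(Color.BLACK);
        for (int[] e : edges)
            g.drawLine(nodes[e[0]][0], nodes[e[0]][1], nodes[e[1]][0], nodes[e[1]][1]);
        g.setColor(Color.BLUE);
        for (int[] n : nodes) g.fillOval(n[0] - 5, n[1] - 5, 10, 10);
        g.dispose();
        return img;
    }

    public static void main(String[] args) throws Exception {
        ImageIO.write(render(), "png", new File("graph.png"));
    }
}
```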