conceptreplacer's Issues

How to contribute

The first example featured on the home page is currently Ada Lovelace, and right in the first sentence there's "Countess of Lovelace", which should have been switched to "Count of Lovelace".

First of all, it wasn't immediately clear where the code repository for this experiment was located, and second, after I came here I wasn't sure how to propose the change.

Should I edit men.json to change "singular": [ "king" ] into "singular": [ "king", "count" ], then edit women.json to change "singular": [ "queen" ] into "singular": [ "queen", "countess" ]? Or should I edit the royalty key instead? Otherwise, should I create a new key? And if so, what would be the appropriate name for it?

These questions make it hard for interested people to contribute and make the experiment less useful as a consequence. IMO some instructions in the README, or a CONTRIBUTING.md file (and a more prominent link in the website) would help.

/cc @slashme

<script> tags and content aren't always properly stripped

There seem to be cases where the content of the <script> tag is not properly removed.

Example, see this article: http://www.neutrality.wtf/api/api.php?localize=1&url=http%3A%2F%2Fmoney.cnn.com%2F2017%2F08%2F21%2Fnews%2Feconomy%2Fgirls-who-code-saujani%2Findex.html

There's a block of script content (without the tags) in the middle. If you look at the original article's source (view-source:http://money.cnn.com/2017/08/21/news/economy/girls-who-code-saujani/index.html) it's clear that the script is inside a <script></script> tag. It's possible the DOM parser is confused because of the string "" that is immediately before that text.

This requires some investigation; the cause might be DOMDocument or XPath itself.
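A small reproduction harness may help narrow this down. The sketch below assumes script removal is done by querying //script with DOMXPath and removing each matched node together with its content; stripScriptElements is a hypothetical name, and the project's actual code may work differently.

```php
<?php
// Hypothetical helper for reproducing the problem; not the project's actual code.
function stripScriptElements(string $html): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them on messy markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Copy the matches into an array before removing nodes from the tree.
    foreach (iterator_to_array($xpath->query('//script')) as $script) {
        $script->parentNode->removeChild($script);
    }

    return $doc->saveHTML();
}
```

Feeding the relevant fragment of the CNN page through a function like this, and then trimming the fragment until the script text stops leaking, should show whether the parser is closing the script element early.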

Make PHPUnit pass

Part of the reason the suite fails right now is that the inline styles have changed. The tests shouldn't depend on those (we should resolve #3), but they should also pass consistently.

Add travis-ci for the repo with phpcs

PHPUnit tests already exist and can run (though they need adjusting; they are failing at the moment because they were written for a slightly earlier version), so we need to add phpcs and connect both to Travis to get CI going.

Stop re-parsing html in the Replacer on every one of its functions

Since all methods are static, every step parses the HTML and outputs it again for the next step. We should stop doing that. Parsing should happen once per dictionary replacement, and the DOMDocument object should be what gets passed around.

It may be worth considering whether the Replacer needs to work with static methods only, or whether it should be instantiated once per replacement and pass the document around internally in response to external calls. Alternatively, we can keep its methods static but have all of them expect a DOMDocument instead of HTML, add a one-time parsing method that runs ahead of them, and then pass the modified DOMDocument around.

Whatever we do, the Replacer should stay flexible enough to support future replacement methods where, perhaps, plain text is passed in (it could be wrapped in a mock HTML document, parsed as if it were all one paragraph, and then still passed around as a DOMDocument, so we keep the wrapping of the output).
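One possible shape for the instantiated variant is sketched below. The class and method names (DocumentReplacer, fromPlainText, removeScripts, replaceWords, toHtml) are hypothetical and only illustrate the parse-once idea; strtr is a naive stand-in for the real dictionary logic and does not respect word boundaries.

```php
<?php
// Hypothetical sketch: names are illustrative, only DOMDocument and DOMXPath are real PHP APIs.
final class DocumentReplacer
{
    private $doc;
    private $xpath;

    public function __construct(string $html)
    {
        $this->doc = new DOMDocument();
        $previous = libxml_use_internal_errors(true);
        $this->doc->loadHTML($html); // the only parse
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
        $this->xpath = new DOMXPath($this->doc);
    }

    // Plain text is wrapped in a mock document holding a single paragraph.
    public static function fromPlainText(string $text): self
    {
        return new self('<html><body><p>' . htmlspecialchars($text) . '</p></body></html>');
    }

    public function removeScripts(): self
    {
        foreach (iterator_to_array($this->xpath->query('//script')) as $node) {
            $node->parentNode->removeChild($node);
        }
        return $this;
    }

    // Naive placeholder for the dictionary step: plain substring replacement.
    public function replaceWords(array $map): self
    {
        foreach ($this->xpath->query('//text()') as $textNode) {
            $textNode->nodeValue = strtr($textNode->nodeValue, $map);
        }
        return $this;
    }

    public function toHtml(): string
    {
        return $this->doc->saveHTML(); // the only serialization
    }
}
```

Each step returns the same object, so a call chain such as DocumentReplacer::fromPlainText($text)->removeScripts()->replaceWords($map)->toHtml() touches the parser and the serializer exactly once.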

Only change content on <body>

Right now we're fetching all //text() nodes, which apparently also replaces content in <title> tags. That is not ideal.
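A narrower XPath query is probably enough here. The sketch below assumes the replacement step already has a DOMDocument to work on; replaceInBodyOnly is a hypothetical name, and strtr again stands in for the real dictionary logic.

```php
<?php
// Hypothetical helper: restricts replacement to text nodes that live under <body>.
function replaceInBodyOnly(DOMDocument $doc, array $map): void
{
    $xpath = new DOMXPath($doc);
    // '//body//text()' skips <title> and anything else inside <head>.
    foreach ($xpath->query('//body//text()') as $textNode) {
        $textNode->nodeValue = strtr($textNode->nodeValue, $map);
    }
}
```

Text inside <script> or <style> elements placed in the body would still match this query, so those elements need to be removed or excluded separately.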

Split 'parent' terms into 'familiar' and 'formal'

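The replacement should take both registers into account: split the terms into 'formal' ("mother" / "father") and 'familiar' ("mom" / "dad"), so that a formal term is replaced with a formal one and a familiar term with a familiar one.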

There may be other words to consider splitting as well; this should also be taken into account when redoing/redesigning the dictionaries in general for #7
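As a rough illustration of the split, the terms could be paired per register so that a swap never crosses from formal to familiar. The structure and key names below are hypothetical and do not reflect the current layout of men.json and women.json.

```php
<?php
// Hypothetical structure: keys and nesting are illustrative only.
$parentTerms = [
    'formal'   => ['men' => 'father', 'women' => 'mother'],
    'familiar' => ['men' => 'dad',    'women' => 'mom'],
];

// Builds a two-way swap map in which each term maps to its counterpart in the same register.
function swapMapFor(array $terms): array
{
    $map = [];
    foreach ($terms as $register => $pair) {
        $map[$pair['men']]   = $pair['women'];
        $map[$pair['women']] = $pair['men'];
    }
    return $map;
}
```

With this map, strtr('my dad and my mother', swapMapFor($parentTerms)) yields 'my mom and my father'.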

Rearrange the categorization of the dictionary

The dictionary categorization was born in the hackathon, and it's really bad.

The two dictionaries must match in categories of words, but the categories should really be organized properly.

Strip inline scripts

The replacer removes <script> tags, but we should also safely remove inline scripts from nodes, if possible.
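If "inline scripts" here means event-handler attributes such as onclick, one way to drop them is an XPath query over attribute names. That reading of the issue is an assumption, and stripInlineHandlers is a hypothetical name. The prefix test removes every attribute whose name starts with "on", which may be broader than strictly necessary.

```php
<?php
// Hypothetical helper: removes attributes whose names start with "on" (onclick, onload, ...).
// Assumes attribute names are lowercase in the parsed document.
function stripInlineHandlers(DOMDocument $doc): void
{
    $xpath = new DOMXPath($doc);
    $handlers = $xpath->query('//@*[starts-with(name(), "on")]');
    // Copy the matches into an array before detaching attributes from their elements.
    foreach (iterator_to_array($handlers) as $attr) {
        $attr->ownerElement->removeAttributeNode($attr);
    }
}
```

This does not cover javascript: URLs in href or src attributes, which would need a separate check.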
