
probe-engine's Introduction

OONI Probe Engine

Semi-automatically exported from github.com/ooni/probe-cli.

Check UPSTREAM to see the tag/commit from which we exported.

This is a best-effort attempt to export probe-cli internals to community members.

We will ignore issues and PRs opened on this repository; please use github.com/ooni/probe for issues and github.com/ooni/probe-cli for code changes instead.

probe-engine's People

Contributors

62w71st, bassosimone, cyberta, d1vyank, hellais, ja-pa, kelmenhorst, lorenzoprimi, xhdix


probe-engine's Issues

Support a custom DNS resolver

A custom DNS resolver allows us to gather more low-level information, and some OONI nettests require this functionality. At the same time, an external resolver needs to know which addresses to use, which has historically been non-straightforward on mobile platforms (or we didn't invest enough time in doing that; pick one).

For a Go codebase https://github.com/miekg/dns seems a good DNS library to use.

I'm copying here relevant MK tickets about DNS issues that are useful to keep in mind:

Reduce API coupling with probe-cli

I would like to make sure that the API surface shared between probe-engine and probe-cli is minimal, so that we can make changes to probe-engine independently. The end goal is to have just a small set of non-internal APIs exposed by probe-engine.

Currently, this is what probe-cli uses:

internal/enginex/enginex.go:    "github.com/ooni/probe-engine/model"
nettests/im/facebook_messenger.go:      "github.com/ooni/probe-engine/experiment/fbmessenger"
nettests/im/telegram.go:        "github.com/ooni/probe-engine/experiment/telegram"
nettests/im/whatsapp.go:        "github.com/ooni/probe-engine/experiment/whatsapp"
nettests/middlebox/http_header_field_manipulation.go:   "github.com/ooni/probe-engine/experiment/hhfm"
nettests/middlebox/http_invalid_request_line.go:        "github.com/ooni/probe-engine/experiment/hirl"
nettests/nettests.go:   "github.com/ooni/probe-engine/experiment"
nettests/nettests.go:   "github.com/ooni/probe-engine/experiment/handler"
nettests/nettests.go:   "github.com/ooni/probe-engine/model"
nettests/performance/dash.go:   "github.com/ooni/probe-engine/experiment/dash"
nettests/performance/ndt.go:    "github.com/ooni/probe-engine/experiment/ndt"
nettests/websites/web_connectivity.go:  "github.com/ooni/probe-engine/experiment/web_connectivity"
nettests/websites/web_connectivity.go:  "github.com/ooni/probe-engine/orchestra/testlists"
ooni.go:        "github.com/ooni/probe-engine/session"

Judging from this, I think we can reduce to session, experiment, and model.

Also, we can look into reducing how much of each package is exposed.

Use mobile user-agents as well

net.userAgents has nothing but ancient Firefox.

We should have greater diversity of User-Agents, as we've seen mobile-targeted injections.

TBD: find some source of up-to-date UA statistics.

Implement new OONI Probe nettests

This epic issue tracks what remains to be done to land the recent work on writing new experiments, or rewriting existing ones, using a better measurement engine:

  • Update to the latest ndt7: #70

  • DNS over TLS: #66

  • DNS over HTTPS: #65

  • SNI blocking detection: #64

  • DASH rewrite that uses HTTPS: #51

  • Telegram rewrite using github.com/ooni/netx: #54

When the above issues have been addressed, we can close this issue.

Psiphon in probe-engine

This issue tracks fully integrating the Psiphon test inside probe-engine, making it ready for deployment.

The steps required are:

  • httpx implement NewClientWithProxy (ooni/netx#130)
  • HTTPDoConfig: add support for proxy function (ooni/netx#131)
  • porcelain: add proxy and byte counting support (ooni/netx#137)
  • build.sh: re-enable psiphon (ooni/probe-cli#79)
  • x/logger: mimic cURL logging (ooni/netx#137)
  • psiphon: use netx so we fill the HTTP template (#129)
  • psiphon: make sure data format matches old implementation (#128)
  • psiphon: make sure we're using latest version of dependencies (#141)
  • orchestra: implement authenticated requests (#148)
  • psiphon: fetch config via authenticated orchestra (#109)
  • psiphon: use config from orchestra (#167)
  • orchestra: MK compatible platform names (#169)
  • orchestra: store login and tokens on persistent storage (#164)
  • psiphon: follow-up of fetching config from orchestra (#168)

dash: rewrite using github.com/neubot/dash

I've recently written a Go implementation of Neubot's DASH test at github.com/neubot/dash. This issue is about performing the following tasks:

  • replace the current implementation with code calling github.com/neubot/dash directly

  • make sure that the resulting JSON is consistent with what you obtain using MK and, if not, take the proper actions to make sure it becomes consistent

SNI support for TLS handshake

@darkk commented on Wed Aug 17 2016

Sometimes https traffic is blocked on SNI basis, so I've written some code to test that.

It should be probably merged after 2.0.0 release or rebased onto 2.0.


@hellais commented on Tue Aug 30 2016

Minor comments to this test. I will put the merge for this on hold until we get the 2.0.0 branch merged into master.

Good work 👍


@hellais commented on Tue Sep 19 2017

I think it would be cool to land this into master. It also has a useful fix to the NetTest class that, at minimum, should make its way into master.

Given that tls_handshake is experimental anyway, I don't see any harm in making experimental changes to it.

It needs to be rebased though.

refactor(session): setAvailable... => addAvailable

In session/session.go there is currently a bunch of functions like SetAvailableHTTPSCollector that would be better named AddAvailableHTTPSCollector. In fact, set hints that we're replacing the old value, whereas add hints that we're appending to the existing list of values.

Implement probe orchestration

We need to have a complete implementation of probe orchestration in here. For now, we just have support for getting the test lists. Here's what remains to be done:

  • implement the register API
  • implement the login API
  • implement the update API
  • implement the task related APIs
  • implement the API expected by mobile apps (see mkall-ios, android-libs)
  • remove the orchestra implementation from MK

Note that we can safely remove the orchestra implementation from MK because it is marked as an internal API. Therefore, we can do that without violating any API constraint.

Separate measurement and interpretation

From measurement-kit/measurement-kit#1712:

See measurement-kit/measurement-kit#1704 (comment)

See also this message of mine on Slack, which elicited lots of positive feedback:

  1. we can refrain from changing the code that decides whether something is blocked for now, but we should change it in the future based on mining the data. I do not like that our data format does not clearly distinguish between (a) what I measured and (b) what I believe my measurement means. This is unfortunate. Our data format should clearly distinguish between facts, i.e. what was measured, and inferences based on such facts.

Group geoiplookup info and make it more robust

  1. the geoiplookup is a single operation in ooni/probe-cli; considering that we're no longer doing this work in the MK context, we can be more relaxed about API guarantees and use the same pattern used by ooni/probe-cli, therefore making code integration simpler;

  2. two out of three services we use already return the country and the ISP name, so it seems a bit backwards to discard this information.

Proposal: while grouping geoiplookup into a single operation, write the code in a way that allows us to stop early if we already have enough information to return to the user.

Process: implement this functionality in the simplest possible way and see during beta testing whether it is satisfactory for users. In case it's not, change algorithm before release.

Implement and use IP scrubbing safety net

Rationale

In MK we traditionally scrub the IP manually from the results JSON, for each field where it may appear. We plan on doing something similar for Go-native tests. This issue is about creating and implementing a different procedure for achieving the same result that:

  1. takes as input a []byte (i.e. the serialised JSON) and a string containing the IP address

  2. makes sure the string actually contains an IP address (using ParseIP)

  3. scrubs the IP from the []byte and returns the modified []byte

We want to perform IP scrubbing right before sending the JSON to the OONI collector and storing the JSON to disk for the following reasons:

  1. safety: if we add another test (or we perform changes) we don’t need to think whether to scrub fields because that’s already handled in the line of code just before submitting and saving the JSON

  2. inspectability: the concern is addressed close to where it matters and it’s easier to assess correctness by inspection

I do not expect regressions in terms of performance. However, if we don't also perform scrubbing at a more granular level, we may incur excessive memory usage, as explained below.

If we only perform scrubbing on the serialised JSON, the memory usage is going to be large in some corner cases. The Go implementation of Replace allocates a new large enough slice to write the replacement into. Therefore the memory cost of implementing this operation on the bytes of the JSON is the extra cost caused by allocating and copying the extra bytes that are not part of the body and of the headers. It should also be mentioned that, if the JSON does not contain the IP address, then the memory and copy cost is actually zero.

This suggests that we should scrub in advance all the fields that we can scrub, e.g. the ProbeIP field inside the outermost JSON structure, and all other fields that we have already identified as problematic (e.g. HTTP headers and body). Not doing that is probably not smart: if we measure a large binary HTTP body and, say, we don't scrub the ProbeIP in the outermost structure, we need to allocate twice the size for no good reason, which we could easily have avoided by just changing the value of a small field. Considering that we want this code to run on mobile phones, I'd rather have this as a safety net than as the only scrubbing mechanism.

List of tasks

  • understand more in depth what doing this kind of replacement means to address potential concerns regarding security and regarding breaking the JSON

  • make sure that we can pass to MK a flag controlling whether it should or should not perform its own scrubbing procedure (we currently disable scrubbing, as I initially thought I could only do it top-level; now I see how this can be problematic)

  • make sure we use this flag properly in miniooni

  • implement the above algorithm

  • write unit tests for it

  • expose this scrubbing functionality using an API and make sure that the API returns some actionable signal we can use to detect that the more specific scrubbing mechanism is not working and we are therefore allocating memory needlessly

  • use this API inside of miniooni

  • adapt ooni/probe-cli#46 to use this API

Framework for follow-up experiments

Should we retry measurements directly after they fail? Does this mean a single experiment may return more than a single measurement? What other designs are possible?

dash: write tests and reach 100% coverage

We've written a new DASH experiment in Go in #61. Now we need to get confidence that it is working as intended. To this end, we need to write tests to cover all code paths as well as to make sure that we're emitting the intended summary stats given specific interim measurements.

Once this stage is reached, we'd have gained confidence that the DASH experiment is production ready, and we can replace the C++ implementation with the Go one.

Implement SNI filtering measurements

See measurement-kit/measurement-kit#1757

Original comment:

From the technical point of view, it's relatively easy to automate the manual measurements that I helped to perform, using cURL, in a few upcoming and past OONI reports. In fact, we have had mkcurl as part of MK for quite some time now. The most complex part is actually deciding whether to do this as part of the Web Connectivity test or as a separate test. My initial thought is that we should detect whether we have a failure in the TLS handshake and then, if that is the case, (offer the user the possibility to) perform a follow-up measurement. However, this road is made complex by the amount of code that we would need to touch and adapt, so it's probably a bad idea. A new test, instead, would take just two days.

miniooni: improve logging

  1. display minimal progress on stdout

  2. disable logging on stderr when -l LOGFILE is used

  3. add timings to all logs

Add low level events, metrics to reports

Original issue: measurement-kit/measurement-kit#1720

Original text:

This comment of mine on Slack was 👍-ed by @hellais. It probably calls for creating a PR in the ooni/spec repository (or probably a couple of them):

I believe we should stop talking about logs. We should include network events, DNS events, TLS events, and HTTP events as steady-clock annotated events in the report. That is, our plan to collect more low-level data replaces free-format logs with events in a more predictable format, making reports more precise and more actionable when investigating errors.

Of course, one critical bit of information to include is timing; see also:

We should also consider including TCPInfo metrics and changing the congestion control:

We should collect infra-request timing and possibly use HAR:

Use github.com/ooni/netx: 1/2

We should use github.com/ooni/netx inside the httpx package of this repo:

  • to dump TLS certificates and TLS state (was #17)

  • to use custom DNS resolvers (was #18) and transports

  • to remove custom HTTP code from the tree

For now, this is just a reminder of work to be done in the future.

Support cloud fronted OONI services

We wanted to support them in MK. We never finished the work. It seems now most of them have been discontinued. Unclear whether we really need this functionality.

Increase code coverage to 66%

As part of our first public release we want to bump the code coverage. Aiming for 100% seems premature given that lots of stuff is still in flux. 75% seems a good compromise.

Strategy to separate Go and C code

We have #41 and #61 that replace C/C++ code with Go code. The proper way of structuring such PRs is to have a file named foo/foo_cgo.go where we keep the current code and a file named foo/foo_otherwise.go where we put the new code. This way, we can keep both implementations around until we are ready to dispose of the MK dependency.
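A sketch of that file layout using the build-constraint syntax of that Go era; the package, file, and function names are hypothetical:

```go
// foo/foo_cgo.go — keeps the current MK-backed implementation,
// selected when building with cgo enabled.

// +build cgo

package foo

func doMeasurement() error {
	// ... call into MK via cgo ...
	return nil
}

// foo/foo_otherwise.go — the new pure-Go implementation, selected
// when cgo is disabled (e.g. CGO_ENABLED=0).

// +build !cgo

package foo

func doMeasurement() error {
	// ... pure Go implementation ...
	return nil
}
```

Because both files define the same function under mutually exclusive constraints, callers are unaffected, and dropping the MK dependency later means simply deleting the cgo file and its constraint.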

Go mobile bindings

This will allow us to expose to mobile apps the enhanced capabilities of this engine, including more robust network communications, ndt7, and Psiphon.
