
Comments (14)

potiuk commented on September 24, 2024

I am wondering how we will know whether a person should be notified. This algorithm is still in progress and we do not know how it will work - should we notify people who were within radius x of, and spent time y next to, a person who has coronavirus? I think this is really important in terms of transparency. Or maybe I missed something - correct me if I am wrong.

I think there should be a team of data scientists, doctors, and technologists working on it. I do not know exactly what will work, but I have a strong belief that something will. That's why I also think keeping FULLY ANONYMOUS data on the server and running/testing various algorithms there is the best approach. I've been involved in Machine Learning / Data Science projects (I worked at https://nomagic.ai - a robotics and AI startup) and I know that such algorithms require a lot of iterations, testing, etc.

Having fully anonymous BT encounter data on the server, plus information about which IDs have been diagnosed (without de-anonymising even the sick people), should provide good test/verification data for that. That's why I think keeping some data in a central location might make sense (as long as it is not de-anonymisable). The great thing about it is that in order to train/try such algorithms we do not have to know who is who at all. We just have to know that given IDs have been diagnosed, learn the spread pattern from there, and fine-tune the algorithms. Completely anonymously.

If AI/Machine Learning is involved (I will try to involve some of the best specialists I know from NoMagic) then we will not know the details of such algorithms anyway. AI algorithms are so far mostly "black boxes" that are not easily explainable; however, they can be tested and verified on real anonymous data (and when you run them on historical data, you can verify that your algorithms produce results correlating with reality, so they might predict risk much better than any "well described" algorithm).

But again - I am not a specialist in this area - what I would do is provide anonymous data to the people who know what they are doing and let them work with it.
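The "radius x / time y" criterion discussed above could be sketched as follows. This is only an illustration: the thresholds, the encounter format, and the `should_notify` function are all hypothetical placeholders, not epidemiological recommendations.

```python
# Hypothetical thresholds: the real criteria would have to come from
# epidemiologists; these numbers are placeholders, not recommendations.
MAX_DISTANCE_M = 2.0      # the "radius x"
MIN_DURATION_S = 15 * 60  # the "time y"

def should_notify(encounters):
    """encounters: list of (distance_m, duration_s) tuples describing
    proximity to a diagnosed person. Flags the user when cumulative
    close-range exposure exceeds the duration threshold."""
    close_time = sum(dur for dist, dur in encounters if dist <= MAX_DISTANCE_M)
    return close_time >= MIN_DURATION_S

# 10 minutes at 1.5 m plus 10 minutes at 1.0 m: 20 minutes of close contact
print(should_notify([(1.5, 600), (1.0, 600)]))  # True
```

Whatever rule is chosen, publishing it in this kind of explicit form would address the transparency concern raised above.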

from documents.

kennypaterson commented on September 24, 2024

@jasisz Thanks for clarifying this issue. I think we are well aware in the DP-3T project of the issue you highlight, and indeed it is discussed in the whitepaper, see Section 5.3 beginning on page 27. We consider it unavoidable in a system of the type we are aiming for. Proposals for concrete ways of addressing it that we may have missed are positively encouraged.


panisson commented on September 24, 2024

Couldn't this be solved by using a sort of bloom filter / spatio-temporal bloom filter to represent the infected ids?
By sharing with the server only a Bloom filter representation of the infected ids, set operations such as union/intersection remain feasible, false-positive rates can be controlled by choosing the right filter size, and exact id values cannot be retrieved directly from the filter. The single-encounter problem might still exist, but it could be mitigated by picking a filter size that balances a small false-positive rate against good privacy guarantees.
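A minimal sketch of the idea, assuming SHA-256-based hashing and illustrative sizing (a real deployment would size `m` and `k` for a target false-positive rate; the class and EphID strings here are made up):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter for EphID membership tests. m (bits) and
    k (hash functions) are illustrative, not tuned values."""
    def __init__(self, m=8192, k=4):
        self.m, self.k = m, k
        self.bits = bytearray(m // 8)

    def _positions(self, item: bytes):
        # Derive k bit positions by hashing the item with k salts.
        for i in range(self.k):
            h = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item: bytes):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

infected = BloomFilter()
infected.add(b"ephid-alice")
print(b"ephid-alice" in infected)  # True
# Absent IDs are *probably* reported absent: lookups can false-positive,
# never false-negative, which is exactly the ambiguity being proposed.
```

The privacy argument is that the server publishes only the bit array, so clients can test their own EphIDs locally while the filter never enumerates the infected ids directly.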


jasisz commented on September 24, 2024

@panisson Clever use of Bloom filters would make it easier to publish EphIDs, which is a good idea, but functionally it is what I meant by collisions.
I just wonder why the docs state that this is not possible for any proximity-tracing mechanism, while it clearly is possible if you sacrifice some false positives for it.


potiuk commented on September 24, 2024

I think that's the risk of the fully decentralized approach. There is a lot of debate about whether a centralized or decentralized approach is better. I think a hybrid approach (or maybe just centralized, depending on how you understand it), where most of the information gathering and exchange happens on the phones and only some centralized data is still stored (fully anonymously), makes the system much less vulnerable to the single-encounter case, and it has the additional benefit of algorithm testability.

I think we can modify the whole approach, and the problem can be solved by centralizing the algorithm that decides whether a person is "endangered" or not - taking bits and pieces from the ProteGO app implementation (and discussion, in Polish): ProteGO-Safe/specs#34

Some thoughts:

  1. Each country already has a database of "positive COVID-19" cases. This should be centralized, including the personal data of those people. GDPR protection, medical data security - everything should apply to those cases. The privacy of those people from the state is not a concern (as long as it is sufficiently protected by those regulations). So I personally see no problem with asking the sick people to submit their history to a central server - provided that they are not "involuntarily" submitting other people's "private" data.

  2. When a person is diagnosed COVID-positive, the data from that person (and only that person's encounters) can be submitted to the central server with appropriate protection - signed with a code delivered during the diagnosis via QR code/SMS/phone call. There should be no way to recover the identity of the people the sick person had encounters with.

  3. Algorithms on the server (open and transparent) could analyse the data and mark the (fully anonymous) IDs of people who are potentially endangered.

  4. The application might simply query for its own IDs to determine its status, using a single Seed query (the compromise). Those queries do not have to be frequent - once a day is more than enough, which with 500M people makes about 6000 calls/s. A lot, but quite possible with modern cloud technology.
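A rough sketch of steps 3 and 4 above, with a hypothetical upload format and a placeholder exposure threshold (the record shape, the threshold, and `mark_endangered` are assumptions for illustration), plus the query-load arithmetic from point 4:

```python
from collections import defaultdict

MIN_EXPOSURE_S = 15 * 60  # placeholder threshold, not a medical recommendation

def mark_endangered(uploads):
    """uploads: list of (seen_ephid, duration_s) rows submitted by diagnosed
    users. Returns the set of anonymous IDs to flag - no identities needed."""
    exposure = defaultdict(int)
    for ephid, duration in uploads:
        exposure[ephid] += duration
    return {e for e, total in exposure.items() if total >= MIN_EXPOSURE_S}

endangered = mark_endangered([("id-1", 600), ("id-1", 600), ("id-2", 60)])
print(endangered)  # {'id-1'}

# Point 4 load estimate: one status query per user per day.
print(round(500_000_000 / 86_400))  # 5787, i.e. roughly the "6000 calls/s"
```

The point of the sketch is that the server only ever handles opaque IDs and durations; flagging works without knowing who is who.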

Of course it means that the algorithm might potentially be manipulated. However, since the IDs are fully anonymous, I think that might be a nice compromise (you do not know whom you would want to compromise if you are a bad actor on the server side). On the other hand, it has the benefit that the algorithm can be tested. Before deploying new algorithms it will be possible to make a dry run on existing data and see if there is an error. Making a mistake in such algorithms could be disastrous, so I think being able to modify and test the algorithms on the server is valuable.

WDYT ?


kasiazjc commented on September 24, 2024

When a person is diagnosed COVID-positive, the data from that person (and only that person's encounters) can be submitted to the central server with appropriate protection - signed with a code delivered during the diagnosis via QR code/SMS/phone call. There should be no way to recover the identity of the people the sick person had encounters with.

I'm pretty sure that the code verification while sending the history will be implemented in the app. TBC by @jakublipinski though.

I am wondering how we will know whether a person should be notified. This algorithm is still in progress and we do not know how it will work - should we notify people who were within radius x of, and spent time y next to, a person who has coronavirus? I think this is really important in terms of transparency. Or maybe I missed something - correct me if I am wrong.


jasisz commented on September 24, 2024

@potiuk Providing such data for the research is one thing, but deciding if you are at risk (as now proposed in ProteGO) is another. It can't be a black box to ensure trust.


potiuk commented on September 24, 2024

Let's wait and see how it evolves. I think it would be great to see the algorithm, but for me it is not a blocker if the data it operates on is truly anonymous. Maybe I am wrong here, but I do not see a risk (at least from the point of view of "infiltration", "manipulation" and "preserving privacy", which were my main concerns about ProteGO before). Assuming the data will be anonymous (and this is still a big if - we have to observe and shout if not), there might be other risks involved that make it necessary for the algorithm to be public.

I believe for now we do not even have enough data to make any assumptions about the algorithm, its accuracy, or its correctness - because... it does not exist yet (no data -> no algorithm). This algorithm will have to be worked out by data scientists, not software engineers, and I think it might be really complex to verify. But... let's see what information ProteGO provides. I think at this moment it is important that it is anonymous by default (you should only optionally add your phone number) and that it is opt-in, not opt-out. Let's see what UX it will have.


potiuk commented on September 24, 2024

And I hope the algorithm will be made public eventually.


potiuk commented on September 24, 2024

@cloudyfuel > Agreed, pseudonymous != anonymous. And I think at this step it is important to fight for anonymity. Algorithms should come next in line.


nicorikken commented on September 24, 2024

The documents in this repo mention this case as a remaining risk that cannot be mitigated. Even without an app, having limited contact helps to pinpoint the source of infection. I don't see how this can be solved with technology.


jasisz commented on September 24, 2024

@nicorikken The problem is that an attacker can simulate having limited contact by changing his own identity frequently.

Maybe this is an inevitable part of any proximity-tracing algorithm of this kind, as stated in the original doc, but I believe that if we allow the system to have some false positives we can at least say that there is some chance it was not a true contact (and therefore not a true source of infection) but just a false positive. This may not be enough, though, and there might be better ideas.


lbarman commented on September 24, 2024

Hi all, thanks for your very interesting inputs! The thread goes in many directions so please bear with me.

I'll try to summarize:

Initial problem: Single encounter problem.

(please start your comment with this if you're answering this point)

This is indeed a valid concern, which I believe is not solvable in the most extreme case:
Alice and Bob live in the same home, both run the app, Alice goes to work and gets infected, does the upload. Bob learns the 1-bit "you're at risk" but never left the flat: he can infer that Alice infected him.

Even without considering such a toy case, we believe that false positives (@jasisz's first proposal) ultimately cannot prevent the attack, but they do add uncertainty for the attacker (Bob in this case). One counter-argument is that false positives are undesirable for the overall utility of the system.

@jasisz, I'm sorry but I'm not sure I understand your second proposition:

The user with a single (or low number of) risky encounters during an epoch remembers an encountered id and then may re-use it in a later epoch. This way it is not trivial to know who exactly was infected. Still, only persons at risk would be alarmed.

Bloom filters

Discussed in #24 and in our new design (see the whitepaper); if we could move the discussion to #24 it would be great.

3rd point: Centralized algorithm

(please start your comment with this if you're answering this point).
Discussion about running the algorithms on anonymous data on the server:

Thanks for the many interesting comments on the topic. It is obviously a broad topic; just to highlight some of our past decisions: we decided that it is very hard to truly anonymize uploaded "graph" data, which is why our design uploads infected identities and not contacts (also see our FAQ, P1). Another comment is that even where it is possible, truly anonymizing uploads is costly (it requires an anonymous communication network), see our FAQ, P5; hence our design, which avoids this by only uploading non- or less-sensitive data to the backend.


jasisz commented on September 24, 2024

@lbarman I could have made it clearer. There is a possibility that the app would re-use some EphIDs it has seen in the past and advertise with them, by design. It does not fit into your Design 1, but it somewhat fits into Design 2.

Alice has seen EphID-B belonging to Bob, and the contact lasted long enough to be a valid, potentially infectious one. Alice can present herself with EphID-B in some later epochs. If EphID-B is reported as infected, it is not clear whether it was Alice's own EphID or one belonging to someone Alice has seen in the past.

Of course it also leads to false-positives of two kinds:

  • Alice was infected right after seeing Bob, and people who have seen Bob would be false positives
  • Alice was not really infected (but at some serious risk from meeting Bob), yet we notify people one step away from the real infection, i.e. those who in fact have only met Alice
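The re-advertising idea could be modelled roughly like this. Everything here is hypothetical: the `App` class, the reuse probability, and the EphID strings are made up purely to illustrate the ambiguity being described.

```python
import random
import secrets

class App:
    """Toy model of the re-advertising idea: with some probability the app
    broadcasts a previously seen EphID instead of its own, adding deliberate
    ambiguity about whose ID a later infection report points to."""
    def __init__(self, reuse_prob=0.2):
        self.reuse_prob = reuse_prob       # illustrative, not a tuned value
        self.seen = []                     # past contacts long enough to be valid
        self.own = secrets.token_hex(8)    # this app's own EphID

    def record_contact(self, ephid):
        self.seen.append(ephid)

    def advertise(self):
        # Occasionally re-use an observed EphID instead of our own.
        if self.seen and random.random() < self.reuse_prob:
            return random.choice(self.seen)
        return self.own

alice = App(reuse_prob=1.0)   # always re-use, just to demonstrate
alice.record_contact("EphID-B")
print(alice.advertise())  # EphID-B
```

Because observers sometimes see Alice broadcasting EphID-B, an infection report for EphID-B no longer pins down a single person, at the cost of the two kinds of false positives listed above.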

