Comments (14)
I am wondering how we will know whether a person should be notified. This algorithm is still in progress and we do not know how it will work - should we notify people who were within radius x and spent time t next to the person who has coronavirus? I think this is really important in terms of transparency. Or maybe I missed something - correct me if I am wrong.
I think there should be a team of data scientists, doctors, and technologists working on it. I do not know exactly what will work, but I have a strong belief that it will. That's why I also think keeping FULLY ANONYMOUS data on the server and running/testing various algorithms there is the best approach. I've been involved in Machine Learning / Data Science projects (I worked at https://nomagic.ai - a robotics and AI startup), and I know that such algorithms require a lot of iteration and testing.
Having fully anonymous BT encounter data on the server, linked to information about which IDs have been diagnosed (without de-anonymising even the sick people), should provide good test/verification data for that. That's why I think keeping some data in a central location might make sense (as long as it is not de-anonymisable). The great thing about it is that in order to train/try such algorithms we do not have to know who is who at all. We just have to know that given IDs have been diagnosed, learn the spread pattern from there, and fine-tune the algorithms. Completely anonymously.
If AI/Machine Learning is involved (I will try to involve some of the best specialists I know from NoMagic), then we will not know the details of such algorithms anyway. AI algorithms are so far mostly "black boxes" that are not easily explainable; however, they can be tested and verified on real anonymous data. When you run them on historical data, you can verify that your algorithms produce results correlating with reality, so they might predict risk much better than any "well described" algorithm.
But again - I am not a specialist in this area. What I would do is provide anonymous data to the people who know what they are doing and let them work with it.
@jasisz Thanks for clarifying this issue. I think we are well aware in the DP-3T project of the issue you highlight, and indeed it is discussed in the whitepaper, see Section 5.3 beginning on page 27. We consider it unavoidable in a system of the type we are aiming for. Proposals for concrete ways of addressing it that we may have missed are positively encouraged.
Couldn't this be solved by using a sort of Bloom filter / spatio-temporal Bloom filter to represent the infected IDs?
By sharing with the server only a Bloom filter representation of the infected IDs, operations such as set union/intersection are still feasible, false-positive rates can be controlled by selecting the right filter size, and exact ID values can't be retrieved directly from the filter. The single-encounter problem might still exist, but it could be mitigated by choosing a filter size that balances a small false-positive rate against good privacy guarantees.
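To make the idea concrete, here is a minimal sketch of such a filter in Python (the parameters `m_bits` and `k_hashes` are illustrative placeholders; a real deployment would size them from the expected number of infected IDs and the target false-positive rate):

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch for publishing a set of infected EphIDs."""

    def __init__(self, m_bits: int = 1 << 20, k_hashes: int = 7):
        self.m = m_bits                      # number of bits in the filter
        self.k = k_hashes                    # number of hash functions
        self.bits = bytearray(self.m // 8)   # bit array, packed into bytes

    def _positions(self, ephid: bytes):
        # Derive k bit positions from SHA-256 with a one-byte salt per hash.
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + ephid).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, ephid: bytes) -> None:
        for p in self._positions(ephid):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, ephid: bytes) -> bool:
        # True means "possibly in the set" (false positives are possible);
        # False means "definitely not in the set".
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(ephid))
```

The server would publish only `bits`; each phone then checks its recorded EphIDs locally. With `n` inserted IDs the false-positive probability is roughly `(1 - e^(-k*n/m))^k`, which is exactly the tunable uncertainty discussed above.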
@panisson Clever use of Bloom filters would make it easier to publish EphIDs - this is a good idea, but functionally it is what I meant by collisions.
I just wonder why the docs state that this is not possible for any proximity tracing mechanism, while it clearly is if you accept some false positives in exchange.
I think that's the risk of the fully decentralized approach. There is a lot of debate about whether a centralized or decentralized approach is better. I think a hybrid (or maybe just centralized, depending on how you understand it) approach - where most of the information gathering and exchange happens on the phones and only some centralized data is stored (fully anonymously) - is much less vulnerable to the single-encounter case, and it has the additional benefit of algorithm testability.
I think we can modify the whole approach: the problem can be solved by centralizing the algorithm that decides whether a person is "endangered" or not. Taking bits and pieces of the ProteGO app implementation (and discussion - in Polish): ProteGO-Safe/specs#34
Some thoughts:
- Each country already has a database of "positive COVID-19" cases. This should be centralized, including the personal data of those people. GDPR protection and medical data security should apply to those cases. The privacy of those people with respect to the state is not a concern (as long as it is sufficiently protected by those regulations). So I personally see no problem with asking sick people to submit their history to a central server - provided they are not "involuntarily" submitting other people's "private" data.
- When a person is diagnosed COVID-positive, the data from that person (and only that person's encounters) can be submitted to the central server with appropriate protection - signed with a code delivered during the diagnosis via QR code/SMS/phone call. There should be no way to recover the identity of the people the sick person encountered.
- Algorithms on the server (open and transparent) could analyse the data and mark the (fully anonymous) IDs of people who are potentially endangered.
- The application can simply query for its own IDs to determine its status - using a single Seed query (the compromise). Those queries do not have to be frequent - once a day is more than enough, which with 500M people makes about 6000 calls/s. A lot, but quite possible with modern cloud technology.
Of course, this means the algorithm could potentially be manipulated. However, since the IDs are fully anonymous, I think that might be a nice compromise (as a bad actor on the server side, you do not know whom you would be compromising). On the other hand, it has the benefit that the algorithm can be tested: before deploying a new algorithm it will be possible to do a dry run on existing data and see if there is an error. Making a mistake in such algorithms could be disastrous, so I think being able to modify and test the algorithms on the server matters.
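For what it's worth, the "6000 calls/s" figure above checks out as a back-of-the-envelope estimate (assumed inputs: 500M users, one status poll per day):

```python
# Back-of-the-envelope check of the query-rate estimate from the comment.
users = 500_000_000
polls_per_user_per_day = 1
seconds_per_day = 24 * 60 * 60  # 86_400

queries_per_second = users * polls_per_user_per_day / seconds_per_day
print(round(queries_per_second))  # 5787, i.e. on the order of 6000 calls/s
```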
WDYT ?
> When a person is diagnosed COVID-positive, the data from that person (and only that person's encounters) can be submitted to the central server with appropriate protection - signed with a code delivered during the diagnosis via QR code/SMS/phone call. There should be no way to recover the identity of the people the sick person encountered.
I'm pretty sure that the code verification while sending the history will be implemented in the app. TBC by @jakublipinski though.
> I am wondering how we will know whether a person should be notified. This algorithm is still in progress and we do not know how it will work - should we notify people who were within radius x and spent time t next to the person who has coronavirus? I think this is really important in terms of transparency. Or maybe I missed something - correct me if I am wrong.
@potiuk Providing such data for research is one thing, but deciding whether you are at risk (as now proposed in ProteGO) is another. It can't be a black box if we want to ensure trust.
Let's wait and see how it evolves. I think it would be great to see the algorithm, but for me it is not a blocker as long as the data it operates on is truly anonymous. Maybe I am wrong here, but I do not see a risk - at least from the point of view of "infiltration", "manipulation" and "preserving privacy", which were my main concerns about ProteGO before. Assuming the data will be anonymous (and this is still a big if - we have to observe and shout if it is not), there might be other risks involved that make it necessary for the algorithm to be public.
I believe for now we do not even have enough data to make any assumptions about the algorithm, its accuracy, or its correctness - because it does not exist yet (no data -> no algorithm). This algorithm will have to be worked out by data scientists, not software engineers, and I think it might be really complex to verify. But let's see what information ProteGO provides. I think at this moment it is important that it is anonymous by default (you should only optionally add your number) and that it is opt-in, not opt-out. Let's see what the UX will be.
And I hope the algorithm will be made public eventually.
@cloudyfuel > agreed, pseudonymous != anonymous, and I think at this step it is important to fight for anonymity. Algorithms should come next in line.
The documents in this repo mention this case as a remaining risk that cannot be mitigated. Even without an app, having limited contact helps to pinpoint the source of infection. I don't see how this can be solved with technology.
@nicorikken The problem is that an attacker can simulate having limited contact by changing their own identity frequently.
Maybe this is an inevitable part of any proximity tracing mechanism of this kind, as stated in the original doc, but I believe that if we allow the system to have some false positives, we can at least say there is some chance it was not a true contact (and therefore not the true source of the infection), but just a false positive. This may not be enough, though, and there might be better ideas.
Hi all, thanks for your very interesting inputs! The thread goes in many directions so please bear with me.
I'll try to summarize:
Initial problem: Single encounter problem.
(please start your comment with this if you're answering this point)
This is indeed a valid concern, which I believe is not solvable in the most extreme case:
Alice and Bob live in the same home, and both run the app. Alice goes to work, gets infected, and does the upload. Bob learns the 1-bit "you're at risk" signal but never left the flat: he can infer that Alice infected him.
Even without considering such an extreme case, we believe that false positives (@jasisz's first proposal) ultimately cannot prevent the attack, but they do add uncertainty for the attacker (Bob in this case). One counter-argument is that false positives are undesirable for the overall utility of the system.
@jasisz, I'm sorry but I'm not sure I understand your second proposition:
> The user with a single (or low number of) risky encounters during an epoch remembers an encountered ID and may then re-use it in a later epoch. This way it is not trivial to know who exactly was infected. Still, only persons at risk would be alerted.
Bloom filters
Discussed in #24 and in our new design (see the whitepaper); if we could move the discussion to #24 it would be great.
3rd point: Centralized algorithm
(please start your comment with this if you're answering this point).
Discussion about running the algorithms on anonymous data on the server:
Thanks for the many interesting comments on the topic. It is obviously a broad topic; just to highlight some of our past decisions: we decided that it was very hard to truly anonymize uploaded "graph" data, which is why our design uploads infected identities and not contacts (also see our FAQ, P1). Another comment is that even where it is possible, truly anonymizing uploads is costly (it requires an anonymous communication network), see our FAQ, P5; hence our design avoids this by only uploading non- or less-sensitive data to the backend.
@lbarman I could have made it clearer. There is a possibility that the app would, by design, re-use some EphIDs it has seen in the past and advertise with them. It does not fit into your Design 1, but somewhat fits into Design 2.
Alice has seen EphID-B belonging to Bob, for long enough that it counts as a valid contact, potentially an infectious one. Alice can present herself with EphID-B in some subsequent epochs. If EphID-B is reported as infected, it is not clear whether it was Alice's own EphID or one belonging to someone Alice has seen in the past.
Of course, it also leads to false positives of two kinds:
- Alice was infected right after seeing Bob, and people who have seen Bob would be false positives
- Alice was not really infected (though at serious risk from meeting Bob), but we notify people one step away from the real infection - those who in fact only met Alice
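The re-use idea can be illustrated with a toy simulation (the 16-byte random IDs and the helper name `new_ephid` are simplifications made up here, not part of the DP-3T spec):

```python
import secrets


def new_ephid() -> bytes:
    # Simplified stand-in for an ephemeral ID: 16 random bytes.
    return secrets.token_bytes(16)


# Epoch 1: Bob broadcasts his own EphID; Alice records the encounter.
ephid_b = new_ephid()

# A later epoch: by design, Alice advertises a mix of fresh IDs and
# EphID-B, which she re-uses from the earlier encounter.
alice_fresh = new_ephid()
alice_advertised = [alice_fresh, ephid_b]

# Bob is later diagnosed, and EphID-B appears in the published infected set.
infected = {ephid_b}

# From an observer's point of view, a match against Alice's advertised IDs
# is ambiguous: the matching ID may be hers, or one re-used from someone
# she met - which is exactly the plausible deniability argued for above.
match = any(e in infected for e in alice_advertised)
```

Here `match` is true, yet the observer cannot tell from the IDs alone whether Alice generated the matching EphID herself or copied it from Bob.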