
invsc-janus's People

Contributors

bhklein, jklontz, jordancheney, sklum, stephenrawls, vmorariu


invsc-janus's Issues

Gallery insert and remove

We're concerned about the runtime performance and gallery size impact of janus_gallery_insert and janus_gallery_remove. This form of incremental gallery construction may require that a gallery be recreated from scratch every time insert or remove is called. Furthermore, even with a call such as janus_prepare_gallery (see issue #31), performers may pay one of the following penalties: 1) significantly increased gallery size, because the additional media or encodings needed to re-optimize the gallery for search must be kept for each enrolled template, or 2) reduced search performance from not having the data necessary for the best gallery representation.

We understand from discussions with IARPA that the API will be used for research purposes only, and will not be targeted for operational use. Batch gallery construction will always be better than (or equal to) incremental construction, although incremental construction would certainly be useful for operational deployments.

So, our question is: will incremental vs. batch gallery construction performance be studied by NIST, and is this a research question that the API should be designed to answer?

If not, we recommend removing these API calls and making gallery construction a batch operation that constructs an immutable gallery, as sketched below. For our team, this would result in greatly reduced gallery sizes and much more straightforward bookkeeping during gallery construction.
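To make the recommendation concrete, here is a minimal sketch of what a batch construction call could look like. The signature is illustrative only, not a proposal for exact naming or types:

// Illustrative sketch only: the gallery is built once from the complete set
// of enrolled templates and is immutable afterwards, so implementations need
// not retain per-template media or encodings for later re-optimization.
janus_error janus_create_gallery(const std::vector<janus_template> &templates,
                                 const std::vector<janus_template_id> &ids,
                                 janus_gallery &gallery); // no insert/remove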

Reorganization of branches

Hello all,

There has been some confusion recently about which branch of this API Janus performers should be conforming to. To alleviate that confusion I am proposing a new organization for the branches here on Github and also for the documentation at libjanus.org.

The reorganization will be as follows. The master branch will become the stable branch that everyone should be compatible with. A new branch (tentatively called phase2) will contain API changes and updates planned for the next phase of the program. When the program moves into the next phase, that branch will be merged into master and a new branch will be created to accept changes for the following phase. Only critical bug fixes will be merged into master once it is declared stable. We will also maintain an archive of previous versions of the API under appropriately labelled branches.

The documentation at libjanus.org will be automatically generated with doxygen from the master branch here. At some point in the future we would like to add functionality to the page that lets you select which branch's documentation you would like to view (and indicates which branch you are currently viewing). Until then, please assume the documentation is for the master branch only!

I am planning on making these changes next Wednesday (Nov 4) unless there are issues. If you do have issues please respond in this thread so that we can all discuss them.

-Jordan

High level API issues

Maryland is concerned about the following high level issues with respect to the Phase 2 API:

As there will not be any training data in CS3 and no support for tuning in the API, it will not be possible for the Government to run IJB-A with the Phase 2 deliverable: IJB-A requires set_tuning_data, which the Phase 2 deliverable will not have. It is difficult to maintain two distinct recognition deliverables, one conforming to the Phase 1 API and one to the Phase 2 API. In practical terms, the historical record of results on IJB-A will effectively end when CS3 is released.

We would like to know more about “detection_confidence”. We are required to provide a single detection score for a track, whereas we have scores corresponding to faces in every frame of the track. It is difficult to assign distinct detection confidences to two frames of the same video, which would be an issue during face detection evaluation.

We think that both detection confidence and tracking confidence are important for indicating the quality of a face track and should be reported/required. They are different things and have different meanings.

The output of a clustering algorithm is the (cluster_id, cluster_confidence) pair required by janus_track, and cluster_confidence also needs clarification. A precision-recall curve cannot be built by thresholding cluster confidences: if a cluster ceases to exist, its data items need to be assigned to a different cluster; they cannot simply be dropped.

Supervised Clustering: In this case, we are supposed to cluster a collection of unlabelled people into distinct identities. We will be provided with bounding boxes for those people, and there is a hint parameter giving the approximate number of clusters (the closest multiple of 10). We feel that assuming we know (or almost know) K is impractical. We would instead suggest providing the lowest similarity at which two descriptors should still be clustered together.

It seems from the CS3 line items that the same line item needs to be processed many times, even though the face detection and alignment of a given line item will not change. This is wasteful; it was already an issue for CS2 and is even more so for CS3. We tried to provide caching of results built around Redis and it was not very well received. We really need guidance on caching results, or a protocol design that does not evaluate the augmentation of the same line items over and over again. This would avoid running the same face detection on the same stills and frames, a big win both for the evaluators and for us, since we will need to test the algorithm(s) many times.

Question about videos in janus_media

Hi,

I was looking at the janus_media object, and had a concern that it could cause memory issues. Here is the definition of janus_media from iarpa_janus.h:

typedef struct janus_media
{
    std::vector<janus_data*> data; /*!< \brief A collection of image data of size N,
                                               where N is the number of frames in a video
                                               or 1 in the case of a still image. */
    size_t width;     /*!< \brief Column count in pixels. */
    size_t height;    /*!< \brief Row count in pixels. */
    size_t step;      /*!< \brief Bytes per frame, including padding. */
    janus_color_space color_space; /*!< \brief Arrangement of #data. */
} janus_media;

As you can see, it represents a video as a vector of frames, completely uncompressed. That is, the total required memory for a single video is w * h * nChannels * nFrames bytes. Furthermore, when constructing templates, the API forces all videos for a template to be loaded into memory simultaneously.

It seems that if a lot of videos were processed, memory would quickly become an issue, especially considering that clients are probably trying to create multiple templates in parallel on the same machine.

To put a hard number on it, take the following video file from CS2: CS2/video_clips/35756.mp4. It has 720 frames, and each frame is 1280x720 pixels. So the amount of memory required to load it fully uncompressed is 1280 * 720 * 3 * 720 / 10^9 ≈ 1.99 GB. It only takes a couple of video files like that per subject to really start constraining how many templates you can process in parallel on a single machine, with memory likely to be the bottleneck before CPU or GPU compute resources are exhausted.
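For reference, a minimal sketch of that arithmetic in C++ (assuming 3 bytes per pixel and no row padding; the helper name is mine, not part of the API):

#include <cstddef>
#include <cstdio>

// Uncompressed footprint of one video held in a janus_media object.
static size_t uncompressed_bytes(size_t width, size_t height,
                                 size_t channels, size_t frames)
{
    return width * height * channels * frames;
}

int main()
{
    // CS2/video_clips/35756.mp4: 720 frames of 1280x720 pixels, 3 channels.
    const double gb = uncompressed_bytes(1280, 720, 3, 720) / 1e9;
    printf("uncompressed size: %.2f GB\n", gb); // prints ~1.99
    return 0;
}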

It might not be an issue for the evaluation (e.g. NIST has machines with a heck of a lot of RAM), but it seems like a usability issue for future government users.

Thanks,
Stephen Rawls (ISI)

Using Gallery to Improve Model

Hi,

One thing that many Janus teams are doing is using the gallery to improve their face recognition models. (e.g. training some type of model adaptation or unsupervised dimensionality reduction using gallery data).

In the Phase 1 API this was done in the "janus_flatten_gallery()" function call. There is no corresponding function call in the current API.

It seems to me it would be useful to have an optional janus_prepare_gallery() function call. If an implementation makes use of the gallery data in any way, it can check in its janus_search() implementation whether the gallery has been prepared prior to beginning the search, and prepare it if necessary.

The purpose of an explicit (optional) janus_prepare_gallery() function is for clients to call it when they know they are done adding things to the gallery, so the required operations happen at that time rather than at the first search, where an end user might be surprised that some searches take much longer than normal.
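As a rough sketch of the idea (the janus_search signature below is paraphrased, the janus_gallery_type definition is invented, and the "prepared" flag is an internal implementation detail, not a proposed API member):

// Hypothetical lazy-preparation pattern.
struct janus_gallery_type
{
    bool prepared = false;
    // ... enrolled templates, search indices, adaptation models, etc.
};
typedef janus_gallery_type *janus_gallery;

janus_error janus_prepare_gallery(janus_gallery gallery)
{
    // e.g. fit model adaptation or unsupervised dimensionality reduction
    // on the gallery data, then mark the gallery ready for search
    gallery->prepared = true;
    return JANUS_SUCCESS;
}

janus_error janus_search(const janus_template &probe, janus_gallery gallery,
                         size_t num_requested,
                         std::vector<janus_template_id> &ids,
                         std::vector<double> &similarities)
{
    if (!gallery->prepared)             // lazy fallback if the client never
        janus_prepare_gallery(gallery); // called janus_prepare_gallery()
    // ... perform the actual search against the prepared gallery ...
    return JANUS_SUCCESS;
}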

Thanks,
Stephen Rawls (ISI)

janus_track and janus_association

Hi,

I have a question about the intended use of janus_track and janus_association objects.

From skimming over the API, it looks like:

  1. A call to janus_detect() outputs a vector of janus_track objects, essentially one janus_track object per detected subject. Each janus_track object is itself a vector of janus_attributes structures, essentially one janus_attributes object per frame in which the subject was detected.

  2. A janus_association object contains a reference to a media object, along with a vector of janus_track objects. The comment on janus_association says that each of the janus_track objects refers to the same subject.

It's unclear to me why a janus_association object isn't just a struct containing a reference to a media object along with a single janus_track object. That single janus_track object would contain a vector of janus_attributes objects, one for each frame in the associated media that the subject appears in.

Can you explain the intended use of these objects, and what different janus_track entries inside a single janus_association object represent?
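To make the question concrete, here is a sketch of the two shapes (struct and field names are illustrative, not copied from iarpa_janus.h):

// Current: an association holds several tracks, all said to refer to the
// same subject, which is the part that is unclear to me.
struct janus_association_current
{
    janus_media media;
    std::vector<janus_track> tracks;
};

// What I would have expected: one track per association, with the per-frame
// detail living in the track's vector of janus_attributes.
struct janus_association_expected
{
    janus_media media;
    janus_track track; // one janus_attributes entry per frame
};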

Thanks,
Stephen Rawls (ISI)

Tracking in videos

Tracking subjects in videos will perform best when all frames of a video are available for detection and tracking. I believe it has been stated previously that the intent is for the test harness to pass all frames of video media to janus_detect (and to the second versions of janus_create_template and janus_cluster), rather than just the key frames as was done for Phase 1.

Is that correct? If so, I would suggest updating the documentation for janus_media to make this part of the API contract.

set_tuning_data

Hello,

I'll have a more complete set of comments soon. But... what happened to set_tuning_data in the phase2 API?

Best,

Carlos

janus_create_template

In the interest of keeping the API as simple as possible, the second definition of janus_create_template, which takes a single media object and returns collections of templates and tracks, seems unnecessary. The intent of that method appears to be to remove the ground truth association from the processing and have the algorithm decide how many individual subjects there are in the media. If I'm understanding that correctly, the user could get the same behavior with a call to janus_detect followed by the first definition of janus_create_template, and that logic seems more appropriate to implement in the test harness than as a separate API call that each algorithm needs to implement. This pseudo-code illustrates what I think is the requested behavior:

vector<janus_template> templates;
vector<janus_track> tracks;

// filename, min_face_size, and role are assumed to be supplied by the harness
janus_media media;
janus_load_media(filename, media);

// one track per detected subject in the media
janus_detect(media, min_face_size, tracks);
for (auto &track : tracks) {
    // assumes janus_association no longer holds a vector of tracks, per issue #27
    janus_association assoc = { media, track };
    vector<janus_association> assocs = { assoc };
    janus_template template_;
    janus_create_template(assocs, role, template_);
    templates.push_back(template_);
}

Finally, the second definition of "janus_create_template" in fact creates a collection of templates, so it would be more precise to call it "janus_create_all_templates". One might argue that creating templates directly from media rather than from janus_associations allows for some sort of within-media tracking and clustering. However, wouldn't that functionality already be included in "janus_detect", since "janus_detect" has the same inputs as the second definition of "janus_create_template"?

So, given that the second definition of "janus_create_template" can be decomposed into atomic API calls by the test harness, we recommend that it be removed.

Change to janus_cluster signature

Hello all,

This is related to issue #29 brought up by the ISI team. They made the excellent point that long, uncompressed streams of HD video occupy a significant amount of RAM (> 25 GB in the worst case in CS3). I am assuming that we all have machines with enough RAM to handle creating templates even if they contain 2 or 3 videos of this size; for that reason I am not updating the janus_create_templates function at this time.

The high memory requirement poses a significant issue for clustering, however. Specifically, the input to clustering is a collection of unlabeled media; if that collection has a significant number of videos, the RAM requirements quickly become ridiculous. To mitigate this, I have updated the janus_cluster function to take a collection of templates instead of raw media. The templates will each be created from a single piece of media (image or video); depending on the evaluation protocol, bounding boxes may or may not be provided along with the media. This change also has the added advantage of removing an API function (there were previously two versions of janus_cluster) and of reducing repeated computation, as detection and template creation can now be decoupled from clustering.
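For illustration, the new shape of the call is roughly the following; the exact signature is in PR #36, and the output type here is a placeholder:

// Clustering now consumes templates (one per piece of media) instead of raw
// media, and returns a (cluster_id, cluster_confidence) pair per template.
janus_error janus_cluster(const std::vector<janus_template> &templates,
                          const size_t hint, // approximate number of clusters
                          std::vector<std::pair<int, double>> &clusters);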

The code for this change can be seen in PR #36. Please respond with comments or concerns. Additionally, the first comment period was scheduled to close today. Because of this change and the recent issues raised by the UMD team I am going to extend that by a week to June 3rd.

janus_template_role question

Hi,

I have a question about the intended use of janus_template_role.

First, I am repeating all the possible values of this enumeration:
ENROLLMENT_11, VERIFICATION_11, ENROLLMENT_1N, IDENTIFICATION, CLUSTERING
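(For reference, a sketch of how the enumeration presumably appears; the exact declaration in iarpa_janus.h may differ:)

enum janus_template_role
{
    ENROLLMENT_11,
    VERIFICATION_11,
    ENROLLMENT_1N,
    IDENTIFICATION,
    CLUSTERING
};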

  1. I assume I know what VERIFICATION_11 and ENROLLMENT_1N mean (in the first case, the template will be used in the one-to-one verification protocol; in the second, it will be enrolled into the gallery for the one-to-N search protocol). I am guessing that IDENTIFICATION means the template will be used as a probe in the one-to-N search protocol? I am unsure what ENROLLMENT_11 would mean. Can you confirm my descriptions, and also explain ENROLLMENT_11?

  2. Does the presence of these enrollment types mean that we can assume the test harness will call create_template() separately for the identification protocol and the verification protocol for templates that happen to be included in both protocols? If not, it is unclear what these template_role hints can be used for, if there is no guarantee that the test harness will pass them to us. If so, it seems that the test harness would be repeating work unnecessarily unless an implementation actually made use of these hints.

  3. The presence of this comment in the janus_create_template() function documentation seems to indicate that we are prohibited from actually making use of these hints. (I'm unsure what use we would make of them; I'm just trying to understand their purpose in the API.)

All media necessary to build a complete template will be passed in at
one time and the constructed template is expected to be suitable for
verification and search.

If all templates we create are supposed to be suitable for both verification and search (which seems reasonable to me, by the way), that seems to imply that we aren't allowed to create different templates depending on the value of janus_template_role.

Can you explain the intended use of janus_template_role?

Thanks,
Stephen Rawls (ISI)

memory issues (janus_create_gallery and janus_serialize_gallery)

Hi,

I have another concern about memory consumption. Again, this might not be an issue for the evaluation (because NIST has large machines), but it seems to me like it is an issue for future government customers, especially given the large number of subjects we eventually hope to address by phase 3.

Several current API calls require that a gallery is copied into memory twice, thus doubling the total memory requirements of the program. This happens in two places:

  1. The janus_serialize_gallery() and janus_deserialize_gallery() functions. This could be fixed by having these functions write/read directly to files.
  2. The janus_create_gallery() function. This one is a bit harder to fix under the current API design.
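For point 1, a sketch of what file-based variants could look like (function names invented for illustration):

// Hypothetical file-backed variants: streaming directly to/from disk avoids
// holding a serialized copy of the gallery in memory alongside the live one.
janus_error janus_serialize_gallery_to_file(const janus_gallery &gallery,
                                            const std::string &file_name);
janus_error janus_deserialize_gallery_from_file(janus_gallery &gallery,
                                                const std::string &file_name);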

At the last PI meeting, we briefly talked about the competing interests of optimizing an API for evaluation vs. optimizing it for actual users, so maybe the answer here is that we simply don't care about memory usage right now and this API is only meant for evaluation.

Just thought I'd see what people thought.

Thanks,
Stephen Rawls (ISI)

ClipID to organize videos

@jklontz There is currently no information in the metadata to distinguish whether or not different instances belong to the same video clip. I had discussed this before with @bhklein, and the idea was to add a ClipId that would be -1 for images and a unique number for each video. Something we did not discuss was how to record the frame number within a clip as well; certain performer methods rely on this temporal information. Let me know what you think is the best way to proceed.
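A sketch of the extra per-instance metadata fields this would imply (names invented for illustration, not an agreed schema):

// Hypothetical additions to each metadata record.
struct janus_metadata_extras
{
    int clip_id;   // -1 for still images; a unique number per video clip
    int frame_num; // -1 for stills; frame index within the clip otherwise
};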
