( ͡° ͜ʖ ͡°)
Playing around with segment-anything, detr-resnet-101, and FastAPI. Upload an image and detr-resnet-101 identifies objects, returning confidence scores and bounding box data. For the object with the highest confidence score, segment-anything takes the bounding box data and segments the object from the overall image.
You'll need a model checkpoint placed in the root dir.
This example uses a forked version of segment-anything (with a single, minor change) to get this working with MPS.
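The selection step described above (take the detection with the highest confidence score and pass its bounding box on to segment-anything) can be sketched in plain Python. This is a minimal illustration and not the code in this repo: `pick_detection` is a hypothetical helper, the detection dicts mirror the `detr_output` shape shown in the example responses, and the second detection is made-up sample data.

```python
def pick_detection(detections, label=None):
    """Return the highest-confidence detection, optionally restricted to one label.

    Hypothetical helper: each detection is a dict with "label",
    "confidence" and "box" keys, mirroring the detr_output entries.
    """
    candidates = [d for d in detections if label is None or d["label"] == label]
    if not candidates:
        return None  # nothing detected (or nothing with the requested label)
    return max(candidates, key=lambda d: d["confidence"])


detections = [
    {"label": "person", "confidence": 0.98, "box": [163.98, 97.83, 550.38, 581.16]},
    # Made-up second detection, only here to give the helper something to compare.
    {"label": "dog", "confidence": 0.75, "box": [10.0, 20.0, 110.0, 220.0]},
]

best = pick_detection(detections)
print(best["label"], best["box"])  # this box would be the prompt for segment-anything
```

Returning `None` when nothing matches is one possible design choice; the API itself may handle that case differently.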
- create a `venv`
- `pip install -r requirements.txt`
- `uvicorn app.main:app --reload`
- make a POST request:
curl -X POST "http://localhost:8000/segment/extract_obj_with_label" \
-H "accept: application/json" \
-H "Content-Type: multipart/form-data" \
-F "file=@path/to/image;type=image/png" \
-F "label=person" \
| tee >(jq -r '.extracted_obj' | base64 --decode > extracted_obj.png) \
| jq '.detr_output'
- enjoy the image. print it out. frame it.
- Endpoint: `POST /segment/extract_obj_with_label`
- Description: Accepts an image and a label. Performs object detection and segmentation to isolate the object from the background. The segmented object is cropped and rendered with transparency in `PNG` format.
- Returns: A JSON object containing:
  - extracted_obj: `base64`-encoded string of the `PNG` with the extracted object
  - detr_output: An array of objects representing detected items in the image. Each object includes the label, confidence score, and bounding box coordinates.
curl -X POST "http://localhost:8000/segment/extract_obj_with_label" \
-H "accept: application/json" \
-H "Content-Type: multipart/form-data" \
-F "file=@path/to/image;type=image/png" \
-F "label=person" \
| tee >(jq -r '.extracted_obj' | base64 --decode > extracted_obj.png) \
| jq '.detr_output'
{
"extracted_obj": "base64_image_data...",
"detr_output": [
{
"label": "person",
"confidence": 0.98,
"box": [163.98, 97.83, 550.38, 581.16]
},
// ... more detected objects ...
]
}
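If you would rather handle the response in Python than with `jq` and `base64`, the decoding step might look like the sketch below. It assumes only the response shape shown above (`extracted_obj` as a base64 string, `detr_output` as a list). `save_extracted_obj` is a hypothetical helper name, and the response text can come from whichever HTTP client you use.

```python
import base64
import json
from pathlib import Path


def save_extracted_obj(response_text, out_path="extracted_obj.png"):
    """Decode the base64 PNG from a JSON response body and write it to disk.

    Hypothetical helper; returns the detr_output list so the caller
    can inspect the detections.
    """
    payload = json.loads(response_text)
    png_bytes = base64.b64decode(payload["extracted_obj"])
    Path(out_path).write_bytes(png_bytes)
    return payload["detr_output"]
```

Calling `save_extracted_obj(body)` on the raw response body does the same job as the `tee`/`jq` pipeline in the curl example: the image goes to a file and the detections come back for inspection.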
The Life Aquatic with Steve Zissou. 2004. Wes Anderson.
They Live. 1988. John Carpenter.
- Endpoint: `POST /segment/overlay_mask`
- Description: Accepts an image. Performs object detection and segmentation. Overlays the detected object with a semi-transparent mask.
- Returns: A JSON object containing:
  - image_with_mask: `base64`-encoded string of the image with the mask
  - detr_output: An array of objects representing detected items in the image. Each object includes the label, confidence score, and bounding box coordinates.
curl -X POST "http://localhost:8000/segment/overlay_mask" \
-H "accept: application/json" \
-H "Content-Type: multipart/form-data" \
-F "file=@path/to/image;type=image/png" \
-F "label=person" \
| tee >(jq -r '.image_with_mask' | base64 --decode > image_with_mask.png) \
| jq '.detr_output'
{
"image_with_mask": "base64_image_data...",
"detr_output": [
{
"label": "person",
"confidence": 0.98,
"box": [163.98, 97.83, 550.38, 581.16]
},
// ... more detected objects ...
]
}
The Life Aquatic with Steve Zissou. 2004. Wes Anderson.
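For intuition, a semi-transparent overlay like the one this endpoint returns comes down to alpha blending: inside the mask, each pixel is mixed with a fixed color. The sketch below shows that arithmetic on a tiny image stored as nested lists. It is not the repo's implementation, which most likely works on arrays; `overlay_mask`, the default color, and the default `alpha` are all assumptions made for illustration.

```python
def overlay_mask(image, mask, color=(255, 0, 0), alpha=0.5):
    """Blend `color` into `image` wherever `mask` is True.

    image: rows of (r, g, b) tuples; mask: rows of booleans, same shape.
    Hypothetical helper; color and alpha defaults are illustrative.
    """
    blended = []
    for image_row, mask_row in zip(image, mask):
        row = []
        for pixel, inside in zip(image_row, mask_row):
            if inside:
                # Standard alpha blend: (1 - alpha) * original + alpha * overlay
                pixel = tuple(
                    round((1 - alpha) * p + alpha * c) for p, c in zip(pixel, color)
                )
            row.append(pixel)
        blended.append(row)
    return blended


image = [[(100, 100, 100), (100, 100, 100)]]
mask = [[True, False]]
print(overlay_mask(image, mask, color=(200, 0, 0), alpha=0.5))
```

Pixels outside the mask pass through unchanged, which is why the rest of the frame stays intact in the result.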
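Run `pytest`, or `pytest -n <amount_of_workers>` to run the tests in parallel (the `-n` option comes from the pytest-xdist plugin). The latter command spawns the given number of worker processes and distributes the tests randomly across them; pass `-n auto` to spawn one worker per available CPU.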