llegomark / image-classification-resnet-50

Image Classification with Microsoft Vision Model ResNet-50

The Microsoft Vision Model ResNet-50 is a pretrained vision model created by the Multimedia Group at Microsoft Bing. It is a 50-layer convolutional neural network (CNN) trained on more than 1 million images from ImageNet. The model was built with multi-task learning and optimized separately for four datasets: ImageNet-22k, Microsoft COCO, and two web-supervised datasets containing 40 million image-label pairs, to achieve strong performance in image classification tasks.

This project uses the Hono framework to build a Cloudflare Worker that exposes an API endpoint for image classification. It integrates with Cloudflare AI to run the Microsoft Vision Model ResNet-50 and classifies images supplied either as image URLs or as file uploads.

Technologies Used

  • Hono: A lightweight web framework for building fast and scalable applications on Cloudflare Workers.
  • Cloudflare Workers: A serverless execution environment that allows running JavaScript and TypeScript code at the edge, close to users.
  • Cloudflare AI: A set of APIs and tools provided by Cloudflare for integrating AI capabilities into applications.

Features

  • Accepts both image URLs and file uploads for classification.
  • Validates input using Zod schema validation.
  • Supports CORS and CSRF protection middleware.
  • Implements JWT authentication middleware for secure access to the API.
  • Handles errors gracefully and returns appropriate error responses.
  • Provides an optional model parameter to specify the model for additional analysis.
    • Supported models: llama and gemma.
    • If the model parameter is not provided or is set to a value other than llama or gemma, only image classification is performed without additional analysis.
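
The input rules listed above can be illustrated with a small dependency-free sketch. This is not the project's Zod schema: the names `ImageInput`, `resolveModel`, and `validateItem` are hypothetical, and rejecting items that carry both `url` and `file` is an assumption about how "either a url or file" is enforced.

```typescript
// Hypothetical shapes; the project's real Zod schema may differ.
type ImageInput = { url?: string; file?: unknown };
type AnalysisModel = "llama" | "gemma";

// Returns the analysis model, or undefined when the parameter is missing
// or unsupported, in which case only classification would run.
function resolveModel(param: string | undefined): AnalysisModel | undefined {
  return param === "llama" || param === "gemma" ? param : undefined;
}

// Returns a list of problems; an empty list means the item is acceptable.
function validateItem(item: ImageInput): string[] {
  const errors: string[] = [];
  const hasUrl = typeof item.url === "string" && item.url.length > 0;
  const hasFile = item.file !== undefined && item.file !== null;
  // Assumption: exactly one of the two properties must be present.
  if (hasUrl === hasFile) {
    errors.push("provide exactly one of `url` or `file`");
  }
  if (hasUrl && !/^https?:\/\//i.test(item.url as string)) {
    errors.push("`url` must start with http:// or https://");
  }
  return errors;
}
```

Returning a list of messages rather than throwing makes it straightforward to report every problem in a batch of images in one error response.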

API Endpoint

  • URL: /api/classify/:model?
    • :model (optional): Specifies the model to use for additional analysis. Supported values: llama and gemma.
  • Method: POST
  • Authentication: JWT token required in the Authorization header.
  • Request Body: JSON array of image objects, each containing either a url or file property.
    • url: The URL of the image to classify (optional).
    • file: The uploaded image file to classify (optional).
  • Response: JSON object containing an array of responses for each image.
    • Each response includes:
      • classification: An array of classification results, each containing a label and a score.
      • analysis (optional): The analysis summary generated by the specified model, if a supported model is provided.
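
As a rough sketch of the contract described above, the following helper assembles the URL, headers, and JSON body for a URL-based classification request. `buildClassifyRequest` and `PreparedRequest` are hypothetical names introduced for illustration, and `baseUrl` and `token` are placeholders supplied by the caller. File uploads are not covered here.

```typescript
// Hypothetical container for the two arguments of fetch(url, init).
interface PreparedRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds a POST to /api/classify or /api/classify/:model with a JSON array
// of { url } objects and a Bearer token in the Authorization header.
function buildClassifyRequest(
  baseUrl: string,
  token: string,
  imageUrls: string[],
  model?: "llama" | "gemma"
): PreparedRequest {
  const path = model ? `/api/classify/${model}` : "/api/classify";
  const items = imageUrls.map((url) => ({ url }));
  return {
    // Strip trailing slashes so the path is not doubled up.
    url: baseUrl.replace(/\/+$/, "") + path,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify(items),
    },
  };
}
```

The result can be passed straight to `fetch(prepared.url, prepared.init)` in any runtime that provides `fetch`.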

Usage

  1. Set up a Cloudflare Worker and configure the necessary environment variables:

    • AI: Your Cloudflare AI API token.
    • JWT_SECRET: The secret key used for JWT authentication.
  2. Deploy the worker code to your Cloudflare Worker.

  3. Make a POST request to the /api/classify endpoint with the following payload:

    [
    	{
    		"url": "https://example.com/image1.jpg"
    	},
    	{
    		"file": "<uploaded_file>"
    	}
    ]

    Replace <uploaded_file> with the actual file upload.

    To request additional analysis, append the optional model parameter to the URL: /api/classify/llama or /api/classify/gemma. If the parameter is omitted or set to any other value, only image classification is performed.

    Here are example cURL commands to classify images:

    • Classify an image using a URL:

      curl -X POST -H "Content-Type: application/json" -H "Authorization: Bearer <your-jwt-token>" -d '[{"url": "https://example.com/image1.jpg"}]' https://your-worker-url.com/api/classify
    • Classify an image using a file upload:

      curl -X POST -H "Content-Type: multipart/form-data" -H "Authorization: Bearer <your-jwt-token>" -F "file=@/path/to/image.jpg" https://your-worker-url.com/api/classify
    • Classify an image using a URL with the llama model for analysis:

      curl -X POST -H "Content-Type: application/json" -H "Authorization: Bearer <your-jwt-token>" -d '[{"url": "https://example.com/image1.jpg"}]' https://your-worker-url.com/api/classify/llama
    • Classify an image using a file upload with the gemma model for analysis:

      curl -X POST -H "Content-Type: multipart/form-data" -H "Authorization: Bearer <your-jwt-token>" -F "file=@/path/to/image.jpg" https://your-worker-url.com/api/classify/gemma

    Replace <your-jwt-token> with your actual JWT token and https://your-worker-url.com with the URL of your deployed Cloudflare Worker.

  4. The API will return a JSON response with the classification results and analysis (if applicable) for each image:

    {
    	"responses": [
    		{
    			"classification": [
    				{
    					"label": "dog",
    					"score": 0.9
    				},
    				{
    					"label": "animal",
    					"score": 0.8
    				}
    			],
    			"analysis": "The image contains a dog, which is a type of animal. The classification scores indicate a high confidence in the presence of a dog in the image."
    		},
    		{
    			"classification": [
    				{
    					"label": "cat",
    					"score": 0.95
    				},
    				{
    					"label": "animal",
    					"score": 0.85
    				}
    			],
    			"analysis": "The image depicts a cat, which belongs to the animal category. The high classification scores suggest a strong likelihood of a cat being present in the image."
    		}
    	]
    }

    If the model parameter is not provided or is set to a value other than llama or gemma, the analysis field will be absent in the response.
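
A client might consume the response shown above along these lines. The interfaces mirror the sample JSON, while `topLabel` and `summarize` are hypothetical helper names; the sketch assumes the body has already been parsed with `JSON.parse` or `response.json()`.

```typescript
// Shapes taken from the sample response; `analysis` is optional.
interface Label { label: string; score: number }
interface ImageResult { classification: Label[]; analysis?: string }
interface ClassifyResponse { responses: ImageResult[] }

// Returns the highest-scoring label for one image, or undefined when the
// classification array is empty.
function topLabel(result: ImageResult): Label | undefined {
  let best: Label | undefined;
  for (const candidate of result.classification) {
    if (best === undefined || candidate.score > best.score) best = candidate;
  }
  return best;
}

// Produces one human-readable line per image.
function summarize(body: ClassifyResponse): string[] {
  return body.responses.map((result, index) => {
    const best = topLabel(result);
    const head = best
      ? `image ${index + 1}: ${best.label} (${best.score.toFixed(2)})`
      : `image ${index + 1}: no labels`;
    // `analysis` is only present when llama or gemma was requested.
    return result.analysis ? `${head} | ${result.analysis}` : head;
  });
}
```

Scanning for the maximum score instead of reading the first element avoids relying on the results being sorted, which the description above does not guarantee.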

Limitations

  • The Microsoft Vision Model ResNet-50 is pretrained on a specific set of image categories. It may not perform well on images outside its training domain.
  • The model accepts only certain image formats, such as JPEG, PNG, and GIF. Other formats may not be supported.
  • The performance of the model may vary depending on the quality and resolution of the input images.
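
Because only some formats are accepted, a caller may want to check an image before sending it. The sketch below identifies JPEG, PNG, and GIF from their leading signature bytes. Whether the Worker itself performs such a check is not stated here, and `sniffImageFormat` and `hasPrefix` are hypothetical names.

```typescript
type ImageFormat = "jpeg" | "png" | "gif" | "unknown";

// Checks whether `bytes` begins with the given signature.
function hasPrefix(bytes: Uint8Array, signature: number[]): boolean {
  if (bytes.length < signature.length) return false;
  for (let i = 0; i < signature.length; i++) {
    if (bytes[i] !== signature[i]) return false;
  }
  return true;
}

// Identifies JPEG, PNG, and GIF by their leading magic bytes.
function sniffImageFormat(bytes: Uint8Array): ImageFormat {
  // JPEG files start with FF D8 FF.
  if (hasPrefix(bytes, [0xff, 0xd8, 0xff])) return "jpeg";
  // PNG files start with the fixed 8-byte signature 89 'P' 'N' 'G' CR LF 1A LF.
  if (hasPrefix(bytes, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "png";
  // "GIF87a" and "GIF89a" both begin with the ASCII bytes for "GIF8".
  if (hasPrefix(bytes, [0x47, 0x49, 0x46, 0x38])) return "gif";
  return "unknown";
}
```

Checking the bytes rather than the file extension catches mislabeled files, although it says nothing about whether the rest of the file is intact.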

Contributing

Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.

License

This project is licensed under the MIT License.
