MalwareMultiScan

Self-hosted VirusTotal / OPSWAT MetaDefender wannabe API for scanning URLs and files with multiple antivirus solutions.

IMPORTANT: version 1.5 introduces breaking changes to the container configuration and the docker-compose.yaml layout. See the releases page and the change history of docker-compose.yaml and README.md for details.

Introduction

In one of my work projects, I needed to scan user-uploaded files automatically to make sure they don't contain any malware. Using VirusTotal was not an option because of a) legal restrictions and data residency limitations, and b) the fact that scanning by hash sums would not be sufficient, since the majority of files are generated or modified by users.

After googling, I stumbled upon the fantastic maliceio/malice project. Unfortunately, it looks abandoned, and most of its plugins do not work at the moment. In addition, I intended to use the .NET stack to align with our internal infrastructure.

In the end, it's nothing but a set of Docker containers running the agent. The agent downloads the remote file to a temp folder, launches the vendor's command-line scanning utility with the proper arguments, and parses its output with a regular expression to extract the name of the detected malware.
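The agent's core loop (download, run the vendor CLI, parse the output with a regex) can be sketched in Python as follows. This is an illustration, not the project's C# code: the `{path}` placeholder in the command template and the ClamAV-style pattern, which assumes `clamscan` reports detections as lines of the form `/path/to/file: Threat.Name FOUND`, are assumptions.

```python
import re
import subprocess
import tempfile
import urllib.request
from pathlib import Path

# Hypothetical pattern for ClamAV-style output such as "/tmp/sample: Win.Test.EICAR_HDB-1 FOUND".
CLAMSCAN_PATTERN = r"^.+?: (?P<threat>.+) FOUND$"


def parse_threats(output: str, pattern: str) -> list[str]:
    """Return every threat name captured by the named group 'threat' in the scanner output."""
    return [m.group("threat") for m in re.finditer(pattern, output, re.MULTILINE)]


def scan_file(path: str, command: list[str], pattern: str, timeout: float = 60) -> list[str]:
    """Run the vendor CLI on a local file and parse its standard output."""
    # Substitute the file path into the (hypothetical) command template.
    args = [arg.replace("{path}", path) for arg in command]
    # No check=True: a non-zero exit code may simply mean "threat found".
    completed = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
    return parse_threats(completed.stdout, pattern)


def scan_url(url: str, command: list[str], pattern: str, timeout: float = 60) -> list[str]:
    """Download the remote file into a temporary folder, scan it, and clean up afterwards."""
    with tempfile.TemporaryDirectory() as temp_dir:
        target = Path(temp_dir) / "sample"
        # The timeout is applied to the download and to the scan separately.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            target.write_bytes(response.read())
        return scan_file(str(target), command, pattern, timeout)
```

For example, `scan_url(url, ["clamscan", "--no-summary", "{path}"], CLAMSCAN_PATTERN)` would return the detected threat names, or an empty list for a clean file. Unlike MAX_SCANNING_TIME, which covers both steps as one budget, this sketch times the download and the scan independently.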

Installation & Usage

IMPORTANT: MalwareMultiScan is not intended as a publicly facing API / UI. It (intentionally) has no authorization, authentication, rate limiting, or logging. Therefore, it should only be used as an internal / private API or behind a restrictive API gateway.

The whole solution can be started by running docker-compose up in the root folder of the repository.

It can also be deployed to a Docker Swarm cluster with the command docker stack deploy malware-multi-scan --compose-file docker-compose.yaml.

After startup, the demo web UI becomes available at http://localhost:8888.

See the Components chapter below and the docker-compose.yaml file.

Configuration

The API and the Scanners are configured through environment variables. Descriptions and default values are listed below.

MalwareMultiScan.Api

  • MONGO_ADDRESS=mongodb://localhost:27017 - MongoDB connection string.

  • MONGO_DATABASE=MalwareMultiScan - MongoDB database name.

  • REDIS_ADDRESS=localhost:6379 - Redis address for the distributed task queue.

  • CONSUL_ADDRESS=http://localhost:8500 - Consul address for the service registration.

  • FILE_SIZE_LIMIT=52428800 - Maximum size, in bytes, of a file accepted for file scanning (the default is 50 MiB). The size of URL content is not validated. Set to 0 to disable the validation.

MalwareMultiScan.Scanner

  • BACKEND_ID=dummy - ID of the scan backend.

  • REDIS_ADDRESS=localhost:6379 - Redis address for the distributed task queue.

  • CONSUL_ADDRESS=http://localhost:8500 - Consul address for the service registration.

  • MAX_SCANNING_TIME=60 - Scan time limit. It covers not only the scan itself but also downloading the file.

  • WORKER_COUNT=4 - Number of workers for parallel scanning.

MalwareMultiScan.Ui

  • API_URL=http://localhost:5000 - Absolute URL, including the port number, of the running MalwareMultiScan.Api instance.
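The defaults above and the FILE_SIZE_LIMIT rule can be mirrored in a small Python settings object, for instance in a deployment script or in a client that wants to reject oversized files before uploading them. The class and method names are hypothetical, and treating the limit as inclusive (`<=`) is an assumption.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiSettings:
    """Hypothetical container for the MalwareMultiScan.Api environment variables."""
    mongo_address: str
    mongo_database: str
    redis_address: str
    consul_address: str
    file_size_limit: int  # bytes; 0 disables the check

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "ApiSettings":
        # Fall back to the defaults listed in the Configuration section.
        return cls(
            mongo_address=env.get("MONGO_ADDRESS", "mongodb://localhost:27017"),
            mongo_database=env.get("MONGO_DATABASE", "MalwareMultiScan"),
            redis_address=env.get("REDIS_ADDRESS", "localhost:6379"),
            consul_address=env.get("CONSUL_ADDRESS", "http://localhost:8500"),
            file_size_limit=int(env.get("FILE_SIZE_LIMIT", "52428800")),
        )

    def accepts_file_size(self, size: int) -> bool:
        """True if a file of `size` bytes passes validation (inclusive limit is assumed)."""
        return self.file_size_limit == 0 or size <= self.file_size_limit
```

With the default limit of 52428800 bytes (50 MiB), a 300 MB file would be rejected unless FILE_SIZE_LIMIT is raised or set to 0.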

API Endpoints

  • POST /api/queue/url with a url parameter passed as form data. Returns a 201 Created response with a ScanResult or a 400 Bad Request error.

  • POST /api/queue/file with a file parameter passed as form data. Returns a 201 Created response with a ScanResult or a 400 Bad Request error.

  • GET /api/results/{result-id}, where {result-id} is the id value of a ScanResult. Returns a 200 OK response with the ScanResult or a 404 Not Found error.
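A minimal Python client for the URL and result endpoints might look like this. It assumes the API is reachable at the default API_URL (http://localhost:5000) and that the returned ScanResult JSON exposes its identifier as `id`; the function names are hypothetical. /api/queue/file is left out because it needs a multipart/form-data body.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

API_URL = "http://localhost:5000"  # default API_URL from the Configuration section


def queue_url(url: str, callback_url: Optional[str] = None, api_url: str = API_URL) -> dict:
    """POST /api/queue/url with form data and return the ScanResult JSON."""
    fields = {"url": url}
    if callback_url:
        fields["callbackUrl"] = callback_url
    # urllib sends a bytes body as application/x-www-form-urlencoded unless told otherwise.
    body = urllib.parse.urlencode(fields).encode()
    request = urllib.request.Request(f"{api_url}/api/queue/url", data=body, method="POST")
    # A 400 Bad Request surfaces as urllib.error.HTTPError.
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def get_result(result_id: str, api_url: str = API_URL) -> dict:
    """GET /api/results/{result-id}; raises urllib.error.HTTPError on 404 Not Found."""
    path = urllib.parse.quote(result_id, safe="")
    with urllib.request.urlopen(f"{api_url}/api/results/{path}", timeout=30) as response:
        return json.load(response)
```

Typical use: `result = queue_url("https://example.com/sample.bin")`, then call `get_result(result["id"])` periodically until every backend has reported, or pass a callbackUrl instead of polling.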

Callback URL

Both /api/queue/url and /api/queue/file also accept an optional callbackUrl parameter containing an http(s) URL. On every update from a scan backend, this URL is requested with the POST method and a JSON-serialized ScanResultMessage in the body. The query string contains an id parameter with the ID of the scan result and a backend parameter with the ID of the backend that completed the scan.

For example, when you set callbackUrl=http://localhost:1234/scan-results, a POST request is made to http://localhost:1234/scan-results?id=123&backend=dummy with the body

{
  "Status": 1,
  "Duration": 5,
  "Threats": ["Malware.Dummy.Result"]
}
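On the receiving side, a callback endpoint only has to read the id and backend query parameters and the JSON body. The standard-library Python sketch below is one way to do it; the handler name, the 204 response, and the port 1234 / /scan-results values (taken from the example above) are assumptions. It also decodes chunked request bodies, because it is not documented whether the API sends a Content-Length header.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


class CallbackHandler(BaseHTTPRequestHandler):
    """Collects ScanResultMessage callbacks as (result_id, backend_id, message) tuples."""
    received: list = []

    def _read_body(self) -> bytes:
        if self.headers.get("Transfer-Encoding", "").lower() == "chunked":
            chunks = []
            while True:
                # Each chunk starts with its size in hex, optionally followed by ";extensions".
                size = int(self.rfile.readline().split(b";")[0].strip(), 16)
                if size == 0:
                    self.rfile.readline()  # final CRLF (this sketch ignores trailers)
                    return b"".join(chunks)
                chunks.append(self.rfile.read(size))
                self.rfile.readline()  # CRLF after the chunk data
        return self.rfile.read(int(self.headers.get("Content-Length", 0)))

    def do_POST(self):
        query = parse_qs(urlparse(self.path).query)
        result_id = query.get("id", [None])[0]
        backend_id = query.get("backend", [None])[0]
        message = json.loads(self._read_body() or b"{}")
        self.received.append((result_id, backend_id, message))
        self.send_response(204)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the console quiet


def serve(host: str = "0.0.0.0", port: int = 1234) -> None:
    """Block forever, handling callbacks on the given address."""
    HTTPServer((host, port), CallbackHandler).serve_forever()
```

Calling `serve()` would accept the example request above and store `("123", "dummy", {"Status": 1, ...})` in `CallbackHandler.received`. A real receiver would persist or forward the message instead of keeping it in memory.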

Supported Scan Engines

  • ClamAV - Clamav.Dockerfile.

  • Comodo - Comodo.Dockerfile.

  • DrWeb - DrWeb.Dockerfile. Pass the license key via the DRWEB_KEY build arg.

  • Dummy - Dockerfile. Scan backend made for testing. Returns the Malware.Dummy.Result threat for every scan after 5 seconds.

  • KES - KES.Dockerfile. Pass the license key via the KES_KEY build arg. KES 11 does not work in Docker.

  • McAfee - McAfee.Dockerfile.

  • Sophos - Sophos.Dockerfile.

  • Defender - WindowsDefender.Dockerfile.

More scan backends may be added in the future. Some popular ones lack a command-line scanning utility or a Linux version, or don't start in a Docker container. Feel free to raise an issue if you know of any others that could be added to the list above.

Components

Workflow

  1. On startup, all Scanners register themselves in Consul with the service name scanner and a BackendId metadata field set to the value of the BACKEND_ID environment variable. They also register a TTL check and listen for Hangfire background jobs in a queue named after the BackendId metadata field.

  2. A third-party client calls /api/queue/url or /api/queue/file of MalwareMultiScan.Api.

  3. MalwareMultiScan.Api queries Consul and receives the list of alive scan backends registered under the service name scanner.

  4. For each of those backends, MalwareMultiScan.Api schedules a Hangfire background job in the queue named after the backend's BackendId metadata field.

  5. The Scanner picks up the job from its queue, starts the scan, and sends the result back to Hangfire's default queue.

  6. MalwareMultiScan.Api picks up the job from Hangfire's default queue and updates the state of the scan.

  7. If a callback URL was specified in step 2, MalwareMultiScan.Api sends an HTTP POST request to that URL. See Callback URL for details.
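Steps 1 and 3 rely on Consul's HTTP API. The Python sketch below shows equivalent calls: registering a scanner service with BackendId metadata and a TTL check, passing that TTL check as a heartbeat, and listing the BackendId values of healthy scanner instances. The service ID, the 15-second TTL, and the helper names are assumptions for illustration; the Scanner and API do this from .NET and may use different values.

```python
import json
import urllib.parse
import urllib.request

CONSUL_ADDRESS = "http://localhost:8500"  # default CONSUL_ADDRESS


def _consul(method: str, path: str, payload=None, consul_address: str = CONSUL_ADDRESS):
    """Send one request to the Consul HTTP API and decode the JSON reply, if any."""
    data = None if payload is None else json.dumps(payload).encode()
    request = urllib.request.Request(consul_address + path, data=data, method=method)
    with urllib.request.urlopen(request, timeout=10) as response:
        body = response.read()
    return json.loads(body) if body else None


def register_scanner(service_id: str, backend_id: str, ttl: str = "15s",
                     consul_address: str = CONSUL_ADDRESS) -> None:
    """Step 1: register under the service name 'scanner' with BackendId metadata and a TTL check."""
    payload = {"ID": service_id, "Name": "scanner",
               "Meta": {"BackendId": backend_id}, "Check": {"TTL": ttl}}
    _consul("PUT", "/v1/agent/service/register", payload, consul_address)


def heartbeat(service_id: str, consul_address: str = CONSUL_ADDRESS) -> None:
    """Mark the TTL check as passing; must be called more often than the TTL expires."""
    # A single check defined inside a service registration gets the ID "service:<service id>".
    check_id = urllib.parse.quote(f"service:{service_id}", safe=":")
    _consul("PUT", f"/v1/agent/check/pass/{check_id}", consul_address=consul_address)


def alive_backends(consul_address: str = CONSUL_ADDRESS) -> list[str]:
    """Step 3: BackendId values of 'scanner' instances whose checks are all passing."""
    entries = _consul("GET", "/v1/health/service/scanner?passing=true",
                      consul_address=consul_address)
    ids = {(entry["Service"].get("Meta") or {}).get("BackendId") for entry in entries}
    return sorted(backend_id for backend_id in ids if backend_id)
```

Because the check is TTL-based, a scanner that stops calling `heartbeat` drops out of `alive_backends` once its TTL expires, which is what keeps dead backends from receiving new jobs in step 4.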

Prerequisites

  • MongoDB 3.x or later. Used for storing scan results, and for storing files in GridFS. Communication goes through the official C#/.NET driver.

  • Redis 5.x or later. Used for task queueing. Communication goes through the Hangfire library.

  • Consul 1.8.x or later. Used for service registration of scan backends.

  • Docker and docker-compose running on Windows (in Linux containers mode), Linux, or macOS. Docker Compose is needed only for test / local deployments.

  • Optional: a Docker Swarm / Kubernetes cluster for scaling up scanning capacity.

Parts

  • MalwareMultiScan.Api - Simple ASP.NET Core Web API for queueing files & URLs for scanning and returning the results. It also receives scan results from the scanning backend nodes. See Dockerfile.

  • MalwareMultiScan.Backends - Scan backend logic. Includes Dockerfiles and implementation classes for third-party vendor scan backends.

  • MalwareMultiScan.Shared - Shared components.

  • MalwareMultiScan.Scanner - .NET Core Worker service that subscribes to messages for its backend ID, runs the vendor's command-line scanning utility, and parses the output. See Dockerfile. The MalwareMultiScan.Scanner image serves as the base image for the other scan backends; see the Dockerfiles listed above for details.

  • MalwareMultiScan.Ui - Nuxt.js TypeScript SPA for demonstrating the API capabilities. See Dockerfile.

Plans

See issues for the list of planned features, bug-fixes, and improvements.

malwaremultiscan's People

Contributors

dependabot[bot], volodymyrsmirnov

malwaremultiscan's Issues

Add background processing to scan backends

Use Hangfire and Hangfire.Memory to queue scans in the background. That would enable a single scanning node to scan multiple files in parallel.

This should be an optional feature with a configurable degree of parallelism. Requires testing on all backends to confirm that simultaneous scan is possible.
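The proposed configurable degree of parallelism can be illustrated with a bounded worker pool. This Python sketch swaps in `concurrent.futures.ThreadPoolExecutor` for Hangfire.Memory, and `scan` stands for any hypothetical per-file scan function.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def scan_all(items: Iterable[T], scan: Callable[[T], R], max_parallelism: int = 4) -> list[R]:
    """Run scan(item) for every item with at most max_parallelism scans in flight.

    Results come back in input order; an exception raised by a scan is re-raised
    when its result is collected.
    """
    if max_parallelism < 1:
        raise ValueError("max_parallelism must be at least 1")
    with ThreadPoolExecutor(max_workers=max_parallelism) as pool:
        return list(pool.map(scan, items))
```

Setting `max_parallelism=1` falls back to sequential scanning, which would be the safe default for any backend that turns out not to support simultaneous scans.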

Sophos Error

Any idea what causes this error in the Sophos scanner?

sophos-scanner_1            | fail: MalwareMultiScan.Scanner.Services.ScanBackgroundJob[0]
sophos-scanner_1            |       Scanning failed with exception
sophos-scanner_1            | System.ComponentModel.Win32Exception (2): No such file or directory
sophos-scanner_1            |    at System.Diagnostics.Process.ForkAndExecProcess(String filename, String[] argv, String[] envp, String cwd, Boolean redirectStdin, Boolean redirectStdout, Boolean redirectStderr, Boolean setCredentials, UInt32 userId, UInt32 groupId, UInt32[] groups, Int32& stdinFd, Int32& stdoutFd, Int32& stderrFd, Boolean usesTerminal, Boolean throwOnNoExec)
sophos-scanner_1            |    at System.Diagnostics.Process.StartCore(ProcessStartInfo startInfo)
sophos-scanner_1            |    at System.Diagnostics.Process.Start()
sophos-scanner_1            |    at MalwareMultiScan.Backends.Services.Implementations.ProcessRunner.RunTillCompletion(String path, String arguments, CancellationToken cancellationToken, String& standardOutput, String& standardError) in /src/MalwareMultiScan.Backends/Services/ProcessRunner.cs:line 56
sophos-scanner_1            |    at MalwareMultiScan.Backends.Backends.Abstracts.AbstractLocalProcessScanBackend.ScanAsync(String path, CancellationToken cancellationToken) in /src/MalwareMultiScan.Backends/Backends/Abstracts/AbstractLocalProcessScanBackend.cs:line 51
sophos-scanner_1            |    at MalwareMultiScan.Backends.Backends.Abstracts.AbstractScanBackend.ScanAsync(Stream stream, CancellationToken cancellationToken) in /src/MalwareMultiScan.Backends/Backends/Abstracts/AbstractScanBackend.cs:line 43
sophos-scanner_1            |    at MalwareMultiScan.Backends.Backends.Abstracts.AbstractScanBackend.ScanAsync(Uri uri, CancellationToken cancellationToken) in /src/MalwareMultiScan.Backends/Backends/Abstracts/AbstractScanBackend.cs:line 28
sophos-scanner_1            |    at MalwareMultiScan.Backends.Backends.Abstracts.AbstractScanBackend.ScanAsync(Uri uri, CancellationToken cancellationToken) in /src/MalwareMultiScan.Backends/Backends/Abstracts/AbstractScanBackend.cs:line 28
sophos-scanner_1            |    at MalwareMultiScan.Scanner.Services.ScanBackgroundJob.Process(ScanQueueMessage message) in /src/MalwareMultiScan.Scanner/Services/ScanBackgroundJob.cs:line 65
sophos-scanner_1            | info: MalwareMultiScan.Scanner.Services.ScanBackgroundJob[0]
sophos-scanner_1            |       Sending scan results with status Failed
api_1                       | warn: Hangfire.AutomaticRetryAttribute[0]
api_1                       |       Failed to process the job '2b1fe90a-7561-4d0b-b847-c416591b3317': an exception occured. Job was automatically deleted because the retry attempt count exceeded 0.
api_1                       | System.ArgumentNullException: Value cannot be null. (Parameter 'value')
api_1                       |    at System.String.Join(String separator, String[] value)
api_1                       |    at MalwareMultiScan.Api.Services.ScanResultJob.Report(String resultId, String backendId, ScanResultMessage result) in /src/MalwareMultiScan.Api/Services/ScanResultJob.cs:line 48
api_1                       |    at System.Runtime.CompilerServices.TaskAwaiter.GetResult()

Large scan files

What changes can I make to increase the maximum size of files that can be scanned? I am uploading a 300 MB file for scanning, and the scanner stops working.

Create API for MalwareMultiScan

Create an API that accepts incoming file or URL scan requests and forwards them to the backends taken from the YAML file.

Storage backend: MongoDB

Add job to cleanup old files from the GridFS

Right now, after a job has completed, the file stays in GridFS forever, slowly turning the system into a museum of malware and customer data. This has to be changed by adding a recurring job that cleans up files older than X days.

The file creation date can be stored in GridFS metadata (see GridFSFileInfo).
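A recurring cleanup job could look roughly like the Python sketch below. Instead of custom metadata, it uses the uploadDate field that GridFS already records for every file, and it assumes a PyMongo `GridFSBucket`; the function names, the 7-day retention, and the default bucket name "fs" are placeholders, not values taken from the project.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def cleanup_old_files(bucket, max_age_days: int, now: Optional[datetime] = None) -> int:
    """Delete GridFS files uploaded more than max_age_days ago; return how many were removed.

    `bucket` is expected to behave like pymongo's gridfs.GridFSBucket (find / delete).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    # Materialize the IDs first so deletions don't interfere with the open cursor.
    stale_ids = [grid_out._id for grid_out in bucket.find({"uploadDate": {"$lt": cutoff}})]
    for file_id in stale_ids:
        bucket.delete(file_id)  # removes the files document and all of its chunks
    return len(stale_ids)


def run_cleanup(mongo_address: str = "mongodb://localhost:27017",
                database_name: str = "MalwareMultiScan",
                bucket_name: str = "fs",
                max_age_days: int = 7) -> int:
    """Hypothetical entry point for a scheduler; requires pymongo."""
    import gridfs  # imported lazily so cleanup_old_files has no hard dependency
    from pymongo import MongoClient

    client = MongoClient(mongo_address)
    try:
        bucket = gridfs.GridFSBucket(client[database_name], bucket_name=bucket_name)
        return cleanup_old_files(bucket, max_age_days)
    finally:
        client.close()
```

Injecting `now` keeps the cutoff logic testable without a database; scheduling `run_cleanup` (as a Hangfire recurring job in the real API, or cron here) is left to the host.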

Add scan backends health monitoring

At the moment, if a scan backend goes offline, there's no way to find this out. Scanning jobs just keep spinning indefinitely. The system needs the ability to track active scan backends and an API endpoint that returns this information to clients.

`docker-compose up` not working - YAML issue?

This is on a fresh Debian 10 x64 virtual machine, after installing the Docker repositories and docker-compose:

$ docker-compose up
ERROR: Version in "./docker-compose.yaml" is unsupported. You might be seeing this error because you're using the wrong Compose file version. Either specify a supported version (e.g "2.2" or "3.3") and place your service definitions under the `services` key, or omit the `version` key and place your service definitions at the root of the file to use version 1.
For more on the Compose file format versions, see https://docs.docker.com/compose/compose-file/

$ head -n 1 docker-compose.yaml 
version: "3.8"
$ docker version
Client: Docker Engine - Community
 Version:           19.03.14
 API version:       1.40
 Go version:        go1.13.15
 Git commit:        5eb3275d40
 Built:             Tue Dec  1 19:20:22 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.14
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       5eb3275d40
  Built:            Tue Dec  1 19:18:50 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.9
  GitCommit:        ea765aba0d05254012b0b9e595e995c09186427f
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

I'm sure I'm missing something, but maybe this isn't a commonly-tested scenario?

Write README.md

Things to mention:

  1. Project background.
  2. Reference to malice.
  3. List of scanning backends.
  4. Deployment guide.
  5. Link for demo environment.

More/Other Engines Support?

Hey @mindcollapse,
Seriously cool project.

Do you think it would be possible to add more support for other sandboxes as well like Triage, ReversingLabs, Inquest, JoeSandbox or AnyRun? What is basically required?

Thanks again,
D
