
Pechkin

Pechkin is a modern, asynchronous, flexible and configurable Node.js library for handling file uploads (i.e. multipart/form-data requests), written in TypeScript. It's well suited to complex use cases that require a lot of flexibility, with fields and multiple files mixed together.

Features

  • Fast, based on busboy.
  • No temporary files are created, and files are not loaded into memory.
  • Asynchronous, Promise- and AsyncIterator-based. Fields and each file are available as Promises as soon as they're parsed.
  • Flexible: you don't need to provide any storage engines, file handlers, etc. Pechkin only provides the parsed data in form of streams and promises, and you can do whatever you want with it.
  • Highly configurable, with possibility to override some configuration options per-field (e.g. maxFileByteLength: 1MB for all files, but 5MB for file fieldname my_custom_video_file).
  • Expressive TypeScript typings.
  • Robust error handling: you can be sure that all errors have been caught and handled, and that underlying resources (streams) are properly closed.
  • Only 1 dependency (busboy).

See the migration notes for tips on migrating from v1.x to v2.x.

Requirements

Installation

npm install pechkin

Examples / Usage

TL;DR

  • All fields in the FormData request should come before any files. Any fields submitted after the first file are lost.
  • parseFormData() returns a Promise that resolves once all fields are parsed and the first file is encountered (or the request has ended).
  • The promise resolves to a populated fields object and a files AsyncIterator/AsyncIterable.
  • Asynchronously iterate over the files using a for-await-of loop or the next() method (a minimal sketch follows this list).
  • File streams should always be consumed (e.g. by the code inside the for-await-of loop, or before the subsequent next() call); otherwise the request parsing will stall.
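
A minimal sketch of that flow (storage and error handling omitted; req is assumed to be an incoming multipart/form-data request inside your HTTP handler):

const { parseFormData } = require('pechkin');

// Inside an HTTP request handler:
const { fields, files } = await parseFormData(req);

console.log(fields); // all non-file fields, already parsed

for await (const { filename, stream } of files) {
  // Always consume the stream, even if you don't need the file.
  stream.resume();
  console.log('received', filename);
}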

FOR FULL WORKING EXAMPLES, SEE THE examples/ FOLDER

Importing

The package provides both CommonJS and ESM modules.

// ESM: index.mjs

import * as pechkin from 'pechkin';
// or
import { parseFormData } from 'pechkin';

// CommonJS: index.cjs

const pechkin = require('pechkin');
// or
const { parseFormData } = require('pechkin');

Files are processed sequentially.

// Full working example: `examples/fs.js`
// Imports needed by this snippet (CommonJS shown; see the Importing section above):
const http = require('http');
const fs = require('fs');
const os = require('os');
const path = require('path');

const pechkin = require('pechkin');

http.createServer(async (req, res) => {
  const { fields, files } = await pechkin.parseFormData(req, {
    maxTotalFileFieldCount: Infinity,
    maxFileCountPerField: Infinity,
    maxTotalFileCount: Infinity
  });

  const results = [];

  for await (const { filename: originalFilename, stream, ...file } of files) {
    const newFilename = `${Math.round(Math.random() * 1000)}-${originalFilename}`;
    const dest = path.join(os.tmpdir(), newFilename);

    // Pipe the stream to a file
    // The stream will start to be consumed after the current block of code
    // finishes executing...
    stream.pipe(fs.createWriteStream(dest));
    
    // ...which allows us to set up event handlers for the stream and wrap
    // the whole thing in a Promise, so that we can get the stream's length.
    const length = await new Promise((resolve, reject) => {
      // Since Node v15.0.0, you can use `stream.finished()`, instead of
      // manually setting up event listeners and resolving/rejecting inside
      // them.
      // https://nodejs.org/api/stream.html#streamfinishedstream-options
      stream
        // `stream` is an instance of Transform, which is a Duplex stream,
        // which means you can listen to both 'end' (Readable side)
        // and 'finish' (Writable side) events.
        .on('end', () => resolve(stream.bytesWritten))
        .on('finish', () => resolve(stream.bytesWritten))
        // You can either reject the Promise and handle the Promise rejection
        // using .catch() or await + try-catch block, or you can directly
        // somehow handle the error in the 'error' event handler.
        .on('error', reject);
    })

    results.push({ ...file, dest, originalFilename, newFilename, length});
  }

  console.log({ fields, files: results });

  // Send the parsed data back as JSON so the client gets a response.
  res.setHeader('Content-Type', 'application/json');
  res.end(JSON.stringify({ fields, files: results }, null, 2));

  /*
  OUTPUT:

  {
    "fields": { [fieldname: string]: string },
    "files": [
      {
        "field": string,
        "filename": string,
        "mimeType": string,
        "dest": string,
        "originalFilename": string,
        "newFilename": string,
        "length": number
      },
      ...
    ],
  }
  */
}).listen(8000); // Port is arbitrary.

Pechkin doesn't provide an Express middleware out-of-the-box, but it's very easy to create one yourself.

// FULL WORKING EXAMPLE: `examples/express.js`

// ... Boilerplate code ...

function pechkinFileUpload (config, fileFieldConfigOverride, busboyConfig) {
  return async (req, res, next) => {
    try {
      const { fields, files } = await parseFormData(req, config, fileFieldConfigOverride, busboyConfig);

      req.body = fields;
      req.files = files;

      return next();
    } catch (err) {
      return next(err);
    }
  }
}

app.post(
  '/',
  pechkinFileUpload(),
  async (req, res) => {
    const files = [];

    for await (const { stream, field, filename } of req.files) {
      // Process files however you see fit...
      // Here, streams are simply skipped
      stream.resume();

      files.push({ field, filename });
    }

    return res.json({ fields: req.body, files });
  }
);

// ... Boilerplate code ...

API

Pechkin exposes only 1 function:

parseFormData()

Type:

function parseFormData(
  request:                  IncomingMessage,
  config?:                  Pechkin.Config,
  fileFieldConfigOverride?: Pechkin.FileFieldConfigOverride,
  busboyConfig?:            Pechkin.BusboyConfig,
): Promise<{
  fields: Pechkin.Fields,
  files:  Pechkin.Files,
}>

Given a request (of type http.IncomingMessage, e.g. the request object in http.createServer((req, ...) => { ... })), return a Promise, containing:

  • All parsed fields,
  • An AsyncIterableIterator of files, which you can use both as an iterator (calling await files.next()) and as an iterable (for await (const file of files) { ... }).

🚧 Warning:

fields are parsed only until the first file is encountered: when constructing a FormData request, you should always put all fields before any files (see the client-side sketch below).
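
For illustration, here is one way to build such a request on the client with the standard FormData API (available in browsers and in Node 18+); the URL and field names are placeholders. Parts are serialized in insertion order, so appending fields first guarantees they arrive before the files:

const form = new FormData();

// Fields first...
form.append('title', 'My upload');
form.append('description', 'Fields must precede files');

// ...files last.
form.append('document', new Blob(['file contents']), 'document.txt');

await fetch('http://localhost:8000/upload', { method: 'POST', body: form });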

Parameter: config

All fields are optional. Numerical limits are INCLUSIVE.

  • maxTotalHeaderPairs (number, default: 2000): From Busboy, the max number of header key-value pairs to parse. The default is the same as Node's http module.
  • maxTotalPartCount (number, default: 110, i.e. 100 fields + 10 files): The max number of parts (fields + files).
  • maxFieldKeyByteLength (number, default: 100 bytes): The max byte length (each char is 1 byte) of a field name.
  • maxFieldValueByteLength (number, default: 1024 * 1024 bytes, i.e. 1 MB): The max byte length of a field value.
  • maxTotalFieldCount (number, default: 100): The max total number of non-file fields.
  • maxTotalFileFieldCount (number, default: 1): The max total number of file fields. Each file field may contain more than one file (see maxFileCountPerField). Increase this if you have more than one <input type="file">.
  • maxTotalFileCount (number, default: 10): The max total number of files, summed across all fields.
  • maxFileByteLength (number, default: 50 * 1024 * 1024 bytes, i.e. 50 MB): The max byte length of a file.
  • maxFileCountPerField (number, default: 1): The max number of files allowed for each file field. Use with <input type="file" multiple>.
  • abortOnFileByteLengthLimit (boolean, default: true): What to do if a file exceeds the maxFileByteLength limit: throw an error and clean up (abort the entire operation) when true, or truncate the file when false.
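
For example, a typical call with a few of these limits overridden might look like this (values are purely illustrative):

const { fields, files } = await pechkin.parseFormData(req, {
  maxTotalFileFieldCount: 3,            // up to 3 separate <input type="file"> fields
  maxFileCountPerField: 5,              // up to 5 files per field (<input multiple>)
  maxTotalFileCount: 10,                // at most 10 files overall
  maxFileByteLength: 10 * 1024 * 1024,  // 10 MB per file
  abortOnFileByteLengthLimit: false,    // truncate oversized files instead of throwing
});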

Parameter: fileFieldConfigOverride

For each field, you can set the values of:

  • maxFileCountPerField
  • maxFileByteLength
  • abortOnFileByteLengthLimit

which will override the values in the general config (including the defaults). The values for numerical limits can be both smaller and larger than the ones in the general config.

Example:

Let's say you configured parseFormData() the following way:

await parseFormData(
  request,
  {
    maxFileByteLength: 15, // 15 bytes
  },
  {
    exampleOverrideFile: {
      maxFileByteLength: 10, // 10 bytes
      abortOnFileByteLengthLimit: false,
    }
  },
  ...
)

Now, if you send a FormData request with the following structure (represented as JSON; this is NOT a valid FormData request):

{
  "normalFile": {
    "type": "file",
    /*
    byte length (15) === config.maxFileByteLength,
    no error thrown,
    no truncation
    */
    "content": "15 bytes, innit?"
  },
  "examplePriorityFile": {
    "type": "file",
    /*
    byte length (17) > fileFieldConfigOverride["exampleOverrideFile"].maxFileByteLength (10),
    fileFieldConfigOverride["exampleOverrideFile"].abortOnFileByteLengthLimit === false,
    FILE TRUNCATED TO 10 BYTES: "will be tr"
    */
    "content": "will be truncated" 
  },
  "file2": {
    "type": "file",
    /*
    byte length (26) > config.maxFileByteLength (15),
    config.abortOnFileByteLengthLimit === true (by default, as no custom value and no override was provided),
    ERROR THROWN:

    Exceeded file byte length limit ("maxFileByteLength").
    Corresponding Busboy configuration option: Busboy.Limits["files"].
    Field: "file2".
    Configuration info: 15
    */
    "content": "26 bytes, so will throw :("
  }
}

Parameter: busboyConfig

Type: Pechkin.BusboyConfig, which is equal to Busboy.Config (from the busboy package) without the limits property. Any limits passed to Busboy are ignored; instead, the limits defined by Pechkin's config and fileFieldConfigOverride parameters are used.
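
For example, a sketch forwarding a Busboy option (defParamCharset is a standard Busboy constructor option; consult the busboy documentation for the full list):

const { fields, files } = await pechkin.parseFormData(
  req,
  {},                            // Pechkin config: defaults
  {},                            // no per-field overrides
  { defParamCharset: 'utf8' }    // forwarded to Busboy; any `limits` here would be ignored
);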

Return value: Files AsyncIterator / AsyncIterable

Type:

type Files = {
  next: () => Promise<{
    done: boolean;
    value: Pechkin.File;
  }>,
  return: () => Promise<void>,
  throw: (error: Error) => Promise<void>,
  [Symbol.asyncIterator]: () => this
}

Files is both an AsyncIterator and an AsyncIterable, so you can use it both as an iterator (calling await files.next()) and as an iterable (for await (const file of files) { ... }). It is recommended to use it only as an iterable in a for-await-of loop, as it's much easier and less error-prone to use.

โ—๏ธ Very important note on iteration:

The file.stream should always be consumed, otherwise the request parsing will hang, and you might never get access to the next file. If you don't care about a particular file, you can simply do file.stream.resume(), but the stream should always be consumed.
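
If you do need manual iteration, a minimal sketch using next() (assuming files comes from parseFormData()):

let result = await files.next();

while (!result.done) {
  const file = result.value;

  // Consume the stream before asking for the next file;
  // here it's simply discarded.
  file.stream.resume();

  result = await files.next();
}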

(Internal) Error handling inside `Pechkin::FileIterator`

This section describes how errors are handled internally; you don't need it to use Pechkin.

  • If an error occurs inside next() (for example, a file exceeded its maxFileByteLength limit), a cleanup function is called, which unpipes the request from the parser (busboy), the iterator is stopped, and the error is thrown.

  • If an error occurs inside the body of the for-await-of loop, return() is called, a cleanup function is called, and the iterator is stopped.

  • If an error occurs anywhere else inside Pechkin, the throw() method is called, which either:

    • Rejects the currently-awaited next() call,
    • Or, if there is no next() call currently awaited, sets the next next() call to reject with the error.

    Apart from that, the usual cleanup function is called, and the iterator is stopped.
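
In practice, this means that parsing errors surface at the awaited next() call (explicit, or implicit in a for-await-of loop), so wrapping the loop in try-catch is enough to handle them. A minimal sketch:

try {
  for await (const { stream, filename } of files) {
    stream.resume(); // consume each stream
    console.log('parsed', filename);
  }
} catch (err) {
  // e.g. a file exceeded maxFileByteLength with abortOnFileByteLengthLimit: true
  console.error('Upload aborted:', err.message);
}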

Type: File

{
  filename: string;
  encoding: string;
  mimeType: string;
  field: string;
  stream: ByteLengthTruncateStream; // See below: "Type: ByteLengthTruncateStream"
}
  • filename: The client-provided filename of the file.
  • encoding: The encoding of the file. See the list of encodings supported by Node.js.
  • mimeType: The MIME type of the file. If the MIME type is crucial for your application, you should not trust the client-provided mimeType value – the client can easily lie about it (e.g. send an .exe file with mimeType: "image/png"). Instead, you should use a library like file-type.
  • field: The name of the field the file was sent in.
  • stream: The file Readable stream. The stream should always be consumed, otherwise the request parsing will hang, and you might never get access to the next file. If you don't care about a particular file, you can simply do file.stream.resume(), but the stream should always be consumed.

Type: ByteLengthTruncateStream

A Transform stream, which does the following to the source stream piped into it:

  • Does nothing, i.e. acts as a PassThrough stream, as long as the source stream hasn't reached maxFileByteLength bytes.
  • As soon as the source stream reaches maxFileByteLength bytes:
    • Sets the truncated property to true,
    • Throws if abortOnFileByteLengthLimit = true,
    • Truncates the file if abortOnFileByteLengthLimit = false.

Type:

Transform & {
  bytesRead: number;
  bytesWritten: number;
  truncated: boolean;
}
  • bytesRead: The number of bytes read from the source stream.
  • bytesWritten: The number of bytes written to the destination stream.
  • truncated: Whether the file was truncated or not. Truncation only happens with abortOnFileByteLengthLimit = false. bytesRead - bytesWritten is the number of bytes truncated; it is larger than 0 only if truncated = true, and 0 if truncated = false.

All of the above properties are updated in real time, as the stream is consumed. This means that you have to wait until the stream is fully consumed (i.e. 'finish'/'end' events are emitted, after e.g. an upload to file system or S3) to get the final values of bytesRead, bytesWritten and truncated.
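
For example, a sketch that waits for a file system upload to finish before reading the counters (the destination path is illustrative, and file is an item from the files iterator):

const dest = fs.createWriteStream('/tmp/upload.bin');

file.stream.pipe(dest);

dest.on('finish', () => {
  // The write side has finished, so the source stream is fully consumed
  // and the counters hold their final values.
  const { bytesRead, bytesWritten, truncated } = file.stream;
  console.log({ bytesRead, bytesWritten, truncated });
});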


pechkin's Issues

Finish event doesn't fire

Hi!

I don't understand why the finish event won't fire here. pathExists comes from the npm package fs-extra.
Any hints?

for await (const { stream, filename } of files) {
  const filePath = `/upload/${filename}`;

  if (await pathExists(filePath)) {
    throw Error('file already exists, check protocol for errors');
  } else {
    stream.pipe(createWriteStream(filePath));

    const length = await new Promise((resolve, reject) => {
      stream
        .on('end', () => {
          console.log('read end');
          resolve(stream.bytesWritten);
        })
        .on('finish', () => {
          console.log('write finish');
          resolve(stream.bytesWritten);
        })
        .on('error', reject);
    });

    console.log(length);
  }
}

maxByteLength Error Handling

First off, thank you @rafasofizada for writing pechkin! I was initially using formidable to handle file uploads, but was growing frustrated with async handling and wasn't able to stream data from the request without using a temporary file. Also note that I'm new to Node, and came from PHP, so I'm mostly fumbling my way around.

My goal with using pechkin was as follows:

  1. Using Express, receive one or more files through a multipart/form-data request.
  2. Stream the request to AWS S3 as multi-part using @aws-sdk/lib-storage.
  3. If the stream fails in any way, but certainly if the file is above a size limit I have set through maxFileByteLength, abort the S3 upload.

The problem I'm running into is with using abortOnFileByteLengthLimit: true. If the file limit is reached, the S3 upload will fail (instead of just being truncated but successfully sent), but the error thrown from FileIterator is not caught and seemingly cannot be caught, so my node process crashes:

/[source-path]/node_modules/pechkin/dist/cjs/FileIterator.js:96
                throw new error_1.FieldLimitError("maxFileByteLength", field, maxFileByteLength);
                      ^

FieldLimitError: Exceeded file byte length limit ("maxFileByteLength").
Configuration info: 100
Field: "files[0]"
    at /[source-path]/node_modules/pechkin/dist/cjs/FileIterator.js:97:23
    at processTicksAndRejections (node:internal/process/task_queues:96:5) {
  configurationInfo: 100,
  limitType: 'maxFileByteLength',
  field: 'files[0]'
}
[nodemon] app crashed - waiting for file changes before starting...

Ideally, at least to a layman such as myself, it would be preferable for the Transform stream to throw an error when the size limit is reached. Just as a test I tried this inside the _transform function of ByteLengthTruncateStream, and it seemed to function as expected. I'm able to catch the error and continue to iterate through the files.

if (this.readBytes + chunkBuffer.byteLength > this.maxByteLength) {
    return callback(new Error(`ByteLengthTruncateStream: maxByteLength exceeded: ${this.maxByteLength}`));
    ...

By looking at the examples and the tests, it just seems that throwing an error to abort a stream was not a priority. As it stands, it seems the only way I can use this package with a file size limit is to set abortOnFileByteLengthLimit: false, check byteLength after the S3 upload completes, and if truncated: true, then manually delete the file that was just uploaded. I'll also probably require a Content-Length header and validate the size ahead of time, although that only protects when the request comes from our app - it seems a separate request can be made with a fake content length, and there's no built-in validation in Node for multipart requests.

If you have any thoughts or suggestions, I'm all ears!

Multiple file uploads

Does this library support multiple file uploads? From what I see in the docs, there exists a maxFileCountPerField option, so it certainly should. Not sure if this is a problem with my implementation (almost certainly it is) or a bug in the lib (busboy behaves the same), but I can't upload more than one file. Only one gets saved to disk, while (probably?) both get parsed.

Here's the config:

pechkinFileUpload({
  maxFileByteLength: 50 * 1024 * 1024,
  maxFileCountPerField: 15,
  maxTotalFieldCount: 0,
})

And the code that handles the files

const files = [];

for await (const { stream, field, filename, encoding, mimeType } of req.files) {
  // ... irrelevant code here ...

  files.push({
    submittedName: filename,
    // ... some more non-relevant stuff ...
  });

  stream.pipe(fs.createWriteStream(savePath));
}

console.log(files);  // This array always contains only ONE file!

files.forEach(async file => {
	// save files to db
});

Am I calling everything too early? Should I use a promise? It would be great if there was an example, in case it's a problem with my implementation, so people won't have to ask in the future 👍
