httparser's Introduction

httparser

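A high-performance HTTP/1.1 parser, written from scratch, that adds HTTP parsing to your asynchronous I/O library.

Motivation

I originally wanted to write some fun code on top of asynchronous I/O libraries, but found no HTTP parsing library suited to them, so I wrote one myself to fill a small gap in the Go ecosystem.

Features

  • URL parsing
  • request or response header field parsing
  • request or response header value parsing
  • Content-Length body parsing
  • chunked body parsing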
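
For readers unfamiliar with chunked framing, here is a minimal sketch of what a chunked body looks like on the wire and what it decodes to. It uses only the Go standard library (net/http/httputil), not httparser, and the helper name decodeChunked is made up for this illustration.

```go
package main

import (
	"fmt"
	"io"
	"net/http/httputil"
	"strings"
)

// decodeChunked (hypothetical helper) strips the chunked transfer framing
// from raw and returns the payload it carries.
func decodeChunked(raw string) (string, error) {
	body, err := io.ReadAll(httputil.NewChunkedReader(strings.NewReader(raw)))
	return string(body), err
}

func main() {
	// "b" is the chunk size in hexadecimal (11 bytes);
	// the zero-size chunk marks the end of the body.
	body, err := decodeChunked("b\r\nhello world\r\n0\r\n\r\n")
	fmt.Printf("%q %v\n", body, err) // "hello world" <nil>
}
```

With Content-Length framing the same payload would simply follow the headers as 11 raw bytes.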

parser request

	var data = []byte(
		"POST /joyent/http-parser HTTP/1.1\r\n" +
			"Host: github.com\r\n" +
			"DNT: 1\r\n" +
			"Accept-Encoding: gzip, deflate, sdch\r\n" +
			"Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4\r\n" +
			"User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) " +
			"AppleWebKit/537.36 (KHTML, like Gecko) " +
			"Chrome/39.0.2171.65 Safari/537.36\r\n" +
			"Accept: text/html,application/xhtml+xml,application/xml;q=0.9," +
			"image/webp,*/*;q=0.8\r\n" +
			"Referer: https://github.com/joyent/http-parser\r\n" +
			"Connection: keep-alive\r\n" +
			"Transfer-Encoding: chunked\r\n" +
			"Cache-Control: max-age=0\r\n\r\nb\r\nhello world\r\n0\r\n\r\n")

	var setting = httparser.Setting{
		MessageBegin: func(*httparser.Parser) {
			// the parser starts working
			fmt.Printf("begin\n")
		},
		URL: func(_ *httparser.Parser, buf []byte) {
			// URL data
			fmt.Printf("url->%s\n", buf)
		},
		Status: func(*httparser.Parser, []byte) {
			// only needed for responses
		},
		HeaderField: func(_ *httparser.Parser, buf []byte) {
			// http header field
			fmt.Printf("header field:%s\n", buf)
		},
		HeaderValue: func(_ *httparser.Parser, buf []byte) {
			// http header value
			fmt.Printf("header value:%s\n", buf)
		},
		HeadersComplete: func(_ *httparser.Parser) {
			// header parsing is complete
			fmt.Printf("header complete\n")
		},
		Body: func(_ *httparser.Parser, buf []byte) {
			// body data from a Content-Length or chunked message
			fmt.Printf("%s", buf)
		},
		MessageComplete: func(_ *httparser.Parser) {
			// message parsing is complete
			fmt.Printf("\n")
		},
	}

	p := httparser.New(httparser.REQUEST)
	success, err := p.Execute(&setting, data)

	fmt.Printf("success:%d, err:%v\n", success, err)

response

response

request or response

If you are not sure whether the data is a request or a response, see the example below:
request or response

Build

Generate the unhex and tokens tables

To change these two tables, edit the code generator in the _cmd directory.

make gen

Build the example

make example

Run the example

make example.run

return value

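  • err != nil: an error occurred
  • success == len(data): all data was parsed successfully
  • success < len(data): only part of the data was parsed; the unparsed remainder must be fed to the parser again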
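
The rule in the last bullet can be sketched as a small helper that keeps the unparsed tail and prepends it to the next read. This is only an illustration under stated assumptions: executeFunc is a stand-in for a call such as p.Execute(&setting, data), and feed and consumeLines are hypothetical names that are not part of httparser.

```go
package main

import (
	"bytes"
	"fmt"
)

// executeFunc stands in for a call such as p.Execute(&setting, data):
// it reports how many leading bytes of data were consumed.
type executeFunc func(data []byte) (int, error)

// feed appends chunk to the bytes left over from the previous call, runs exec
// on the result, and returns whatever exec did not consume.
func feed(pending, chunk []byte, exec executeFunc) ([]byte, error) {
	buf := append(pending, chunk...)
	n, err := exec(buf)
	if err != nil {
		return nil, err
	}
	// Copy the tail so that it does not alias the caller's read buffer.
	return append([]byte(nil), buf[n:]...), nil
}

// consumeLines is a toy executor that only consumes complete CRLF-terminated lines.
func consumeLines(data []byte) (int, error) {
	i := bytes.LastIndex(data, []byte("\r\n"))
	if i < 0 {
		return 0, nil
	}
	return i + 2, nil
}

func main() {
	pending, _ := feed(nil, []byte("Host: git"), consumeLines)
	pending, _ = feed(pending, []byte("hub.com\r\nDNT"), consumeLines)
	fmt.Printf("%q\n", pending) // "DNT"
}
```

Copying the tail matters in event-driven servers, where the read buffer passed to the data callback is often reused for the next read.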

Throughput

httparser's People

Contributors

guonaihong, wangxin008

httparser's Issues

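Parameters of the callback functions in Setting

The current design is shown below. The main consideration was that closures make it easy to capture variables.

type Setting struct {
	// called when parsing begins
	MessageBegin func()
	// URL callback, only invoked for requests
	URL func([]byte)
	// status phrase
	Status func([]byte)
	// header field callback
	HeaderField func([]byte)
	// header value callback
	HeaderValue func([]byte)
	// called after the headers have been parsed
	HeadersComplete func()
	// body callback
	Body func([]byte)
	// called when the whole message has been parsed successfully
	MessageComplete func()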
}

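However, I now need to write an HTTP benchmark. The business logic there is the same for every connection: it only handles response data, so a single global Setting is enough. By comparison, the current design binds a Setting to every Parser, which wastes memory. The design is therefore changed as follows:

type Setting struct {
	// called when parsing begins
	MessageBegin func(*httparser.Parser)
	// URL callback, only invoked for requests
	URL func(*httparser.Parser, []byte)
	// status phrase
	Status func(*httparser.Parser, []byte)
	// header field callback
	HeaderField func(*httparser.Parser, []byte)
	// header value callback
	HeaderValue func(*httparser.Parser, []byte)
	// called after the headers have been parsed
	HeadersComplete func(*httparser.Parser)
	// body callback
	Body func(*httparser.Parser, []byte)
	// called when the whole message has been parsed successfully
	MessageComplete func(*httparser.Parser)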
}
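
The idea behind this change can be sketched without the library: once the parser is passed to every callback, one shared Setting can serve all connections, and each callback finds its per-connection state through the parser. The Parser and Setting types below are simplified stand-ins, and the UserData field is a hypothetical slot used only for this illustration, not a confirmed httparser API.

```go
package main

import "fmt"

// Parser is a simplified stand-in for the real parser type.
type Parser struct {
	UserData interface{} // hypothetical per-connection slot
}

// Setting is a simplified stand-in holding a single callback.
type Setting struct {
	Body func(p *Parser, buf []byte)
}

// conn is the per-connection state the callbacks want to reach.
type conn struct{ body []byte }

// shared is allocated once; it captures no per-connection variables.
var shared = Setting{
	Body: func(p *Parser, buf []byte) {
		c := p.UserData.(*conn)
		c.body = append(c.body, buf...)
	},
}

func main() {
	a := &Parser{UserData: &conn{}}
	b := &Parser{UserData: &conn{}}
	shared.Body(a, []byte("hello "))
	shared.Body(b, []byte("other"))
	shared.Body(a, []byte("world"))
	fmt.Printf("%s | %s\n", a.UserData.(*conn).body, b.UserData.(*conn).body)
	// hello world | other
}
```

The trade-off is a type assertion per callback in exchange for not allocating eight closures per connection.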

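Errors when a complete packet is split into sub-packets in the middle of an HTTP line

This is not only a problem with 1.5 packets; the 0.5-packet case needs fixing too. For example, when one HTTP line is delivered in two pieces, the first call finishes parsing the header field name but not its value, and the second call, which parses the remaining data, also fails; see the code. TCP "sticky packet" boundaries can fall at any position in the stream, so I suggest adding tests with higher coverage.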

package main

import (
	"fmt"
	"github.com/antlabs/httparser"
	"time"
)

var data = []byte(
	"POST /joyent/http-parser HTTP/1.1\r\n" +
		"Host: github.com\r\n" +
		"DNT: 1\r\n" +
		"Accept-Encoding: gzip, deflate, sdch\r\n" +
		"Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4\r\n" +
		"User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) " +
		"AppleWebKit/537.36 (KHTML, like Gecko) " +
		"Chrome/39.0.2171.65 Safari/537.36\r\n" +
		"Accept: text/html,application/xhtml+xml,application/xml;q=0.9," +
		"image/webp,*/*;q=0.8\r\n" +
		"Referer: https://github.com/joyent/http-parser\r\n" +
		"Connection: keep-alive\r\n" +
		"Transfer-Encoding: chunked\r\n" +
		"Cache-Control: max-age=0\r\n\r\nb\r\nhello world\r\n0\r\n")

var kBytes = int64(8) << 30

var setting = httparser.Setting{
	MessageBegin: func() {
		fmt.Println("---- begin")
	},
	URL: func(buf []byte) {
	},
	Status: func([]byte) {
		// only needed for responses
	},
	HeaderField: func(buf []byte) {
		// fmt.Println("HeaderField:", string(buf))
	},
	HeaderValue: func(buf []byte) {
		// fmt.Println("HeaderValue:", string(buf))
	},
	HeadersComplete: func() {

	},
	Body: func(buf []byte) {
	},
	MessageComplete: func() {
		fmt.Println("---- complete")
	},
	// MessageEnd: func() {
	// },
}

func bench(iterCount int64, silent bool) {
	var start time.Time
	if !silent {
		start = time.Now()
	}

	p := httparser.New(httparser.REQUEST)
	fmt.Printf("req_len=%d\n", len(data))
	// a single POST request: parse the first 300 bytes, then the remainder
	data1, data2 := data[:300], data[300:]
	sucess, err := p.Execute(&setting, data1)
	if err != nil {
		panic(err.Error())
	}
	if sucess < len(data1) {
		data2 = append(data1[sucess:], data2...)
		fmt.Println("----------------------xxx")
		fmt.Println(string(data1[:sucess]))
		fmt.Println("----------------------")
		fmt.Println(string(data2))
		fmt.Println("----------------------")
		fmt.Printf("111 sucess: %d, data2: %d", sucess, len(data2))
	}
	// if sucess != len(data1) {
	// 	panic(fmt.Sprintf("sucess 111 length size:%d", sucess))
	// }

	sucess, err = p.Execute(&setting, data2)
	if err != nil {
		panic(err.Error())
	}
	if sucess != len(data2) {
		panic(fmt.Sprintf("sucess 222 length size:%d", sucess))
	}

	p.Reset()

	if !silent {
		end := time.Now()

		fmt.Printf("Benchmark result:\n")

		// use fractional seconds; integer division by time.Second truncates short runs to 0
		elapsed := end.Sub(start).Seconds()

		total := iterCount * int64(len(data))
		bw := float64(total) / float64(elapsed)

		fmt.Printf("%.2f mb | %.2f mb/s | %.2f req/sec | %.2f s\n",
			float64(total)/(1024*1024),
			bw/(1024*1024),
			float64(iterCount)/float64(elapsed),
			float64(elapsed))

	}
}

func main() {
	// iterations := kBytes / int64(len(data))
	// bench(iterations, false)
	bench(1, false)
}

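httparser + nbio: basic HTTP server functionality

If all you want is a lightweight network layer, a libev-style API is not really necessary; or, if you like it, you can simply wrap a layer on top of nbio.
I wrote a simple nbio + httparser example. I ignored the httparser bug for now and assumed that each call to Execute successfully parses one request; see the comments for details: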

server.go

package main

import (
	"fmt"
	"log"
	"time"

	"github.com/antlabs/httparser"
	"github.com/lesismal/nbio"
)

var responseFormat = "HTTP/1.1 200 OK\r\n" +
	"Date: Tue, 02 Feb 2021 10:58:43 GMT\r\n" +
	"Content-Length: %v\r\n" +
	"Content-Type: text/plain; charset=utf-8\r\n" +
	"Connection: close\r\n\r\n%v"

// Request .
type Request struct {
	Headers map[string]string
	Body    []byte
}

// Session .
type Session struct {
	Parser    *httparser.Parser
	Setting   *httparser.Setting
	Request   *Request
	completed bool
}

// Next .
func (session *Session) Next() (*Request, bool) {
	if session.completed {
		return session.Request, true
	}
	return nil, false
}

func main() {
	g, err := nbio.NewGopher(nbio.Config{
		Network: "tcp",
		Addrs:   []string{":8080"},
	})
	if err != nil {
		log.Printf("nbio.New failed: %v\n", err)
		return
	}

	g.OnOpen(func(c *nbio.Conn) {
		c.SetReadDeadline(time.Now().Add(time.Second * 120))

		parser := httparser.New(httparser.REQUEST)
		setting := &httparser.Setting{
			MessageBegin: func() {
				session := c.Session().(*Session)
				session.Request = &Request{}
				session.completed = false
			},
			Body: func(buf []byte) {
				session := c.Session().(*Session)
				session.Request.Body = append(session.Request.Body, buf...)
			},
			MessageComplete: func() {
				session := c.Session().(*Session)
				session.completed = true
			},
		}
		c.SetSession(&Session{Parser: parser, Setting: setting})

		log.Println("+ connected:", c.RemoteAddr().String(), time.Now().Format("15:04:05.000"))
	})

	g.OnClose(func(c *nbio.Conn, err error) {
		log.Println("- disconnected:", c.RemoteAddr().String(), time.Now().Format("15:04:05.000"), err)
	})

	g.OnData(func(c *nbio.Conn, data []byte) {
		c.SetReadDeadline(time.Now().Add(time.Second * 120))

		session := c.Session().(*Session)

		// the httparser parsing bug is ignored here; parsing is assumed to succeed
		_, err := session.Parser.Execute(session.Setting, data)
		if err != nil {
			log.Printf("parse failed: %v", err)
			c.Close()
			return
		}

		c.SetWriteDeadline(time.Now().Add(time.Second * 3))
		response := []byte(fmt.Sprintf(responseFormat, len(session.Request.Body), string(session.Request.Body)))
		c.Write(response)

		// it would make more sense for Parser.Execute to work like this:
		// for {
		// 	request, ok, err := session.Parser.Execute(session.Setting, data)
		// 	if err != nil {
		// 		log.Printf("parse failed: %v", err)
		// 		c.Close()
		// 		return
		// 	}
		// 	if ok {
		// 		c.SetWriteDeadline(time.Now().Add(time.Second * 3))
		// 		c.Write(responseData)
		// 	} else {
		// 		break
		// 	}
		// }
	})

	err = g.Start()
	if err != nil {
		log.Printf("nbio.Start failed: %v\n", err)
		return
	}

	// go func() {
	// 	for {
	// 		time.Sleep(time.Second * 5)
	// 		log.Println(g.State().String())
	// 	}
	// }()

	g.Wait()
}

client.go

package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
	"strings"
	"time"
)

func main() {
	url := "http://localhost:8080/echo"
	method := "POST"
	client := &http.Client{}

	for i := 0; i < 10; i++ {
		payload := strings.NewReader("hello")
		req, err := http.NewRequest(method, url, payload)
		if err != nil {
			fmt.Println(111, err)
			return
		}
		res, err := client.Do(req)
		if err != nil {
			fmt.Println(222, err)
			return
		}
		body, err := ioutil.ReadAll(res.Body)
		// close right away; a defer inside the loop would keep every body open until main returns
		res.Body.Close()
		if err != nil {
			fmt.Println(333, err)
			return
		}

		fmt.Println("body:", string(body))

		time.Sleep(time.Second / 10)
	}
}

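The problem with parsing 1.5 packets in one call

See the comments:
Comment 1: // two POST requests
Comment 2: // one POST is 518 bytes and there are two POSTs; the first call parses 600 bytes, the second parses the rest

The setting logs in MessageBegin and MessageComplete; see the code for details.

Only one pair was printed: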

---- begin
---- complete
package main

import (
	"fmt"
	"github.com/antlabs/httparser"
	"time"
)

// two POST requests
var data = []byte(
	"POST /joyent/http-parser HTTP/1.1\r\n" +
		"Host: github.com\r\n" +
		"DNT: 1\r\n" +
		"Accept-Encoding: gzip, deflate, sdch\r\n" +
		"Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4\r\n" +
		"User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) " +
		"AppleWebKit/537.36 (KHTML, like Gecko) " +
		"Chrome/39.0.2171.65 Safari/537.36\r\n" +
		"Accept: text/html,application/xhtml+xml,application/xml;q=0.9," +
		"image/webp,*/*;q=0.8\r\n" +
		"Referer: https://github.com/joyent/http-parser\r\n" +
		"Connection: keep-alive\r\n" +
		"Transfer-Encoding: chunked\r\n" +
		"Cache-Control: max-age=0\r\n\r\nb\r\nhello world\r\n0\r\n" +

		"POST /joyent/http-parser HTTP/1.1\r\n" +
		"Host: github.com\r\n" +
		"DNT: 1\r\n" +
		"Accept-Encoding: gzip, deflate, sdch\r\n" +
		"Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4\r\n" +
		"User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) " +
		"AppleWebKit/537.36 (KHTML, like Gecko) " +
		"Chrome/39.0.2171.65 Safari/537.36\r\n" +
		"Accept: text/html,application/xhtml+xml,application/xml;q=0.9," +
		"image/webp,*/*;q=0.8\r\n" +
		"Referer: https://github.com/joyent/http-parser\r\n" +
		"Connection: keep-alive\r\n" +
		"Transfer-Encoding: chunked\r\n" +
		"Cache-Control: max-age=0\r\n\r\nb\r\nhello world\r\n0\r\n")

var kBytes = int64(8) << 30

var setting = httparser.Setting{
	MessageBegin: func() {
		fmt.Println("---- begin")
	},
	URL: func(buf []byte) {
	},
	Status: func([]byte) {
		// only needed for responses
	},
	HeaderField: func(buf []byte) {
	},
	HeaderValue: func(buf []byte) {
	},
	HeadersComplete: func() {

	},
	Body: func(buf []byte) {
	},
	MessageComplete: func() {
		fmt.Println("---- complete")
	},
	// MessageEnd: func() {
	// },
}

func bench(iterCount int64, silent bool) {
	var start time.Time
	if !silent {
		start = time.Now()
	}

	p := httparser.New(httparser.REQUEST)
	fmt.Printf("req_len=%d\n", len(data)/2)
	// one POST is 518 bytes and there are two POSTs; the first call parses 600 bytes, the second parses the rest
	data1, data2 := data[:600], data[600:]
	sucess, err := p.Execute(&setting, data1)
	if err != nil {
		panic(err.Error())
	}
	if sucess < len(data1) {
		data2 = append(data1[sucess:], data2...)
	}

	sucess, err = p.Execute(&setting, data2)
	if err != nil {
		panic(err.Error())
	}
	if sucess != len(data2) {
		panic(fmt.Sprintf("sucess 222 length size:%d", sucess))
	}

	p.Reset()

	if !silent {
		end := time.Now()

		fmt.Printf("Benchmark result:\n")

		// use fractional seconds; integer division by time.Second truncates short runs to 0
		elapsed := end.Sub(start).Seconds()

		total := iterCount * int64(len(data))
		bw := float64(total) / float64(elapsed)

		fmt.Printf("%.2f mb | %.2f mb/s | %.2f req/sec | %.2f s\n",
			float64(total)/(1024*1024),
			bw/(1024*1024),
			float64(iterCount)/float64(elapsed),
			float64(elapsed))

	}
}

func main() {
	// iterations := kBytes / int64(len(data))
	// bench(iterations, false)
	bench(1, false)
}

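Sticky-packet handling still seems to have problems

Try feeding the parser one byte at a time, and also try splitting complete request data into many segments of random length: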

package test

import (
	"fmt"
	"math/rand"
	"testing"
	"time"

	"github.com/antlabs/httparser"
)

func TestServerParserContentLength(t *testing.T) {
	data := []byte("POST /echo HTTP/1.1\r\nHost: localhost:8080\r\nConnection: close \r\nAccept-Encoding : gzip \r\n\r\n")
	testParser(t, data)

	data = []byte("POST /echo HTTP/1.1\r\nHost: localhost:8080\r\nConnection: close \r\nContent-Length :  0\r\nAccept-Encoding : gzip \r\n\r\n")
	testParser(t, data)

	data = []byte("POST /echo HTTP/1.1\r\nHost: localhost:8080\r\nConnection: close \r\nContent-Length :  5\r\nAccept-Encoding : gzip \r\n\r\nhello")
	testParser(t, data)
}

func TestServerParserChunks(t *testing.T) {
	data := []byte("POST / HTTP/1.1\r\nHost: localhost:1235\r\nUser-Agent: Go-http-client/1.1\r\nTransfer-Encoding: chunked\r\nAccept-Encoding: gzip\r\n\r\n4\r\nbody\r\n0\r\n\r\n")
	testParser(t, data)
}

func TestServerParserTrailer(t *testing.T) {
	data := []byte("POST / HTTP/1.1\r\nHost: localhost:1235\r\nUser-Agent: Go-http-client/1.1\r\nTransfer-Encoding: chunked\r\nTrailer: Md5,Size\r\nAccept-Encoding: gzip\r\n\r\n4\r\nbody\r\n0\r\nMd5: 841a2d689ad86bd1611447453c22c6fc\r\nSize: 4\r\n\r\n")
	testParser(t, data)
}

func testParser(t *testing.T, data []byte) error {
	setting := httparser.Setting{
		MessageBegin:    func(*httparser.Parser) {},
		URL:             func(*httparser.Parser, []byte) {},
		Status:          func(*httparser.Parser, []byte) {},
		HeaderField:     func(*httparser.Parser, []byte) {},
		HeaderValue:     func(*httparser.Parser, []byte) {},
		HeadersComplete: func(*httparser.Parser) {},
		Body:            func(*httparser.Parser, []byte) {},
		MessageComplete: func(*httparser.Parser) {},
	}
	p := httparser.New(httparser.REQUEST)

	var remain []byte
	for i := 0; i < len(data); i++ {
		b := append(remain, data[i:i+1]...)
		n, err := p.Execute(&setting, b)
		if err != nil {
			t.Fatal(fmt.Errorf("%v success, %v", n, err))
		}
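		// always carry over exactly the unparsed tail (empty when n == len(b)),
		// so stale bytes from an earlier iteration are never fed twice
		remain = append([]byte{}, b[n:]...)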
	}

	nRequest := 0
	data = append(data, data...)
	setting = httparser.Setting{
		MessageBegin:    func(*httparser.Parser) {},
		URL:             func(*httparser.Parser, []byte) {},
		Status:          func(*httparser.Parser, []byte) {},
		HeaderField:     func(*httparser.Parser, []byte) {},
		HeaderValue:     func(*httparser.Parser, []byte) {},
		HeadersComplete: func(*httparser.Parser) {},
		Body:            func(*httparser.Parser, []byte) {},
		MessageComplete: func(*httparser.Parser) {
			nRequest++
		},
	}

	tBegin := time.Now()
	loop := 100000
	for i := 0; i < loop; i++ {
		tmp := data
		var remain []byte
		for len(tmp) > 0 {
			nRead := int(rand.Intn(len(tmp)) + 1)
			readBuf := append(remain, tmp[:nRead]...)
			tmp = tmp[nRead:]
			n, err := p.Execute(&setting, readBuf)
			if err != nil {
				t.Fatal(fmt.Errorf("%v success, %v", n, err))
			}
			// always carry over exactly the unparsed tail (empty when n == len(readBuf))
			remain = append([]byte{}, readBuf[n:]...)

		}
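		if nRequest != (i+1)*2 {
			// fail the test here; the error returned by testParser is ignored by its callers
			t.Fatalf("loop %v: got %v completed requests, want %v", i, nRequest, (i+1)*2)
		}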
	}
	tUsed := time.Since(tBegin)
	fmt.Printf("%v loops, %v s used, %v ns/op, %v req/s\n", loop, tUsed.Seconds(), tUsed.Nanoseconds()/int64(loop), float64(loop)/tUsed.Seconds())

	return nil
}

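Suggestion: ship a default field parser

Pass the result to the business layer as an argument of MessageComplete:
MessageComplete(request)

Using http.Request directly would be convenient; alternatively, a custom implementation that supports lazy parsing would perform better.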
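
One possible shape for such a default collector is sketched below: it assembles header field and value callbacks into a standard http.Header that could then be handed to the business layer. The type headerCollector and its method names are hypothetical. The sketch assumes, as in the classic C http-parser, that a field or value may arrive in several fragments and that every field is followed by at least one value callback; whether httparser behaves this way is not confirmed here.

```go
package main

import (
	"fmt"
	"net/http"
)

// headerCollector (hypothetical) turns field/value callbacks into an http.Header.
type headerCollector struct {
	Header    http.Header
	field     []byte
	value     []byte
	lastValue bool // true if the previous callback delivered a value fragment
}

// OnField would be called from the HeaderField callback.
func (h *headerCollector) OnField(buf []byte) {
	if h.lastValue {
		h.flush() // a new field starts, so the previous pair is complete
	}
	h.field = append(h.field, buf...)
	h.lastValue = false
}

// OnValue would be called from the HeaderValue callback.
func (h *headerCollector) OnValue(buf []byte) {
	h.value = append(h.value, buf...)
	h.lastValue = true
}

// OnHeadersComplete stores the final pending pair.
func (h *headerCollector) OnHeadersComplete() { h.flush() }

// flush moves the buffered pair into Header and resets the buffers.
func (h *headerCollector) flush() {
	if len(h.field) == 0 {
		return
	}
	if h.Header == nil {
		h.Header = make(http.Header)
	}
	h.Header.Add(string(h.field), string(h.value))
	h.field, h.value = h.field[:0], h.value[:0]
	h.lastValue = false
}

func main() {
	var c headerCollector
	c.OnField([]byte("Ho")) // field split across two callbacks
	c.OnField([]byte("st"))
	c.OnValue([]byte("github.com"))
	c.OnField([]byte("DNT"))
	c.OnValue([]byte("1"))
	c.OnHeadersComplete()
	fmt.Println(c.Header.Get("Host"), c.Header.Get("DNT")) // github.com 1
}
```

A lazy variant could keep the raw header bytes and build the map only on first lookup, which is the performance idea raised above.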
