
Azure Search output plugin for Fluentd

fluent-plugin-azuresearch is a fluent plugin to output to Azure Search

Requirements

| fluent-plugin-azuresearch | fluentd     | ruby   |
|---------------------------|-------------|--------|
| >= 0.2.0                  | >= v0.14.15 | >= 2.1 |
| < 0.2.0                   | >= v0.12.0  | >= 1.9 |

Installation

$ gem install fluent-plugin-azuresearch

Configuration

Azure Search

To use Microsoft Azure Search, you must create an Azure Search service in the Azure Portal. You must also have an index: the persisted store of documents to which fluent-plugin-azuresearch writes the event stream. Both can be created in the Azure Portal or through the Azure Search REST API.
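
For reference, the "messages" index used in the samples below could be created through the Azure Search REST API. The following Ruby sketch is illustrative only: the AZURE_SEARCH_ACCOUNT and AZURE_SEARCH_API_KEY placeholders and the api-version value are assumptions, and the schema simply mirrors the samples in this README.

# Illustrative sketch: create the "messages" index via the Azure Search REST API.
# AZURE_SEARCH_ACCOUNT, AZURE_SEARCH_API_KEY, and api-version=2019-05-06 are assumptions.
require 'net/http'
require 'json'
require 'uri'

endpoint = 'https://AZURE_SEARCH_ACCOUNT.search.windows.net'
api_key  = 'AZURE_SEARCH_API_KEY'

schema = {
  'name'   => 'messages',
  'fields' => [
    { 'name' => 'id',         'type' => 'Edm.String', 'key' => true, 'searchable' => false },
    { 'name' => 'user_name',  'type' => 'Edm.String' },
    { 'name' => 'message',    'type' => 'Edm.String', 'analyzer' => 'en.lucene' },
    { 'name' => 'created_at', 'type' => 'Edm.DateTimeOffset' }
  ]
}

uri = URI("#{endpoint}/indexes/messages?api-version=2019-05-06")
req = Net::HTTP::Put.new(uri, 'Content-Type' => 'application/json', 'api-key' => api_key)
req.body = schema.to_json
res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
puts res.code  # expect 201 when the index is created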

Fluentd - fluent.conf

<match azuresearch.*>
    @type azuresearch
    @log_level info
    endpoint   https://AZURE_SEARCH_ACCOUNT.search.windows.net
    api_key    AZURE_SEARCH_API_KEY
    search_index  messages
    column_names id,user_name,message,tag,created_at
    key_names postid,user,content,tag,posttime
</match>
  • endpoint (required) - Azure Search service endpoint URI
  • api_key (required) - Azure Search API key
  • search_index (required) - Azure Search index name into which records are inserted
  • column_names (required) - Column names in the target Azure Search index. Each column must be separated by a comma.
  • key_names (optional) - Default: nil. Key names in the incoming record to insert. Each key must be separated by a comma. ${time} is a placeholder for Time.at(time).strftime("%Y-%m-%dT%H:%M:%SZ"), and ${tag} is a placeholder for the event tag (see the sketch below). By default, key_names is the same as column_names.

[note] @log_level is an optional fluentd built-in parameter that controls the verbosity of logging: fatal|error|warn|info|debug|trace (see also Logging of Fluentd)
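
The ${time} and ${tag} placeholders are resolved per event. The following Ruby sketch illustrates the expansion described above; it is an assumption of the behavior, not the plugin's actual code.

# Sketch of ${time}/${tag} expansion as described above; not the plugin's actual code.
require 'time'

def expand_key(key, tag, time, record)
  case key
  when '${time}' then Time.at(time).strftime('%Y-%m-%dT%H:%M:%SZ')
  when '${tag}'  then tag
  else record[key]
  end
end

tag    = 'azuresearch.msg'   # fluentd event tag
time   = Time.now.to_i       # fluentd event time (epoch seconds)
record = { 'postid' => '100', 'user' => 'ladygaga', 'content' => 'post by ladygaga' }

key_names = %w[postid user content ${tag} ${time}]
key_names.map { |k| expand_key(k, tag, time, record) }
# => ["100", "ladygaga", "post by ladygaga", "azuresearch.msg", "<formatted event time>"]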

Sample Configurations

Case 1 - column_names is the same as key_names

Suppose you have the following fluent.conf and azure search index schema:

fluent.conf

<match azuresearch.*>
    @type azuresearch
    endpoint   https://yoichidemo.search.windows.net
    api_key    2XX3D2456052A9AD21E54CB03C3ABF6A(dummy)
    search_index  messages
    column_names id,user_name,message,created_at
</match>

Azure Search Schema: messages

{
    "name": "messages",
    "fields": [
        { "name":"id", "type":"Edm.String", "key": true, "searchable": false },
        { "name":"user_name", "type":"Edm.String" },
        { "name":"message", "type":"Edm.String", "filterable":false, "sortable":false, "facetable":false, "analyzer":"en.lucene" },
        { "name":"created_at", "type":"Edm.DateTimeOffset", "facetable":false}
    ]
}

The plugin will write the event stream out to Azure Search like this:

Input event stream

{ "id": "1", "user_name": "taylorswift13", "message":"post by taylorswift13", "created_at":"2016-01-29T00:00:00Z" },
{ "id": "2", "user_name": "katyperry", "message":"post by katyperry", "created_at":"2016-01-30T00:00:00Z" },
{ "id": "3", "user_name": "ladygaga", "message":"post by ladygaga", "created_at":"2016-01-31T00:00:00Z" }

Search results

"value": [
    { "@search.score": 1, "id": "1", "user_name": "taylorswift13", "message": "post by taylorswift13", "created_at": "2016-01-29T00:00:00Z" },
    { "@search.score": 1, "id": "2", "user_name": "katyperry", "message": "post by katyperry", "created_at": "2016-01-30T00:00:00Z" },
    { "@search.score": 1, "id": "3", "user_name": "ladygaga", "message": "post by ladygaga", "created_at": "2016-01-31T00:00:00Z" }
]
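
Internally, records like these are posted to the index's document-indexing endpoint. The Ruby sketch below shows roughly the equivalent REST call for the first two events above; it is not the plugin's code, and the api-version value and key are assumptions.

# Rough equivalent of the batch upload performed for the events above.
# Not the plugin's code; the api-version value and key are assumptions.
require 'net/http'
require 'json'
require 'uri'

endpoint = 'https://yoichidemo.search.windows.net'
api_key  = 'AZURE_SEARCH_API_KEY'  # dummy placeholder

payload = {
  'value' => [
    { '@search.action' => 'upload', 'id' => '1', 'user_name' => 'taylorswift13',
      'message' => 'post by taylorswift13', 'created_at' => '2016-01-29T00:00:00Z' },
    { '@search.action' => 'upload', 'id' => '2', 'user_name' => 'katyperry',
      'message' => 'post by katyperry', 'created_at' => '2016-01-30T00:00:00Z' }
  ]
}

uri = URI("#{endpoint}/indexes/messages/docs/index?api-version=2019-05-06")
req = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json', 'api-key' => api_key)
req.body = payload.to_json
res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
puts res.code  # expect 200 when all documents are accepted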

Case 2 - column_names is NOT the same as key_names

Suppose you have the following fluent.conf and azure search index schema:

fluent.conf

<match azuresearch.*>
    @type azuresearch
    endpoint   https://yoichidemo.search.windows.net
    api_key    2XX3D2456052A9AD21E54CB03C3ABF6A(dummy)
    search_index  messages
    column_names id,user_name,message,created_at
    key_names postid,user,content,posttime
</match>

Azure Search Schema: messages

{
    "name": "messages",
    "fields": [
        { "name":"id", "type":"Edm.String", "key": true, "searchable": false },
        { "name":"user_name", "type":"Edm.String" },
        { "name":"message", "type":"Edm.String", "filterable":false, "sortable":false, "facetable":false, "analyzer":"en.lucene" },
        { "name":"created_at", "type":"Edm.DateTimeOffset", "facetable":false}
    ]
}

The plugin will write the event stream out to Azure Search like this:

Input event stream

{ "postid": "1", "user": "taylorswift13", "content":"post by taylorswift13", "posttime":"2016-01-29T00:00:00Z" },
{ "postid": "2", "user": "katyperry", "content":"post by katyperry", "posttime":"2016-01-30T00:00:00Z" },
{ "postid": "3", "user": "ladygaga", "content":"post by ladygaga", "posttime":"2016-01-31T00:00:00Z" }

Search results

"value": [
    { "@search.score": 1, "id": "1", "user_name": "taylorswift13", "message": "post by taylorswift13", "created_at": "2016-01-29T00:00:00Z" },
    { "@search.score": 1, "id": "2", "user_name": "katyperry", "message": "post by katyperry", "created_at": "2016-01-30T00:00:00Z" },
    { "@search.score": 1, "id": "3", "user_name": "ladygaga", "message": "post by ladygaga", "created_at": "2016-01-31T00:00:00Z" }
]
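
The renaming from key_names to column_names amounts to a positional mapping. Here is a minimal Ruby sketch of that mapping, built from the first event above; it is an illustration, not the plugin's implementation.

column_names = %w[id user_name message created_at]
key_names    = %w[postid user content posttime]

record = {
  'postid'   => '1',
  'user'     => 'taylorswift13',
  'content'  => 'post by taylorswift13',
  'posttime' => '2016-01-29T00:00:00Z'
}

# Pair each column with its source key and pull the value from the record.
document = column_names.zip(key_names).each_with_object({}) do |(column, key), doc|
  doc[column] = record[key]
end
# => {"id"=>"1", "user_name"=>"taylorswift13",
#     "message"=>"post by taylorswift13", "created_at"=>"2016-01-29T00:00:00Z"}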

Case 3 - column_names is NOT the same as key_names, and key_names includes ${time} and ${tag}

fluent.conf

<match azuresearch.*>
    @type azuresearch
    endpoint   https://yoichidemo.search.windows.net
    api_key    2XX3D2456052A9AD21E54CB03C3ABF6A(dummy)
    search_index  messages
    column_names id,user_name,message,tag,created_at
    key_names postid,user,content,${tag},${time}
</match>

Azure Search Schema: messages

{
    "name": "messages",
    "fields": [
        { "name":"id", "type":"Edm.String", "key": true, "searchable": false },
        { "name":"user_name", "type":"Edm.String" },
        { "name":"message", "type":"Edm.String", "filterable":false, "sortable":false, "facetable":false, "analyzer":"en.lucene" },
        { "name":"created_at", "type":"Edm.DateTimeOffset", "facetable":false}
    ]
}

The plugin will write the event stream out to Azure Search like this:

Input event stream

{ "id": "1", "user_name": "taylorswift13", "message":"post by taylorswift13" },
{ "id": "2", "user_name": "katyperry", "message":"post by katyperry" },
{ "id": "3", "user_name": "ladygaga", "message":"post by ladygaga" }

Search results

"value": [
    { "@search.score": 1, "id": "1", "user_name": "taylorswift13", "message": "post by taylorswift13", "tag": "azuresearch.msg", "created_at": "2016-01-31T21:03:41Z" },
    { "@search.score": 1, "id": "2", "user_name": "katyperry", "message": "post by katyperry", "tag": "azuresearch.msg", "created_at": "2016-01-31T21:03:41Z" },
    { "@search.score": 1, "id": "3", "user_name": "ladygaga", "message": "post by ladygaga", "tag": "azuresearch.msg", "created_at": "2016-01-31T21:03:41Z" }
]

[note] The value of created_at above is the time when fluentd actually receives the corresponding input event.
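
To verify results like the ones above, you can query the index directly. A minimal Ruby sketch follows, assuming the same dummy endpoint and an api-version of 2019-05-06.

# Illustrative query of the "messages" index; endpoint, key, and api-version are assumptions.
require 'net/http'
require 'json'
require 'uri'

endpoint = 'https://yoichidemo.search.windows.net'
api_key  = 'AZURE_SEARCH_API_KEY'  # dummy placeholder

uri = URI("#{endpoint}/indexes/messages/docs?search=*&api-version=2019-05-06")
req = Net::HTTP::Get.new(uri, 'api-key' => api_key)
res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
JSON.parse(res.body)['value'].each { |doc| puts doc }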

Tests

Running test code

$ git clone https://github.com/yokawasa/fluent-plugin-azuresearch.git
$ cd fluent-plugin-azuresearch

# edit CONFIG params of test/plugin/test_azuresearch.rb 
$ vi test/plugin/test_azuresearch.rb

# run test 
$ rake test

Creating package, running and testing locally

$ rake build
$ rake install:local
 
# running fluentd with your fluent.conf
$ fluentd -c fluent.conf -vv &
 
# send test input event to test plugin using fluent-cat
$ echo ' { "postid": "100", "user": "ladygaga", "content":"post by ladygaga"}' | fluent-cat azuresearch.msg

Please don't forget that you need a forward input configuration to receive the messages sent by fluent-cat:

<source>
    @type forward
</source>
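
Alternatively, a test event can be sent from Ruby with the fluent-logger gem (using that gem here is an assumption, not something this plugin requires; install it with gem install fluent-logger):

# Hedged sketch: post a test event to the forward input using the fluent-logger gem.
require 'fluent-logger'

Fluent::Logger::FluentLogger.open(nil, host: 'localhost', port: 24224)
Fluent::Logger.post('azuresearch.msg',
                    'postid' => '100', 'user' => 'ladygaga', 'content' => 'post by ladygaga')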

TODOs

  • Input validation for Azure Search - check total size of columns to add

Change log

Links

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/yokawasa/fluent-plugin-azuresearch.

Copyright

Copyright (c) 2016- Yoichi Kawasaki
License: Apache License, Version 2.0


fluent-plugin-azuresearch's Issues

Fix: CVE-2020-8130 (moderate severity)

Vulnerable versions: <= 12.3.2
Patched version: 12.3.3
There is an OS command injection vulnerability in Ruby Rake before 12.3.3 in Rake::FileList when supplying a filename that begins with the pipe character |.
