kafka-connect-github-source's Introduction

Learning

This project is a companion repository to the Apache Kafka Connect course on Udemy.

https://links.datacumulus.com/kafka-connect-coupon

Kafka Connect Source GitHub

This connector allows you to get a stream of issues and pull requests from your GitHub repository, using the GitHub API: https://developer.github.com/v3/issues/#list-issues-for-a-repository

Issues are pulled based on the updated_at field, meaning any update to an issue or pull request will appear in the stream.
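
To make the updated_at polling concrete, here is a minimal sketch of how a request URL for the "list issues for a repository" endpoint could be built so that results come back ordered by update time, oldest first. The class and method names (IssuesUrlSketch, issuesUrl) are hypothetical and are not taken from the connector's code; the query parameters (state, sort, direction, since, page, per_page) are the ones documented for that endpoint.

```java
import java.time.Instant;
import java.util.Locale;

// Hypothetical helper, for illustration only: not the connector's actual code.
final class IssuesUrlSketch {

    // Builds a URL that lists issues updated at or after `since`,
    // sorted by update time in ascending order.
    static String issuesUrl(String owner, String repo, Instant since, int page, int perPage) {
        return String.format(
                Locale.ROOT,
                "https://api.github.com/repos/%s/%s/issues"
                        + "?state=all&sort=updated&direction=asc&since=%s&page=%d&per_page=%d",
                owner, repo, since.toString(), page, perPage);
    }

    public static void main(String[] args) {
        // Values mirror the sample configuration (github.owner, github.repo, since.timestamp).
        Instant since = Instant.parse("2017-01-01T00:00:00Z");
        System.out.println(issuesUrl("kubernetes", "kubernetes", since, 1, 100));
    }
}
```

Sorting ascending by update time means the updated_at of the last record read can serve as the next since value, which is presumably why every later edit to an issue shows up in the stream again.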

The connector writes to a topic that is a great candidate for demonstrating log compaction. It's also a fun way to automate your GitHub workflow.

Finally, it is intended as an educational example of how to write a Source Connector that is a little less trivial than the FileStreamSourceConnector provided in Kafka.

Contributing

This connector is not perfect and can be improved. Please feel free to submit any PR you deem useful.

Configuration

name=GitHubSourceConnectorDemo
tasks.max=1
connector.class=com.simplesteph.kafka.GitHubSourceConnector
topic=github-issues
github.owner=kubernetes
github.repo=kubernetes
since.timestamp=2017-01-01T00:00:00Z
# I strongly recommend you set these two fields:
auth.username=your_username
auth.password=your_password

Running in development

Note: Java 8 is required for this connector. Make sure config/worker.properties points to your Kafka cluster.

./build.sh
./run.sh

The simplest way to run run.sh is to have Docker installed. It will pull a Docker image and run the connector in standalone mode on top of it.

Deploying

Note: Java 8 is required for this connector.

TODO

kafka-connect-github-source's People

Contributors

dvoils, iterrator, kieronedwards, rmpt, simplesteph, tote-cca-build-bot

kafka-connect-github-source's Issues

GitHubAPIHttpClient http response headers have wrong case

Problem

Currently, GitHubSourceTaskTest fails when asserting on the headers in the response from GitHub, which prevents the connector from being built. This seems to be due to the change of the GitHub API from v3 to v4, which now uses GraphQL and has a different strategy for rate limiting.

Solution

"RateLimit" needs to be changed to "Ratelimit" in the headers. The test needs to be updated to account for this, ideally using constants. Furthermore, the "Etag" property in the test is now also lower case and could perhaps be removed, as it is not used in the code (?)

Longer Term Solution

Upgrade to the GraphQL version of the GitHub API (rate limits are specified in an object).
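
Since HTTP header names are case-insensitive, another way to guard against this kind of change is to look headers up without regard to case instead of hard-coding one spelling. The following is a minimal sketch using only the Java standard library. HeaderLookupSketch, caseInsensitive and firstInt are hypothetical names, and the Map<String, List<String>> shape is an assumption about how the response headers are exposed, not a description of the connector's code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, for illustration only: not the connector's actual code.
final class HeaderLookupSketch {

    // Single place for header names, as the issue suggests using constants.
    static final String RATE_LIMIT_REMAINING = "X-RateLimit-Remaining";

    // Copies the headers into a map whose keys compare case-insensitively.
    static Map<String, List<String>> caseInsensitive(Map<String, List<String>> headers) {
        Map<String, List<String>> copy = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (Map.Entry<String, List<String>> entry : headers.entrySet()) {
            if (entry.getKey() != null) { // a TreeMap with a comparator cannot hold null keys
                copy.put(entry.getKey(), entry.getValue());
            }
        }
        return copy;
    }

    // Returns the first value of the header as an int, or `fallback` if it is absent.
    static int firstInt(Map<String, List<String>> headers, String name, int fallback) {
        List<String> values = headers.get(name);
        if (values == null || values.isEmpty()) {
            return fallback;
        }
        return Integer.parseInt(values.get(0).trim());
    }

    public static void main(String[] args) {
        Map<String, List<String>> raw = new HashMap<>();
        raw.put("X-Ratelimit-Remaining", Collections.singletonList("4999"));
        Map<String, List<String>> headers = caseInsensitive(raw);
        // The lookup succeeds even though the stored spelling differs in case.
        System.out.println(firstInt(headers, RATE_LIMIT_REMAINING, -1));
    }
}
```

With a lookup like this, the test could assert on the constant rather than on a literal string, so a future change in header casing would not break the build.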

Get rid of nasty error on first startup when in standalone mode

Error below:

[2017-04-27 22:55:18,729] ERROR CRITICAL: Failed to deserialize offset data when getting offsets for task with namespace GitHubSourceConnectorDemo-2. No value for this data will be returned, which may break the task or cause it to skip some data. This could either be due to an error in the connector implementation or incompatible schema. (org.apache.kafka.connect.storage.OffsetStorageReaderImpl:102)
org.apache.kafka.connect.errors.DataException: JsonConverter with schemas.enable requires "schema" and "payload" fields and may not contain additional fields. If you are trying to deserialize plain JSON data, set schemas.enable=false in your converter configuration.
        at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:309)
        at org.apache.kafka.connect.storage.OffsetStorageReaderImpl.offsets(OffsetStorageReaderImpl.java:96)
        at org.apache.kafka.connect.storage.OffsetStorageReaderImpl.offset(OffsetStorageReaderImpl.java:54)
        at com.simplesteph.kafka.GitHubSourceTask.initializeLastVariables(GitHubSourceTask.java:57)
        at com.simplesteph.kafka.GitHubSourceTask.start(GitHubSourceTask.java:52)
        at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:141)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:139)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:182)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

The connector still works, and the error disappears once some offsets have been committed.
PRs welcome.
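
The log message says that no value will be returned for the offset, so the task has to cope with a missing offset on first startup. A minimal sketch of that fallback is shown below, assuming the stored offset arrives as a Map<String, Object> (or null when nothing is available), which is what Kafka Connect's OffsetStorageReader.offset returns. OffsetFallbackSketch, resolveStart and the "updated_at" offset key are hypothetical names chosen for illustration. This does not remove the logged error; it only shows why the connector can keep working.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, for illustration only: not the connector's actual code.
final class OffsetFallbackSketch {

    // Assumed key under which the last seen update time is stored in the offset map.
    static final String UPDATED_AT_FIELD = "updated_at";

    // Picks the instant to resume from: the stored offset if usable,
    // otherwise the configured since.timestamp.
    static Instant resolveStart(Map<String, Object> storedOffset, Instant configuredSince) {
        if (storedOffset == null) {
            return configuredSince; // first startup, or the offset could not be read
        }
        Object value = storedOffset.get(UPDATED_AT_FIELD);
        if (!(value instanceof String)) {
            return configuredSince; // missing or unexpected type
        }
        return Instant.parse((String) value);
    }

    public static void main(String[] args) {
        Instant since = Instant.parse("2017-01-01T00:00:00Z");
        System.out.println(resolveStart(null, since)); // falls back to since.timestamp

        Map<String, Object> stored = new HashMap<>();
        stored.put(UPDATED_AT_FIELD, "2017-04-27T22:55:18Z");
        System.out.println(resolveStart(stored, since)); // resumes from the stored offset
    }
}
```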

How do I create a custom group.id for connector workers?

Hi,
I am running Kafka Connect and trying to sink topic data to a file. My topic is correct, but Kafka Connect prefixes connect- to the name used for the group.id.

E.g. name: simplesteph
The consumer group.id becomes connect-simplesteph.

Can I change the prefix?

Thank you.

Dependency org.apache.httpcomponents:httpclient, leading to CVE problem

Hi, kafka-connect-github-source has a dependency on org.apache.httpcomponents:httpclient:4.5.2, and the project calls the vulnerable method.

CVE-2020-13956

This CVE affects versions in the range [,4.5.13)

After further analysis, in this project, the main Api called is <org.apache.http.client.utils.URIUtils: org.apache.http.HttpHost extractHost(java.net.URI)>

Risk method repair link : GitHub

CVE Bug Invocation Path--

Path Length : 6

<org.apache.http.client.utils.URIUtils: org.apache.http.HttpHost extractHost(java.net.URI)>
at <org.apache.http.impl.client.DecompressingHttpClient: org.apache.http.HttpHost getHttpHost(org.apache.http.client.methods.HttpUriRequest)> (org.apache.http.impl.client.DecompressingHttpClient.java:[137]) in /.m2/repository/org/apache/httpcomponents/httpclient/4.5.2/httpclient-4.5.2.jar
at <org.apache.http.impl.client.DecompressingHttpClient: org.apache.http.HttpResponse execute(org.apache.http.client.methods.HttpUriRequest)> (org.apache.http.impl.client.DecompressingHttpClient.java:[123]) in /.m2/repository/org/apache/httpcomponents/httpclient/4.5.2/httpclient-4.5.2.jar
at <com.mashape.unirest.http.HttpClientHelper: com.mashape.unirest.http.HttpResponse request(com.mashape.unirest.request.HttpRequest,java.lang.Class)> (com.mashape.unirest.http.HttpClientHelper.java:[138]) in /.m2/repository/com/mashape/unirest/unirest-java/1.4.9/unirest-java-1.4.9.jar
at <com.mashape.unirest.request.BaseRequest: com.mashape.unirest.http.HttpResponse asJson()> (com.mashape.unirest.request.BaseRequest.java:[68]) in /.m2/repository/com/mashape/unirest/unirest-java/1.4.9/unirest-java-1.4.9.jar
at <com.simplesteph.kafka.GitHubAPIHttpClient: com.mashape.unirest.http.HttpResponse getNextIssuesAPI(java.lang.Integer,java.time.Instant)> (com.simplesteph.kafka.GitHubAPIHttpClient.java:[84]) in /detect/unzip/kafka-connect-github-source-1.1/target/classes

Dependency tree--

[INFO] com.simplesteph.kafka:kafka-connect-github-source:jar:1.1
[INFO] +- org.apache.kafka:connect-api:jar:0.10.2.0-cp1:provided
[INFO] |  +- org.apache.kafka:kafka-clients:jar:0.10.2.0-cp1:provided
[INFO] |  |  +- net.jpountz.lz4:lz4:jar:1.3.0:provided
[INFO] |  |  \- org.xerial.snappy:snappy-java:jar:1.1.2.6:provided
[INFO] |  \- org.slf4j:slf4j-api:jar:1.7.21:compile
[INFO] +- org.slf4j:slf4j-log4j12:jar:1.7.25:compile
[INFO] |  \- log4j:log4j:jar:1.2.17:compile
[INFO] \- com.mashape.unirest:unirest-java:jar:1.4.9:compile
[INFO]    +- org.apache.httpcomponents:httpclient:jar:4.5.2:compile
[INFO]    |  +- org.apache.httpcomponents:httpcore:jar:4.4.4:compile
[INFO]    |  +- commons-logging:commons-logging:jar:1.2:compile
[INFO]    |  \- commons-codec:commons-codec:jar:1.9:compile
[INFO]    +- org.apache.httpcomponents:httpasyncclient:jar:4.1.1:compile
[INFO]    |  \- org.apache.httpcomponents:httpcore-nio:jar:4.4.4:compile
[INFO]    +- org.apache.httpcomponents:httpmime:jar:4.5.2:compile
[INFO]    \- org.json:json:jar:20160212:compile

Suggested solutions:

Update dependency version to 4.5.13 or higher

Thank you very much.
