moembed's Introduction

MoEmbed

MoEmbed is an embed data provider that supports any website.

Features and concepts

  • Proxying ... Proxies requests to the oEmbed endpoints of known websites (Twitter, YouTube, etc.).
  • Converting ... Converts proxied oEmbed responses into slimmer ones, replacing iframes and JavaScript with simple, lightweight HTML.
  • Every URL ... Supports any website, even one that provides no oEmbed endpoint, by parsing og:title, og:image, the <title> tag, and other related elements.
  • Caching ... Caches responses for performance.
  • Easy to use ... Just pass a URL-encoded URL as the url query parameter, like http://localhost:5000/?url=https%3A%2F%2Fexample.com%2F.
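
As an illustration of the request format described above, here is a minimal client-side sketch in Python (not the project's own language). The helper name build_moembed_url is hypothetical; only the base address and the url query parameter come from the example in the list.

```python
from urllib.parse import quote


def build_moembed_url(target_url: str, base: str = "http://localhost:5000/") -> str:
    """Return a MoEmbed request URL with target_url percent-encoded in `url`."""
    # safe="" makes quote() encode ":" and "/" too, as in the README example.
    return f"{base}?url={quote(target_url, safe='')}"


print(build_moembed_url("https://example.com/"))
# prints http://localhost:5000/?url=https%3A%2F%2Fexample.com%2F
```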

EmbedData data structure

(TBD)

How to use

Installation

Docker

docker build . -f MoEmbed.App/Dockerfile -t moembed:latest
docker run --rm -it -p 5000:5000 moembed:latest

Requirements

  • Redis
  • dotnet-script 1.4.0 (install with: dotnet tool install --global dotnet-script --version 1.4.0)

Setup

(TBD)

Update OEmbed metadata providers

cd MoEmbed.CodeGeneration
dotnet script OEmbedProxyMetadataProviders.csx

License

MIT License

moembed's People

Contributors

0xbaddcafe, mohemohe, pgrho, supermomonga

moembed's Issues

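Twitter links sometimes fail to expand

Twitter links sometimes fail to expand.
The exact conditions are unknown, but it does not appear to be a problem with specific tweets.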

  1. https://twitter.com/kayyyma/status/915935073165119490

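This tweet did not expand when I posted it a little while ago.
When I posted it again just now, it expanded.

  2. https://twitter.com/Any512/status/917329362352988160

This tweet did not expand when I posted it just now.
I posted it again right afterwards, but it still did not expand.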

Error at steam url

e.g.) http://store.steampowered.com/app/648100/

web_1  | info: Microsoft.AspNetCore.Hosting.Internal.WebHost[1]
web_1  |       Request starting HTTP/1.1 GET http://embed.kokoro.io/api?url=http%3A%2F%2Fstore.steampowered.com%2Fapp%2F648100%2F&format=xml  
web_1  | info: MoEmbed.HttpMetadataHandler[0]
web_1  |       Handling URL: http://store.steampowered.com/app/648100/
web_1  | info: MoEmbed.HttpMetadataHandler[0]
web_1  |       An exception thrown: System.IndexOutOfRangeException: Index was outside the bounds of the array.
web_1  |          at Shipwreck.OpenGraph.PropertyPath.StartsWith(PropertyPath property, String path, Boolean& matched, Boolean skipCompareProperty)
web_1  |          at Shipwreck.OpenGraph.GraphObject.<>c__DisplayClass105_0.<GetLocalProperty>b__0(PropertyEntry kv)
web_1  |          at System.Linq.Enumerable.FirstOrDefault[TSource](IEnumerable`1 source, Func`2 predicate)
web_1  |          at Shipwreck.OpenGraph.GraphObject.GetLocalProperty(String property)
web_1  |          at MoEmbed.Models.Metadata.UnknownMetadata.LoadHtml(HtmlDocument htmlDocument) in /usr/src/MoEmbed.Core/Models/Metadata/UnknownMetadata.cs:line 236
web_1  |          at MoEmbed.Models.Metadata.UnknownMetadata.<FetchAsyncCore>d__39.MoveNext() in /usr/src/MoEmbed.Core/Models/Metadata/UnknownMetadata.cs:line 158
web_1  |       --- End of stack trace from previous location where exception was thrown ---
web_1  |          at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
web_1  |          at MoEmbed.Models.Metadata.UnknownMetadata.<FetchAsyncCore>d__39.MoveNext() in /usr/src/MoEmbed.Core/Models/Metadata/UnknownMetadata.cs:line 203
web_1  |       --- End of stack trace from previous location where exception was thrown ---
web_1  |          at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
web_1  |          at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
web_1  |          at MoEmbed.MetadataService.<GetDataAsync>d__6.MoveNext() in /usr/src/MoEmbed.Core/MetadataService.cs:line 62
web_1  |       --- End of stack trace from previous location where exception was thrown ---
web_1  |          at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
web_1  |          at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
web_1  |          at MoEmbed.HttpMetadataHandler.<HandleAsync>d__4.MoveNext() in /usr/src/MoEmbed.Core/HttpMetadataHandler.cs:line 69
web_1  | info: Microsoft.AspNetCore.Hosting.Internal.WebHost[2]
web_1  |       Request finished in 4.7732ms 404 text/xml

Truncate overly long meta name="description" content

Some sites, such as ttp://www.server-memo.net/centos-settings/firewalld/firewalld.html, put the entire body text in the content attribute of the description meta tag.

Expanding all of it is annoying, so I would like to do something about it.
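
One possible way to shorten such descriptions, sketched in Python rather than the project's language. The function name, the 200-character limit, and the "..." suffix are all assumptions chosen for illustration; the issue does not specify any of them.

```python
def truncate_description(text: str, max_length: int = 200, suffix: str = "...") -> str:
    """Collapse whitespace, then cut text to at most max_length characters."""
    collapsed = " ".join(text.split())
    if len(collapsed) <= max_length:
        return collapsed
    # Reserve room for the suffix so the result never exceeds max_length.
    cut = collapsed[: max_length - len(suffix)].rstrip()
    return cut + suffix
```

Cutting at a character count is the simplest rule; cutting at a sentence or word boundary would read better but needs language-aware handling, which matters for Japanese pages like the one cited.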

Limit access to raw resources

In particular, we do not want to download video files.

Perhaps send only a HEAD request first, then send a GET request depending on the Content-Type response header?
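
The HEAD-then-GET idea could look roughly like the following Python sketch (not the project's language). The set of parseable media types and both function names are assumptions; note also that some servers reject or mishandle HEAD, so a real implementation would need a fallback.

```python
import urllib.request

# Assumption: only these media types are worth downloading and parsing.
PARSEABLE_TYPES = ("text/html", "application/xhtml+xml")


def should_fetch_body(content_type):
    """Return True when a Content-Type header value names a parseable document."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in PARSEABLE_TYPES


def fetch_if_parseable(url, timeout=10.0):
    """Send HEAD first, then GET only when the Content-Type looks parseable."""
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head, timeout=timeout) as response:
        content_type = response.headers.get("Content-Type")
    if not should_fetch_body(content_type):
        return None  # e.g. video/mp4: skip the download entirely
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```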

Failure on YouTube URLs

2020-06-14T10:35:49.916277+00:00 heroku[router]: at=info method=GET path="/api?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DFwBMBsRJoPA&format=json" host=moembed.herokuapp.com request_id=4ebd31f4-9c49-48b9-b0ff-743761c2198f fwd="217.178.17.77" dyno=web.1 connect=0ms service=7ms status=404 bytes=204 protocol=http
2020-06-14T10:35:49.916447+00:00 app[web.1]: info: Microsoft.AspNetCore.Hosting.Diagnostics[1]
2020-06-14T10:35:49.916456+00:00 app[web.1]: Request starting HTTP/1.1 GET http://moembed.herokuapp.com/api?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DFwBMBsRJoPA&format=json
2020-06-14T10:35:49.916457+00:00 app[web.1]: info: MoEmbed.HttpMetadataHandler[0]
2020-06-14T10:35:49.916457+00:00 app[web.1]: Handling URL: https://www.youtube.com/watch?v=FwBMBsRJoPA(null)
2020-06-14T10:35:49.919807+00:00 app[web.1]: info: MoEmbed.HttpMetadataHandler[0]
2020-06-14T10:35:49.919837+00:00 app[web.1]: An exception thrown: System.AggregateException: One or more errors occurred. (Response status code does not indicate success: 429 (Too Many Requests).)
2020-06-14T10:35:49.919838+00:00 app[web.1]: ---> System.Net.Http.HttpRequestException: Response status code does not indicate success: 429 (Too Many Requests).
2020-06-14T10:35:49.919839+00:00 app[web.1]: at System.Net.Http.HttpResponseMessage.EnsureSuccessStatusCode()
2020-06-14T10:35:49.919846+00:00 app[web.1]: at MoEmbed.Models.Metadata.UnknownMetadata.FetchOnceAsync(RequestContext context) in /tmp/build_d1fbc0c8cc8ce3bab1dc5d37224111ea/MoEmbed.Core/Models/Metadata/UnknownMetadata.cs:line 105
2020-06-14T10:35:49.919847+00:00 app[web.1]: at MoEmbed.Models.RequestContextExtensions.ExecuteAsync(RequestContext context, Func`2 func) in /tmp/build_d1fbc0c8cc8ce3bab1dc5d37224111ea/MoEmbed.Core/Models/RequestContextExtensions.cs:line 24
2020-06-14T10:35:49.919848+00:00 app[web.1]: at MoEmbed.Models.RequestContextExtensions.ExecuteAsync(RequestContext context, Func`2 func) in /tmp/build_d1fbc0c8cc8ce3bab1dc5d37224111ea/MoEmbed.Core/Models/RequestContextExtensions.cs:line 40
2020-06-14T10:35:49.919848+00:00 app[web.1]: --- End of inner exception stack trace ---
2020-06-14T10:35:49.919849+00:00 app[web.1]: at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
2020-06-14T10:35:49.919849+00:00 app[web.1]: at MoEmbed.Models.Metadata.UnknownMetadata.<FetchAsyncCore>b__15_0(Task`1 t) in /tmp/build_d1fbc0c8cc8ce3bab1dc5d37224111ea/MoEmbed.Core/Models/Metadata/UnknownMetadata.cs:line 90
2020-06-14T10:35:49.919850+00:00 app[web.1]: at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
2020-06-14T10:35:49.919850+00:00 app[web.1]: at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
2020-06-14T10:35:49.919851+00:00 app[web.1]: --- End of stack trace from previous location where exception was thrown ---
2020-06-14T10:35:49.919851+00:00 app[web.1]: at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
2020-06-14T10:35:49.919851+00:00 app[web.1]: --- End of stack trace from previous location where exception was thrown ---
2020-06-14T10:35:49.919852+00:00 app[web.1]: at MoEmbed.MetadataService.GetDataAsync(ConsumerRequest request) in /tmp/build_d1fbc0c8cc8ce3bab1dc5d37224111ea/MoEmbed.Core/MetadataService.cs:line 135
2020-06-14T10:35:49.919852+00:00 app[web.1]: at MoEmbed.HttpMetadataHandler.HandleAsync(HttpContext context) in /tmp/build_d1fbc0c8cc8ce3bab1dc5d37224111ea/MoEmbed.Core/HttpMetadataHandler.cs:line 122
2020-06-14T10:35:49.919853+00:00 app[web.1]: info: Microsoft.AspNetCore.Hosting.Diagnostics[2]
2020-06-14T10:35:49.919853+00:00 app[web.1]: Request finished in 2.5369ms 404 application/json

XMLNS

<EmbedData xmlns="clr-namespace:MoEmbed.Models;assembly=MoEmbed.Core" />

Sad.

<EmbedData xmlns="http://sugoi/url" />

Cool.

Code generation fails on heroku env

-----> ASP.NET Core app detected
Installing dotnet
publish /tmp/build_a47cf451cea5e8578982a2211441cec6/MoEmbed.App/MoEmbed.App.csproj for Release
Welcome to .NET Core 3.0!
---------------------
SDK Version: 3.0.100
----------------
Explore documentation: https://aka.ms/dotnet-docs
Report issues and find source on GitHub: https://github.com/dotnet/core
Find out what's new: https://aka.ms/dotnet-whats-new
Learn about the installed HTTPS developer cert: https://aka.ms/aspnet-core-https
Use 'dotnet --help' to see available commands or visit: https://aka.ms/dotnet-cli-docs
Write your first app: https://aka.ms/first-net-core-app
--------------------------------------------------------------------------------------
Microsoft (R) Build Engine version 16.3.0+0f4c62fea for .NET Core
Copyright (C) Microsoft Corporation. All rights reserved.
  Restore completed in 139.35 ms for /tmp/build_a47cf451cea5e8578982a2211441cec6/MoEmbed.CodeGeneration/MoEmbed.CodeGeneration.csproj.
  Restore completed in 600.91 ms for /tmp/build_a47cf451cea5e8578982a2211441cec6/MoEmbed.Core/MoEmbed.Core.csproj.
  Restore completed in 13.7 ms for /tmp/build_a47cf451cea5e8578982a2211441cec6/MoEmbed.Models/MoEmbed.Models.csproj.
  Restore completed in 437.14 ms for /tmp/build_a47cf451cea5e8578982a2211441cec6/MoEmbed.Twitter/MoEmbed.Twitter.csproj.
  Restore completed in 1.43 sec for /tmp/build_a47cf451cea5e8578982a2211441cec6/MoEmbed.App/MoEmbed.App.csproj.
  Restore completed in 2.05 sec for /tmp/build_a47cf451cea5e8578982a2211441cec6/MoEmbed.CodeGeneration/MoEmbed.CodeGeneration.csproj.
  Generate OEmbed proxy metadata providers.
  It was not possible to find any compatible framework version
  The specified framework 'Microsoft.NETCore.App', version '2.1.0' was not found.
    - The following frameworks were found:
        3.0.0 at [/app/tmp/cache/dotnet/3.0.100/sdk/shared/Microsoft.NETCore.App]
  
  You can resolve the problem by installing the specified framework and/or SDK.
  
  The .NET Core frameworks can be found at:
    - https://aka.ms/dotnet-download
/tmp/build_a47cf451cea5e8578982a2211441cec6/MoEmbed.CodeGeneration/MoEmbed.CodeGeneration.csproj(23,5): error MSB3073: The command "dotnet script OEmbedProxyMetadataProviders.csx" exited with code 150.
  MoEmbed.Models -> /tmp/build_a47cf451cea5e8578982a2211441cec6/MoEmbed.Models/bin/Release/netcoreapp3.0/MoEmbed.Models.dll
  Successfully created package '/tmp/build_a47cf451cea5e8578982a2211441cec6/MoEmbed.Models/bin/Release/MoEmbed.Models.1.0.0.nupkg'.
 !     Push rejected, failed to compile ASP.NET Core app.
 !     Push failed

The specified framework 'Microsoft.NETCore.App', version '2.1.0' was not found.

Why 2.1.0?

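[Twitter] Support quoted tweets

https://cdn.syndication.twimg.com/tweet-result?id=1680374826375184386 has a quoted_tweet property, so the data can be retrieved.

  • Add a property such as quoted_tweet to the MoEmbed response and leave its use to the client
  • Add it inside the description property in a specific format

Which of these two approaches to take is still to be decided.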
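
To make the two options concrete, here is a small Python sketch (not the project's language) operating on plain dictionaries. Only the quoted_tweet and description property names come from the issue; the author_name key and the "> author: text" format are hypothetical placeholders, since the EmbedData structure is not yet documented.

```python
def embed_with_quoted_property(embed: dict, quoted: dict) -> dict:
    """Option 1: expose the quoted tweet as a separate quoted_tweet property."""
    result = dict(embed)
    result["quoted_tweet"] = quoted
    return result


def embed_with_inline_quote(embed: dict, quoted: dict) -> dict:
    """Option 2: append the quoted tweet to description in a fixed format."""
    result = dict(embed)
    # Hypothetical format: a blank line, then "> author: text".
    quote_block = f"> {quoted['author_name']}: {quoted['description']}"
    result["description"] = f"{embed.get('description', '')}\n\n{quote_block}"
    return result
```

Option 1 keeps the data structured at the cost of requiring client changes; option 2 works with existing clients but forces them to parse text if they ever want to render the quote differently.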

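Handling exceptions during HTTP requests

  1. When the HTTP request in Metadata.FetchAsyncCore fails, should it retry until it gets a successful result (for up to about 30 s)?
  2. If all retries in item 1 fail and an exception is ultimately thrown, the next FetchAsync call finds a cached task whose Task.Status is Faulted. Should that cached result be returned as-is? And should a faulted cache entry have an expiration period?

My personal proposal:

  1. Retry up to 4 times, waiting 1 s, 2 s, 4 s, and 8 s.
  2. Keep the faulted entry valid for 5 minutes after the exception.

Something like that; the numbers have no particular basis.

By the way, AmazonMetadataProvider gets exceptions back constantly by design, so it retries with waits of 80 ms to 1280 ms.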
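
The proposal above can be sketched as follows, in Python rather than the project's language. The names retry_with_backoff and FaultedCacheEntry are hypothetical; the delays (1, 2, 4, 8 seconds) and the 5-minute lifetime are the figures from the proposal. The sleep and clock functions are injectable so the logic can be exercised without waiting.

```python
import time


def retry_with_backoff(func, delays=(1, 2, 4, 8), sleep=time.sleep):
    """Call func(); after each failure wait the next delay and retry.

    With the default delays this makes one initial attempt plus up to 4 retries.
    """
    for delay in delays:
        try:
            return func()
        except Exception:
            sleep(delay)
    return func()  # last retry: let any exception propagate to the caller


class FaultedCacheEntry:
    """Remembers a failure so callers can reuse it until it expires."""

    def __init__(self, error, ttl_seconds=300, now=time.monotonic):
        self.error = error
        self._now = now
        self._expires_at = now() + ttl_seconds

    def is_valid(self):
        # Once this returns False the caller should drop the entry and refetch.
        return self._now() < self._expires_at
```

With these delays the worst case waits 15 seconds in total before giving up, which fits within the roughly 30-second ceiling mentioned in the first question.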

Build fails on heroku

-----> ASP.NET Core app detected
Installing dotnet
-----> Removing old cached .NET version
-----> Fetching .NET SDK
-----> Fetching .NET Runtime
publish /tmp/build_2d4ce4c9089f8cc9809985786f683b9a/MoEmbed.CodeGeneration/MoEmbed.CodeGeneration.csproj for Release
Microsoft (R) Build Engine version 16.0.450+ga8dc7f1d34 for .NET Core
Copyright (C) Microsoft Corporation. All rights reserved.
  Restore completed in 3.25 sec for /tmp/build_2d4ce4c9089f8cc9809985786f683b9a/MoEmbed.CodeGeneration/MoEmbed.CodeGeneration.csproj.
  Restore completed in 4.89 sec for /tmp/build_2d4ce4c9089f8cc9809985786f683b9a/MoEmbed.CodeGeneration/MoEmbed.CodeGeneration.csproj.
  System.IO.DirectoryNotFoundException: Could not find a part of the path '/tmp/build_2d4ce4c9089f8cc9809985786f683b9a/oembed/providers'.
     at System.IO.Enumeration.FileSystemEnumerator`1.CreateDirectoryHandle(String path, Boolean ignoreNotFound)
     at System.IO.Enumeration.FileSystemEnumerator`1..ctor(String directory, EnumerationOptions options)
     at System.IO.Enumeration.FileSystemEnumerable`1..ctor(String directory, FindTransform transform, EnumerationOptions options)
     at System.IO.Enumeration.FileSystemEnumerableFactory.UserFiles(String directory, String expression, EnumerationOptions options)
     at System.IO.Directory.InternalEnumeratePaths(String path, String searchPattern, SearchTarget searchTarget, EnumerationOptions options)
     at System.IO.Directory.GetFiles(String path, String searchPattern)
     at Submission#0.<<Initialize>>d__0.MoveNext() in /tmp/build_2d4ce4c9089f8cc9809985786f683b9a/MoEmbed.CodeGeneration/OEmbedProxyMetadataProviders.csx:line 27
  --- End of stack trace from previous location where exception was thrown ---
     at Dotnet.Script.Core.ScriptRunner.Execute[TReturn](String dllPath, IEnumerable`1 commandLineArgs)
/tmp/build_2d4ce4c9089f8cc9809985786f683b9a/MoEmbed.CodeGeneration/MoEmbed.CodeGeneration.csproj(22,5): error MSB3073: The command "dotnet script OEmbedProxyMetadataProviders.csx" exited with code 1.
 !     Push rejected, failed to compile ASP.NET Core app.
 !     Push failed

Sort out the mix of Uri and string for URLs

Needs to be parsed as a URL → Uri

  • ConsumerRequest
    • Uri Url
  • RequestContext
    • Uri Url

Depends on the APIs above → Uri

  • UnknownMetadata
    • string Uri → Uri
    • string MovedTo → Uri

Other

  • EmbedData
    • Uri Url → string
    • Uri AuthorUrl → string
    • Uri ProviderUrl → string
  • ImageInfo
    • Uri Url → string
  • Media
    • Uri RawUrl → string
    • Uri Location → string
  • TwitterMetadata
    • Uri Url → string Uri
  • MoEmbed.Models.OEmbed.* (is this needed?)
    • Uri * → string *

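Splitting the project

Shouldn't it be split into a core, per-service handlers, and a web app?

  • MoEmbed.Core (published as a NuGet package)
    • IHandler, EmbedObject, generic handlers
  • MoEmbed.Twitter (published as a NuGet package)
    • TwitterHandler
  • MoEmbed.App (separate repository?)
    • An instance of the app as configured by Mr. Mo.
    • Registers the handlers the service uses with the core and starts the web app

It might also be fine to have the .NET Core configuration builder configure IHandler.
My personal preference would be DI via XAML, though.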

Add some test cases to TwitterMetadata

  • ExtendedMedia
  • AnimatedGIF https%3A%2F%2Ftwitter.com%2Ftomekitigai%2Fstatus%2F900277491583008768
  • Video https%3A%2F%2Ftwitter.com%2FTwitter%2Fstatus%2F560070183650213889

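Fetch images in the oEmbed-based Twitter implementation

The oEmbed implementation in d5d5507 expands a tweet's user name and text but not the profile image or content images, so this issue is about fetching them.
The following API, which is accessed from within the Twitter oEmbed API, returns image URLs as JSON, so let's try using it.

https://cdn.syndication.twimg.com/tweet-result?id=463440424141459456

The Twitter-specific implementation seems to live in MoEmbed.Core/Models/Metadata/TwitterMetadata.cs; confirm whether it is acceptable to make network requests to the API above from there.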
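
A rough Python sketch (not the project's language) of deriving the syndication URL above from a tweet URL. The helper names and the accepted host list are assumptions, and nothing here verifies that the endpoint still accepts such a request or what its JSON looks like.

```python
import re
from urllib.parse import urlencode, urlparse

# Matches paths such as /Twitter/status/560070183650213889
_STATUS_PATH = re.compile(r"^/[^/]+/status/(\d+)/?$")
# Assumption: hosts treated as Twitter for this sketch.
_TWITTER_HOSTS = ("twitter.com", "www.twitter.com", "mobile.twitter.com")


def tweet_id_from_url(url):
    """Return the numeric tweet ID from a status URL, or None if it is not one."""
    parsed = urlparse(url)
    if (parsed.hostname or "").lower() not in _TWITTER_HOSTS:
        return None
    match = _STATUS_PATH.match(parsed.path)
    return match.group(1) if match else None


def syndication_url(tweet_id):
    """Build the tweet-result URL quoted in this issue for the given ID."""
    return "https://cdn.syndication.twimg.com/tweet-result?" + urlencode({"id": tweet_id})
```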

SoundCloud support

For some sites such as SoundCloud, we probably need to support player embedding via the Html property.

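Serverless support

If ASP.NET Core support for native AOT (Microsoft Learn) arrives and the app can be compiled with Native AOT, it may become possible to run it serverless on a FaaS platform. Running costs should drop considerably.

However, caching and queuing need to be thought through. Redis is fine for caching, but queuing looks difficult. At the very least, the design should not depend on a specific IaaS.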
