Comments (36)

SalvaEB avatar SalvaEB commented on July 21, 2024 4

There is a word cloud generator based on cairo which already produces vector output (SVG and PDF). Although it is written in C++ and not in Python, you could take a look just in case: https://github.com/SalvaEB/mask_word_cloud (disclaimer: I am the author. After looking for a word cloud generator that produced vector output, I tried to add this feature to word_cloud and ended up writing new code in C++ from scratch).

from word_cloud.

loydg avatar loydg commented on July 21, 2024 4

I managed to come up with a simple hack to the code that will print out a passable SVG version - I duplicated the Star Wars word cloud masked by the Stormtrooper helmet image.

https://loydg.github.io/svg_wordcloud/

amueller avatar amueller commented on July 21, 2024 1

yeah, the C++ code might be the best choice for now. I should potentially move this lib to cairo as well, or at least make that a potential backend.

amueller avatar amueller commented on July 21, 2024

The way PIL renders, there are no vector graphics currently. The best way would be to do an HTML export, I think, which is not currently there. There is some code at #23, but it is not really polished or tested.

amueller avatar amueller commented on July 21, 2024

Thinking about it, it might be easier to go via matplotlib. HTML export seems hard to do.

amueller avatar amueller commented on July 21, 2024

thanks for sharing @SalvaEB. I think there are many implementations that do support vector graphics. In particular I'm sure the d3 versions will do.

amueller avatar amueller commented on July 21, 2024

@SalvaEB you use a scanline approach, right? So the algorithm is slightly different but gives the same result? Is it any faster than mine?
And what do you dilate the image for?

SalvaEB avatar SalvaEB commented on July 21, 2024

I use a scanline instead of an integral matrix (I store the number of white pixels until the rightmost left pixel). I also use other tricks to update when writing a new word and others to avoid traversing the full image. I dilate the image in order to avoid a margin around the mask. In this way, the shape of the resulting word cloud seems more accurate. It seems faster but, since it is written in C++, I couldn't say which part is due to algorithmic differences. When using a minimal font size higher than 4, the algorithm is much faster (and the result is less dense). Best regards, S.
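
For readers unfamiliar with the integral-matrix idea mentioned here: a summed-area table lets you test whether any rectangle is free of occupied pixels with four lookups. The sketch below is plain Python for illustration only. The function names are made up for this example, and it is not the code of either word_cloud or mask_word_cloud.

```python
def integral_image(occupied):
    """Build a summed-area table with one extra row and column of zeros.

    `occupied` is a list of rows of 0/1 ints (1 = pixel already used).
    """
    height = len(occupied)
    width = len(occupied[0]) if height else 0
    sat = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += occupied[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def box_is_free(sat, x, y, box_w, box_h):
    """Return True if the box with top-left corner (x, y) covers no occupied pixel."""
    total = (sat[y + box_h][x + box_w] - sat[y][x + box_w]
             - sat[y + box_h][x] + sat[y][x])
    return total == 0


def free_positions(sat, box_w, box_h):
    """List every top-left corner where a box of the given size fits."""
    height = len(sat) - 1
    width = len(sat[0]) - 1
    return [(x, y)
            for y in range(height - box_h + 1)
            for x in range(width - box_w + 1)
            if box_is_free(sat, x, y, box_w, box_h)]
```

Each placed word marks its pixels as occupied, so the table has to be rebuilt or updated afterwards; that update cost is one place where a scanline variant could differ.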

amueller avatar amueller commented on July 21, 2024

Well, mine is written in Cython, which should be as fast basically. I think my current approach is a bit silly and I should probably change it. I'll check out yours.

amueller avatar amueller commented on July 21, 2024

What other tricks do you use?

SalvaEB avatar SalvaEB commented on July 21, 2024

Reservoir sampling, in order to avoid detecting the valid bounding boxes twice. You can check the code. The treatment of words and word sizes is very basic for the moment. Unfortunately, I do not have much time now to complete it.
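
One plausible reading of the reservoir-sampling remark: a valid position can be picked uniformly at random during a single scan, so the candidates do not have to be found once to count them and a second time to select one. A minimal sketch of that technique follows; `sample_one` is a name invented here, not something taken from mask_word_cloud.

```python
import random


def sample_one(candidates, rng=random):
    """Pick one item uniformly at random from an iterable in a single pass.

    Works on generators of unknown length and stores only the current pick.
    Returns None if the iterable is empty.
    """
    chosen = None
    for count, item in enumerate(candidates, start=1):
        # Keep the new item with probability 1/count; this leaves every
        # item seen so far equally likely to be the final choice.
        if rng.randrange(count) == 0:
            chosen = item
    return chosen
```

For example, `sample_one(pos for pos in all_positions if fits(pos))` would choose a placement without ever materialising the list of valid positions (`all_positions` and `fits` being whatever the layout code provides).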

amueller avatar amueller commented on July 21, 2024

yeah I should also be working on other things ^^

AndreasS2501 avatar AndreasS2501 commented on July 21, 2024

Hi,

Just wanted to check on this. Is it now possible to also generate SVG?

amueller avatar amueller commented on July 21, 2024

I haven't worked on it yet. It should be possible to replace the pil commands by matplotlib and get SVG from there.

SalvaEB avatar SalvaEB commented on July 21, 2024

Dear AndreasS2501, have you tried https://github.com/SalvaEB/mask_word_cloud ? It is based on cairo and generates SVG and PDF as well. I have not had time so far to preprocess the text input, since I didn't need that.

AndreasS2501 avatar AndreasS2501 commented on July 21, 2024

Hi SalvaEB,

Thanks for the reply. I looked at your generated SVG example file, but I don't think it will suit my use case. I'd like to have the words with their respective fonts and font sizes. An image gives you only pixels. But with an SVG document, which is parsable XML, I have the resulting metadata (weighted ranking of words) in a visual and computable form. That would be nice, that's why I'm asking :)

SalvaEB avatar SalvaEB commented on July 21, 2024

Hi, I had not even realized that. I used an SVG cairo surface, which does the rendering itself. The fact that text is not stored in the SVG is reported as an issue in cairo itself:
http://lists.cairographics.org/archives/cairo/2014-January/024961.html
https://bugs.freedesktop.org/show_bug.cgi?id=38516

As a workaround, if you obtain an SVG from the PDF (e.g. I have tried Inkscape), the resulting SVG file has the word tags instead of glyphs. I do not know if this SVG will fulfill your requirements.
Best regards,
S.

AndreasS2501 avatar AndreasS2501 commented on July 21, 2024

Hi Salva,

IMO this is trivially possible with SVG. Here is an example of an SVG word cloud:

```
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="100%" height="100%" viewBox="0 0 800 500">
  <title>SVG Wordcloud</title>
  <text x="0" y="15" font-family="Verdana" font-size="55" fill="blue" transform="rotate(90)">Even Rotating!</text>
  <text x="30" y="30" font-family="Verdana" font-size="35" fill="green">SVG Rocks</text>
  <text x="150" y="55" font-family="Verdana" font-size="25" fill="red">scalable possibilities</text>
  <text x="250" y="75" font-family="Verdana" font-size="25" fill="cyan">Works!</text>
</svg>
```

SalvaEB avatar SalvaEB commented on July 21, 2024

Hi Andreas,
It is not difficult to obtain such an SVG file directly from the computed coordinates, either in amueller/word_cloud or in SalvaEB/mask_word_cloud (instead of using the cairo renderer, in this case). I can consider it an issue but, unfortunately, I do not have the time to implement it now.
Best regards,
Salva
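
To make the idea concrete, here is a rough sketch of writing `<text>` elements straight from a computed layout. The tuple format `(word, font_size, x, y, angle, color)` and the function name are assumptions for this example, not the actual data structures of either project, and `(x, y)` is taken to be the start of the text baseline.

```python
from xml.sax.saxutils import escape, quoteattr


def layout_to_svg(layout, width, height, font_family="Verdana"):
    """Serialise a word layout as an SVG document with real <text> elements.

    `layout` is an iterable of (word, font_size, x, y, angle_degrees, color).
    """
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
        f'height="{height}" viewBox="0 0 {width} {height}">'
    ]
    for word, size, x, y, angle, color in layout:
        # Rotate around the word's own anchor point, not the canvas origin.
        transform = f' transform="rotate({angle} {x} {y})"' if angle else ""
        parts.append(
            f'<text x="{x}" y="{y}" font-family={quoteattr(font_family)} '
            f'font-size="{size}" fill={quoteattr(color)}{transform}>'
            f'{escape(word)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)
```

Because the words stay as text, the output remains parsable XML with the font sizes (and thus the weights) readable from the attributes.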

amueller avatar amueller commented on July 21, 2024

The problem is that you need to use the same rendering engine, that is align the text in the same way.
I couldn't figure out how to render the information in HTML, as that uses the top left corner for alignment (PIL uses bottom left), and the height of the text depends on the height of the letters. So it is somewhat non-trivial to go from one representation to the other. Probably still possible, though.
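
One way to picture the alignment issue for SVG output: the `y` attribute of an SVG `<text>` element refers to the text baseline by default, so a position reported as a box corner has to be shifted by the font's ascent or descent. A minimal sketch, assuming those metrics are available from the font; the function name and the `anchor` values are hypothetical, and rotated words are not handled.

```python
def to_svg_baseline(x, y, ascent, descent, anchor="top-left"):
    """Convert a text-box corner to the baseline start point SVG <text> expects.

    Coordinates grow downward. `ascent` and `descent` are the font metrics
    in pixels at the chosen font size.
    """
    if anchor == "top-left":
        return x, y + ascent       # baseline sits `ascent` below the box top
    if anchor == "bottom-left":
        return x, y - descent      # baseline sits `descent` above the box bottom
    raise ValueError(f"unknown anchor: {anchor!r}")
```

Differences in how each renderer measures the box (tight around the glyphs versus the full line height) would still need checking per font, which is the non-trivial part described above.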

cranmer avatar cranmer commented on July 21, 2024

Was looking to see if your package supported svg. It would be wonderful if one could make something like an image map, so that you could attach some javascript action to a click on certain words. I was thinking of it as a search interface. Input many documents, create word frequencies, make svg word cloud, click on word then might execute a search on those documents.
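
A rough sketch of that pipeline, under stated assumptions: word frequencies and a word-to-documents index are built in one pass, and each word is wrapped in an SVG `<a>` link. The `/search?q=` URL is a hypothetical endpoint, all function names are invented for this example, and older SVG 1.1 viewers may need `xlink:href` instead of plain `href`.

```python
import re
from collections import Counter, defaultdict
from urllib.parse import quote
from xml.sax.saxutils import escape, quoteattr


def tokenize(text):
    """Split text into lowercase words (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def build_index(documents):
    """Return (frequencies, index) for a dict mapping document name to text.

    `index` maps each word to the sorted list of documents containing it,
    which is what a click on that word would look up.
    """
    frequencies = Counter()
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            frequencies[word] += 1
            index[word].add(name)
    return frequencies, {word: sorted(names) for word, names in index.items()}


def linked_word(word, x, y, size, search_url="/search?q="):
    """Render one clickable word as an SVG fragment."""
    href = quoteattr(search_url + quote(word))
    return (f'<a href={href}><text x="{x}" y="{y}" font-size="{size}">'
            f'{escape(word)}</text></a>')
```

The frequencies would feed the word cloud layout, while the index answers the search triggered by the click.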

amueller avatar amueller commented on July 21, 2024

Yeah, I think I need to rewrite it with bokeh, but that won't happen in the next two months.

schrieveslaach avatar schrieveslaach commented on July 21, 2024

Any update on this?

I am writing a LaTeX document and, with the aid of pandoc and your Python package, I generate a word cloud for my cover. It would be really great to be able to get a vectorized version.

AndreasS2501 avatar AndreasS2501 commented on July 21, 2024

Sadly my Python skills are lacking, and refactoring, getting into a lib without types seems hard for me :/

SalvaEB avatar SalvaEB commented on July 21, 2024

I don't know if that might fit your needs but https://github.com/SalvaEB/mask_word_cloud produces svg and pdf as outputs. It is not based on python but is very easy to compile in linux (c++11, cairo graphics).

bafonso avatar bafonso commented on July 21, 2024

Like many others, I arrived at this ticket due to a need for vector output :-) I guess the options are to expand @SalvaEB's work to improve text processing, or to add cairo support through pycairo to this repository.

amueller avatar amueller commented on July 21, 2024

yeah, still open. @SalvaEB's work is totally usable btw, it's just a separate library.

cranmer avatar cranmer commented on July 21, 2024

pull request?

amueller avatar amueller commented on July 21, 2024

Yes, a PR would be great!

loydg avatar loydg commented on July 21, 2024

Does that mean I need to click a checkbox in the repository settings or something? I don't do programming/software development and only use my GitHub account as a login for https://observablehq.com, so while I understand the basic nature of GitHub, my grasp of its exact functioning is otherwise rudimentary.

amueller avatar amueller commented on July 21, 2024

@loydg I tried playing with your code, but I found it hard to pass the font to the CSS. Do you have an idea about how to go from having a font path (which is what the word cloud has now) to the right CSS for the SVG?

amueller avatar amueller commented on July 21, 2024

initial PR in #163 btw, not sure if these were linked.

michaelsjackson avatar michaelsjackson commented on July 21, 2024

> There is a word cloud generator based on cairo which already produces vector output (svg and pdf). Although it is written in c++ and not in python, you could take a look just in case: https://github.com/SalvaEB/mask_word_cloud (disclaimer: I am the author. After looking for a word cloud generator that produced vector output, I tried to add this feature to word_cloud and I finally finished by writing a new code in c++ from scratch).

Can your tool do everything this Python solution can as well, plus the extra SVG export? Or does it cut other features?

amueller avatar amueller commented on July 21, 2024

done in #519.

michaelsjackson avatar michaelsjackson commented on July 21, 2024

> done in #519.

But not as command line. 🥇

amueller avatar amueller commented on July 21, 2024

@michaelsjackson PR welcome!
