Comments (12)

mrobinson commented on June 11, 2024

> Thanks for clarifying, it mostly makes sense to me so far: the three-tier layered structure seems to make sense, although the need for one 'LayoutFontStore' per layout, specific to web fonts, is not completely clear yet, but I guess it will be. Is it because the lifecycle of web fonts is different, and to facilitate their unloading?

The reason that we need a LayoutFontStore per layout is that a stylesheet can specify a web font with a URL, family name, and font properties. Even though the URL can be the same as that of web fonts on other pages, for the purposes of matching, the family name and font properties apply only to the page whose stylesheet included the web font. So even though the data for a web font is shared, each "instance" of the font is specific to a layout for matching.
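
As a rough illustration of that split, here is a minimal sketch in which the font bytes are shared through an `Arc` while the family name and weight live in a per-layout list. `WebFontDescriptor`, the field names, and the `find` method are assumptions made up for this example, not Servo's actual types.

```rust
use std::sync::Arc;

/// Raw bytes of a downloaded font file, shared by every layout that uses it.
type FontData = Arc<Vec<u8>>;

/// What one page's stylesheet says about a web font.
/// Hypothetical type: real descriptors carry more properties than this.
struct WebFontDescriptor {
    family: String,
    weight: u16,
}

/// Per-layout store: matching only ever sees this layout's descriptors.
#[derive(Default)]
struct LayoutFontStore {
    web_fonts: Vec<(WebFontDescriptor, FontData)>,
}

impl LayoutFontStore {
    /// Register a web font under the name and weight this page gave it.
    fn add_web_font(&mut self, descriptor: WebFontDescriptor, data: FontData) {
        self.web_fonts.push((descriptor, data));
    }

    /// Look up a web font by family name and weight within this layout only.
    fn find(&self, family: &str, weight: u16) -> Option<FontData> {
        self.web_fonts
            .iter()
            .find(|(d, _)| d.family == family && d.weight == weight)
            .map(|(_, data)| Arc::clone(data))
    }
}
```

Two pages can register the same `FontData` under different family names: each page matches only on its own name, yet both lookups return a pointer to the same bytes.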

from servo.

mrobinson commented on June 11, 2024

Ah, there's another issue that I wanted to bring up related to sandboxing as well. It seems impossible to send system font data over IPC or shared memory on macOS, because of the way the CoreText APIs work. Basically system TTC files need to be loaded from disk by CoreText for it to work properly. Due to this, we'll need to continue loading system fonts using CoreText, poking a hole in the sandbox for them. This is what Chromium does and I suspect we'll need to do the same thing.

nicoburns commented on June 11, 2024

I've looked into this a little more and I have some comments:

  • It would be good if the layout_2020 crate could be decoupled from the font store types, and especially from the font thread. Perhaps this could be abstracted behind a trait so that other people wishing to use the layout crate could provide their own font loading logic. Bundling shaping with layout seems reasonable, but the font loading seems like it ought to be separable.

  • I had thought that Servo was using servo/font-kit for font loading, but I see that font-kit is only used for <canvas> and that this functionality is provided by components/gfx. In fact, components/gfx is almost entirely text- and font-related code. Do you know why this is?

  • The font loading and "store" code in gfx seems to have a lot of overlap with linebender's new fontique crate. I wonder if Servo would consider adopting that library? I feel like if we can get a unified "font database" abstraction across the Rust ecosystem, then that would be a really good basis for interoperability. My understanding is that most (in-memory) "font databases" are primarily storing the raw bytes of the font file (+ some metadata), so it seems like it ought to be possible to standardise.

gterzian commented on June 11, 2024

I think it's an interesting idea and I have some questions, mostly about how this would play out when running in multiprocess mode.

The current proposal does not define how it would be impacted by process boundaries. The current font cache lives alongside the constellation in the "main process" of Servo (what other engines refer to as the chrome process, I think), whereas the users of the font cache would be partitioned by BrowsingContextGroup into individual content processes. This means we need to think both about what to share globally across Servo, by way of IPC and a central mechanism in the main process, and about what we can then share hierarchically with child content processes.

I can imagine a hierarchical structure where we keep the core font mechanism inside the "main browser process", alongside the constellation and the resource loading mechanism it uses, but perhaps supplement it with local per-content-process caches. These would follow something along the lines of the proposed design and allow for easy sharing between threads within a single content process. I wonder, though, how many threads would benefit from this after the removal of individual threads for layout (with #31346).

Lastly, I wonder whether we should instead partition the current font cache further, for security or privacy, for example per , meaning we also need to think about when not to share data across boundaries, such as sharing only on a "per BC group" basis. (Something like this will eventually have to happen for the HTTP cache; see whatwg/fetch#904.)

mrobinson commented on June 11, 2024

@gterzian This is a great point. The other consideration for the multiprocess case is sandboxing. I think the solution here is to have the FontCacheThread in the UI-process responsible for loading font lists and also font data. Still this would be handled as described in the design above (with the exception that the multiprocess case would proxy these requests via IPC to the main process). Likely all fonts would need to be created using byte buffers as well. I think this could mostly be transparent to most of the code.

nicoburns commented on June 11, 2024

> In general, Font does not change very much either. The biggest change here is that FontHandle will become both Sync and Send and FontRef will become Arc<RefCell<Font>>

Perhaps Arc<AtomicRefCell<Font>>, Arc<Mutex<Font>> or Arc<RwLock<Font>>? I don't think Arc<RefCell<_>> ever makes sense.

mrobinson commented on June 11, 2024

> Perhaps Arc<AtomicRefCell<Font>>, Arc<Mutex<Font>> or Arc<RwLock<Font>>? I don't think Arc<RefCell<_>> ever makes sense.

Yep, I meant Arc<RwLock<Font>> here. I've corrected the text above.
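
For context on why the type matters: `RefCell<T>` is not `Sync`, so an `Arc<RefCell<T>>` is neither `Send` nor `Sync` and cannot be handed to another thread. The sketch below shows the `Arc<RwLock<Font>>` shape with a stand-in `Font` struct; its single field and the `shape_in_parallel` helper are hypothetical and only there to make the example compile and run.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Stand-in for Servo's `Font`; the field is hypothetical.
struct Font {
    glyphs_shaped: usize,
}

type FontRef = Arc<RwLock<Font>>;

/// Touch one shared font from several worker threads and return the final count.
/// With `Arc<RefCell<Font>>` the `thread::spawn` call below would not compile.
fn shape_in_parallel(font: &FontRef, workers: usize) -> usize {
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let font = Arc::clone(font);
            thread::spawn(move || {
                // A writer takes the lock exclusively; readers may overlap.
                font.write().unwrap().glyphs_shaped += 1;
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    font.read().unwrap().glyphs_shaped
}
```

`RwLock` suits data that is mostly read; if nearly every access mutates the font, a `Mutex` would serve just as well.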

gterzian commented on June 11, 2024

A couple more questions. I have looked further into the existing system, so I hope my questions are now more specific:

> I think the solution here is to have the FontCacheThread in the UI-process responsible for loading font lists and also font data

By "the UI-process", do you mean the content process (where script and layout run)? If so, that would mean running multiple font threads, one per content process, which I think would defeat the purpose of increasing sharing. I would rather keep one central FontCache in the chrome process, and do IPC with per-content-process caches for all local layouts (the proposed FontStore).

Edit: I've just become aware the FontCacheThread is just a wrapper to a channel to the FontCache, so whenever I wrote FontCacheThread I actually meant "the thing that runs in a thread", which is the FontCache.

Regarding the specific problems that the re-design is meant to address:

  • Web fonts are global. Is this not something we can address with the desired partitioning at the level of the FontCache?
  • Font data is copied to each layout thread: Agreed on the need to share data between threads (the per-layout FontStore mentioned above), iff those threads run in the same content process. Also a question: since we don't have individual layout threads anymore, what threads are we talking about here? Would this per-layout FontStore not rather be some thread-local data structure shared by the various layouts?
  • Font data is loaded more than once for fonts that share files: Can this be handled locally inside the current design? It looks to me like a question of switching the current pub data: Option<Arc<Vec<u8>>> into a pub data: Arc<HashMap<FontIdentifier, Option<Vec<u8>>>>, so that the optional data can be shared between multiple templates. It also appears to me that this is not currently shared between threads, and per the above comment I don't think it should be, so it could be an Rc.
  • Font data for web fonts is never unloaded: Can this be handled with an "unload" messaging flow that fits in the current design? Perhaps in addition to some partitioning as per the first problem. For example, what is described as the "per-layout FontStore" could, when dropped or otherwise no longer needed, send a message to the global FontCache for potential clean-up of the web font.

Finally, is the proposed FontStore, at least the way I think it could be (so probably not actually shared between non-existent layout threads), not just the current FontContext?

I do see a problem with the current FontContext, especially if we don't run layout in threads anymore, since this means it would block script as well: the blocking IPC with the FontCache (the initial request that misses the cache in the context). Is there a way we can make those requests asynchronous using the IPC router and some appropriate state machine approach? It seems to me that the font context is used synchronously in the layout algorithm, so I guess the answer is "not easily", but perhaps something could be done via the script thread as part of document load to populate the context for layout? Or perhaps layout should run without fonts, or with some sort of default one that would always be available, while they are being loaded?

mrobinson commented on June 11, 2024

> By "the UI-process", do you mean the content process (where script and layout run)? If so, that would mean running multiple font threads, one per content process, which I think would defeat the purpose of increasing sharing. I would rather keep one central FontCache in the chrome process, and do IPC with per-content-process caches for all local layouts (the proposed FontStore).

The UI process is another name for the chrome process. "Chrome process" is Chrome/Chromium parlance while "UI process" is the term that WebKit uses. I think we ultimately need three tiers of data structure for fonts:

  • FontService: This lives in the UI process and would store font information global to all Servo processes. It would mainly hold system font lists and any data loaded for system fonts.
  • FontStore: This lives in the content process and holds cached system FontTemplates, Fonts, and cached font data which can be shared by all layouts.
  • LayoutFontStore: This lives in each layout and stores FontTemplates for web fonts. Web font templates and data can be shared between different LayoutFontStores, and when the last reference is released they are automatically cleaned up.
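
One way to picture the three tiers is as a lookup chain, sketched below under heavy simplification. The struct names come from the list above, but every field and method is an assumption for illustration: in particular, the call from `FontStore` to `FontService` would cross a process boundary via IPC in multiprocess mode, whereas here it is a plain function call, and fonts are keyed by family name only.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

type FontData = Arc<Vec<u8>>;

/// Tier 1: lives in the UI process and owns system font data.
struct FontService {
    system_fonts: HashMap<String, FontData>,
}

impl FontService {
    fn load_system_font(&self, family: &str) -> Option<FontData> {
        self.system_fonts.get(family).cloned()
    }
}

/// Tier 2: one per content process; caches what the service returned.
struct FontStore {
    service: Arc<FontService>,
    cached: RwLock<HashMap<String, FontData>>,
}

impl FontStore {
    fn system_font(&self, family: &str) -> Option<FontData> {
        // The read guard is released at the end of this statement.
        let hit = self.cached.read().unwrap().get(family).cloned();
        if let Some(data) = hit {
            return Some(data);
        }
        let data = self.service.load_system_font(family)?;
        self.cached
            .write()
            .unwrap()
            .insert(family.to_owned(), Arc::clone(&data));
        Some(data)
    }
}

/// Tier 3: one per layout; holds only that layout's web fonts.
struct LayoutFontStore {
    store: Arc<FontStore>,
    web_fonts: HashMap<String, FontData>,
}

impl LayoutFontStore {
    /// Web fonts declared by the page are consulted first, then system fonts.
    fn find(&self, family: &str) -> Option<FontData> {
        self.web_fonts
            .get(family)
            .cloned()
            .or_else(|| self.store.system_font(family))
    }
}
```

Because every tier hands out `Arc` clones, dropping a `LayoutFontStore` releases its web font references without any explicit unload message.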

>   • Web fonts are global. Is this not something we can address with the desired partitioning at the level of the FontCache?

I sort of addressed this above.

>   • Font data is copied to each layout thread: Agreed on the need to share data between threads (the per-layout FontStore mentioned above), iff those threads run in the same content process. Also a question: since we don't have individual layout threads anymore, what threads are we talking about here? Would this per-layout FontStore not rather be some thread-local data structure shared by the various layouts?

We don't have a layout thread for each layout, but layout does create worker threads and currently there is a FontStore for every worker. This work is about eliminating this font template and data duplication across worker threads, which is likely very expensive. Currently every worker talks directly to the global font cache!

>   • Font data is loaded more than once for fonts that share files: Can this be handled locally inside the current design? It looks to me like a question of switching the current pub data: Option<Arc<Vec<u8>>> (https://github.com/servo/servo/blob/21ea6d21f0f3bcb2e736082e397ef99cf9ecd051/components/gfx/font_template.rs#L99) into a pub data: Arc<HashMap<FontIdentifier, Option<Vec<u8>>>>, so that the optional data can be shared between multiple templates. It also appears to me that this is not currently shared between threads, and per the above comment I don't think it should be, so it could be an Rc.

It might be possible to handle it in the current design, but I think we should rethink things to unpack as much of our technical debt as possible. In my proposal this means a data structure that just concerns itself with caching font data. I think that will simplify a lot of things, because then all the other font data structures just have to care about holding Arc and the font data cache can clean up automatically using weak references.
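
A minimal sketch of what such a data-only cache could look like, assuming a simplified `FontIdentifier` and a caller-supplied loader closure (both hypothetical here): the cache keeps only `Weak` pointers, so the bytes are freed as soon as the last template or font holding the `Arc` goes away.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, Weak};

/// Simplified stand-in for a font identifier (assumption for this sketch).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum FontIdentifier {
    Local(String),
    Web(String),
}

/// Caches font bytes without keeping them alive on its own.
#[derive(Default)]
struct FontDataCache {
    entries: Mutex<HashMap<FontIdentifier, Weak<Vec<u8>>>>,
}

impl FontDataCache {
    /// Return the live bytes for `id`, or run `load` and remember the result weakly.
    fn get_or_load(&self, id: &FontIdentifier, load: impl FnOnce() -> Vec<u8>) -> Arc<Vec<u8>> {
        let mut entries = self.entries.lock().unwrap();
        if let Some(data) = entries.get(id).and_then(|weak| weak.upgrade()) {
            return data;
        }
        let data = Arc::new(load());
        entries.insert(id.clone(), Arc::downgrade(&data));
        data
    }

    /// Remove map entries whose data has been freed; returns how many remain.
    fn prune(&self) -> usize {
        let mut entries = self.entries.lock().unwrap();
        entries.retain(|_, weak| weak.strong_count() > 0);
        entries.len()
    }
}
```

Fonts that share one file would ask for the same identifier and receive the same `Arc`, which addresses the "loaded more than once" problem; stale map keys still need an occasional `prune`, although the font bytes themselves are already gone by then.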

>   • Font data for web fonts is never unloaded: Can this be handled with an "unload" messaging flow that fits in the current design? Perhaps in addition to some partitioning as per the first problem. For example, what is described as the "per-layout FontStore" could, when dropped or otherwise no longer needed, send a message to the global FontCache for potential clean-up of the web font.

I think this can be even simpler if we have a per-layout FontStore because cleaning up the LayoutFontStore automatically cleans up all web fonts. Eventually this will need to be better though, in order to handle font unloading for long-running single page applications that load and unload fonts from stylesheets.

> Finally, is the proposed FontStore, at least the way I think it could be (so probably not actually shared between non-existent layout threads), not just the current FontContext?

The big difference is that there is one FontContext for every layout worker thread, whereas LayoutFontStore is per-layout.

> I do see a problem with the current FontContext, especially if we don't run layout in threads anymore, since this means it would block script as well: the blocking IPC with the FontCache (the initial request that misses the cache in the context). Is there a way we can make those requests asynchronous using the IPC router and some appropriate state machine approach? It seems to me that the font context is used synchronously in the layout algorithm, so I guess the answer is "not easily", but perhaps something could be done via the script thread as part of document load to populate the context for layout? Or perhaps layout should run without fonts, or with some sort of default one that would always be available, while they are being loaded?

This is an interesting question. Regarding your point about blocking script: this is already the case, and it was even when we had the layout thread, because script always blocked on the layout thread's layout. There was never any parallelism between script and layout (apart from some really minor things). I think we can never do a layout until we have, at the very least, loaded the system font list and the fonts necessary for that layout (even if falling back from in-process web fonts). Doing a layout before this happens will lead to flashes of badly laid out pages. It's better to show nothing than to do that, I think.

That said, once those fonts have been loaded, there should probably never be synchronous calls to use them again (unless the system fonts change while Servo is running). Maybe we can look into pre-populating the per-content-process FontStores when creating them. I think that would make a lot of sense, actually, though if the new page uses unloaded fonts they would obviously need to be loaded.

gterzian commented on June 11, 2024

Thanks for clarifying, it mostly makes sense to me so far: the three-tier layered structure seems to make sense, although the need for one 'LayoutFontStore' per layout, specific to web fonts, is not completely clear yet, but I guess it will be. Is it because the lifecycle of web fonts is different, and to facilitate their unloading?

mrobinson commented on June 11, 2024

>   • I had thought that Servo was using servo/font-kit for font loading, but I see that font-kit is only used for <canvas> and that this functionality is provided by components/gfx. In fact, components/gfx is almost entirely text- and font-related code. Do you know why this is?

font-kit was created well after the font backend, and the backend was never ported to it. In the meantime, font-kit is mostly abandoned and is mainly just a port of the old font code from Servo. It's not clear if it's going to be complete enough for Servo going forward. There is a dependency on font-kit in canvas, because raqote uses it, but I would be happy to drop the dependency entirely. I think it makes sense to bring the font system up to date with modern fonts and modern Servo and then see what the situation looks like.

>   • The font loading and "store" code in gfx seems to have a lot of overlap with linebender's new fontique crate. I wonder if Servo would consider adopting that library? I feel like if we can get a unified "font database" abstraction across the Rust ecosystem, then that would be a really good basis for interoperability. My understanding is that most (in-memory) "font databases" are primarily storing the raw bytes of the font file (+ some metadata), so it seems like it ought to be possible to standardise.

fontique looks cool, but font enumeration is one of the simplest things that the font backend does right now. In addition, fallback is defined in the specification, so we can't just do what the system requests. The enumeration code in fontique honestly looks a lot like ours (is it a port?) and lacks some of the features we have added in recent weeks. Generally speaking, the requirements of a web browser go well beyond those of a normal application, so I do not have much faith in high-level libraries for fonts. We will be using things like skrifa very soon though.

Regarding font handles, Servo's requirements are very tricky due to sandboxing and IPC. I think there's no hope of using something like fontique for that.

mrobinson commented on June 11, 2024

> It would be good if the layout_2020 crate could be decoupled from the font store types, and especially from the font thread. Perhaps this could be abstracted behind a trait so that other people wishing to use the layout crate could provide their own font loading logic. Bundling shaping with layout seems reasonable, but the font loading seems like it ought to be separable.

This would be nice to do, but fonts and web layout are very intertwined. The first priority should be that things work correctly. The details of the FontCache in Servo are already behind a FontSource trait and we do not plan on making anything less abstract.
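
To make the idea of a trait boundary concrete, here is a generic sketch of layout-side code that depends only on a font-loading trait. This is not Servo's `FontSource` API, whose methods are not shown in this thread; `FontProvider`, `InMemoryFonts`, and `resolve_first_available` are names invented for the example.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical abstraction over font loading; not Servo's `FontSource`.
trait FontProvider {
    fn font_data(&self, family: &str) -> Option<Arc<Vec<u8>>>;
}

/// An embedder-supplied provider backed by an in-memory table.
struct InMemoryFonts {
    fonts: HashMap<String, Arc<Vec<u8>>>,
}

impl FontProvider for InMemoryFonts {
    fn font_data(&self, family: &str) -> Option<Arc<Vec<u8>>> {
        self.fonts.get(family).cloned()
    }
}

/// Layout-side code sees only the trait: it walks the family list in order
/// and returns the first family the provider can supply.
fn resolve_first_available<P: FontProvider>(
    provider: &P,
    families: &[&str],
) -> Option<(String, Arc<Vec<u8>>)> {
    families.iter().find_map(|family| {
        provider
            .font_data(family)
            .map(|data| (family.to_string(), data))
    })
}
```

Another implementation of the same trait could forward each request over IPC to a font thread without the caller changing at all, which is the kind of separation the quoted comment asks for.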
