
Comments (17)

pfitzseb commented on May 18, 2024

Just read through this issue, and I noticed that I've been doing something similar in DocSeeker, which supports inspecting all packages in Pkg.dir() and saving symbol information as well as docs to a cache on disk to enable decently fast searches through un-loaded packages.

The relevant part of the code (relevant for LSP, that is) isn't all that fleshed out, and in particular doesn't have its own process interacting with the cache as @davidanthoff suggests above, but it seems to work at least okayish for now.

from languageserver.jl.

ZacLN commented on May 18, 2024

Option 1 is the way to go, I think; static analysis is going to miss too much looped/eval'd code-generation stuff.

Caching it all makes sense, and we can check for changes with whatever compilecache uses.


davidanthoff commented on May 18, 2024

I guess one question is: how can we get the list of all methods defined in a given module? In particular, can we somehow get methods that were defined in a module but actually extend a function in, say, Base?


ZacLN commented on May 18, 2024

We can use the static analysis stuff to find all import statements, then filter the relevant functions' method tables by their .module.


davidanthoff commented on May 18, 2024

Ah, cool! Actually, we might just see how performance looks if we look at all methods and check their .module; if that's not too slow we could avoid the whole static analysis, right?

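The method-table filtering discussed above can be sketched as follows. This is only an illustration: `MyPkg` and `methods_from` are hypothetical names, and a real implementation would also walk submodules and non-exported names.

```julia
# Hypothetical sketch: walk the method tables of Base's functions and
# keep the methods whose `.module` field points at the given package.
# `MyPkg` stands in for a real user package.
module MyPkg
    struct Grid end
    Base.length(::Grid) = 0   # a method extending a function from Base
end

function methods_from(mod::Module)
    found = Method[]
    for name in names(Base)
        isdefined(Base, name) || continue
        f = getfield(Base, name)
        f isa Function || continue
        for m in methods(f)
            # every Method records the module it was defined in
            m.module === mod && push!(found, m)
        end
    end
    return found
end
```

Here `methods_from(MyPkg)` picks out the `length(::MyPkg.Grid)` method even though `length` itself belongs to Base, which is exactly the "methods that extend a function in Base" case asked about above.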

ZacLN commented on May 18, 2024

I've whipped up a simple script that pulls hovers and goto-defs from packages. It only takes ~5 seconds on my computer to pull everything from Base; this will take a bit longer once I add signatures, but I don't think it's anything to worry about. The size for Base is around 3 MB, so storage is no great ordeal.


davidanthoff commented on May 18, 2024

Do you mean line 4 takes 5 secs on your system? That line takes 0.1 seconds on my system :) But yes, speed doesn't seem to be a huge issue here.


ZacLN commented on May 18, 2024

Well that's distressing!


ZacLN commented on May 18, 2024

Oh no, I meant the time to pull all the documents from running cachemodule! I was worried my computer was busted somehow...

Anyway, I've fiddled with this and have it working to pull Base/Core and all submodules, and to cache user-installed packages.

Storage for Base/Core is ~30 MB and it takes a minute or two to run initially. I haven't worked out a storage method yet; I've been burnt in the past trying to use HDF5/JLD to store data, but maybe it's stabilised now?

My idea would be that on installation we pull all docs from Base and store them.

  • On server startup this gets loaded into server.DocStore.
  • Each new Document has its own DocStore::Dict, which by default includes an entry pointing to server.DocStore[:Base].
  • Using/importing any package results in: 1) loading into server.DocStore, 2) saving to disk, 3) a new entry in document.DocStore pointing to the loaded package.
  • A vscode menu option to reload all DocStores (i.e. following an updated version of Julia / Pkg.update()).

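The store layout in the bullet points above might look roughly like this. The names `server.DocStore` and `Document.DocStore` come from the comment; the concrete types are my assumption.

```julia
# Hypothetical sketch of the two-level DocStore layout described above.
# The server holds one shared store per cached module...
const ServerDocStore = Dict{Symbol,Dict{String,String}}()
ServerDocStore[:Base] = Dict("sin" => "Compute the sine of x.")

# ...and each open document holds references into the shared store.
struct Document
    DocStore::Dict{Symbol,Dict{String,String}}
end

# A new Document starts with an entry pointing at the shared :Base store,
# so a `using`/`import` only has to add one more reference.
doc = Document(Dict(:Base => ServerDocStore[:Base]))
```

Because each entry is a reference to the shared Dict rather than a copy, reloading a package's docs in the server store is immediately visible to every document.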

davidanthoff commented on May 18, 2024

I agree: if we can stay away from HDF5/JLD, that would be nice. I think both have a build script, so our whole package installation story would get more complicated... We might even be able to just use Julia's default serializer? That file format is probably not stable, but on the other hand it is just a cache, so that might not be much of an issue.

Could you share the link to the code that you mention? I'd like to get a bit of a feel for what it is doing.

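The "just use Julia's default serializer" idea above amounts to the stdlib Serialization module; a minimal sketch (the cache layout and file name are assumptions):

```julia
using Serialization

# A toy symbol/doc cache keyed by fully qualified name.
cache = Dict("Base.sin" => "Compute the sine of x.",
             "Base.cos" => "Compute the cosine of x.")

path = joinpath(mktempdir(), "symbolcache.jls")
open(io -> serialize(io, cache), path, "w")   # persist to disk

# The wire format is not stable across Julia versions, but since this
# is only a cache it can simply be regenerated after an upgrade.
loaded = open(deserialize, path)
```

No build script or binary dependency is needed, which addresses the installation-story concern above.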

ZacLN commented on May 18, 2024

I've been playing around with this here and there, and the last thing that seemed good in terms of speed and storage was this.

It writes out an AST holding a module Namespace whose submodules are the relevant modules (i.e. Base, Core, LanguageServer), so it's easy to store (a .jl file), quite quick to construct, and super quick to load; editing it simply requires parsing the file and selecting the modules to update or add. I haven't got it set up to load submodules yet, but I think that'll be pretty easy.

The framework for what to actually store still needs to be worked out, so at the moment it just keeps methods and documentation (mainly to assess speed).

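The ".jl file as cache" idea above can be sketched like this; the `Namespace` name comes from the comment, while the submodule name and the exact contents are illustrative.

```julia
# Rough sketch: the cache is a plain .jl file defining a `Namespace`
# module whose submodules hold the cached data for each real module.
cachefile = joinpath(mktempdir(), "cache.jl")

write(cachefile, """
module Namespace
    module BaseDocs          # illustrative stand-in for cached Base data
        const docs = Dict("sin" => "Compute the sine of x.")
    end
end
""")

# Loading the cache is just parsing and evaluating the file; updating it
# means parsing, swapping out a submodule's Expr, and writing it back.
include(cachefile)
```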

ZacLN commented on May 18, 2024

Alternatively, there's this approach, which seems slower.


ZacLN commented on May 18, 2024

Scratch that, the second approach is better. You can simply serialize the resulting Dict as you proposed and store it. Can you think of some existing code in Base that we could use to quickly search Dict keys by partial matching?

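One possible answer to the partial-matching question above, using only Base: filter the key set with `occursin` (or `startswith` for prefix-only matching). The Dict here is a stand-in for the serialized symbol cache, and `partial_keys` is a hypothetical helper name.

```julia
docs = Dict("sin" => "…", "sinh" => "…", "cos" => "…", "cosh" => "…")

# Return all keys containing `pat`, sorted for stable completion lists.
partial_keys(d::AbstractDict, pat::AbstractString) =
    sort!([k for k in keys(d) if occursin(pat, k)])

partial_keys(docs, "sin")   # → ["sin", "sinh"]
```

This is a linear scan, but for a cache of tens of thousands of symbols that is typically fast enough; a sorted key vector plus `searchsorted` would give faster prefix queries if needed.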

ZacLN commented on May 18, 2024

I've implemented the caching for Base and Core in this branch. It requires the forked JuliaParser (and includes all the other changes that I think are now ready for a merge).

It's easy to add new modules to the cache; I just need to work out the logic (as with includes) for which modules are loaded by any given file.


ZacLN commented on May 18, 2024

The above branch has been updated to asynchronously load and cache modules and to parse the entire workspace on initialisation.


davidanthoff commented on May 18, 2024

I've thought a bit more about this, and here is another idea:

We add one more Julia process to the mix: a singleton symbol server. When a language server process starts, it will try to connect via a named pipe to this global singleton symbol-server process. If that fails (because the symbol server has not been started), the language server process will start the symbol server and then connect. Essentially, whenever there is at least one language server process running, there will be one global symbol server that serves all language server processes.

This symbol server holds the cache of symbols. It also persists this cache on disc. When a language server needs symbol info for a specific package, it queries the global symbol server. If the cache has that info, the symbol server returns it from the cache; otherwise it uses similar logic as we had before to retrieve the symbol/doc info: the symbol server spawns a new julia process that loads the module and returns all the extracted info to the symbol server, where it gets stored in the cache (and also persisted onto the cache on disc). The symbol server will have one dedicated task for this, i.e. one task will work off a queue of packages that still need to be cached. That way we prevent a large number of julia processes from starting that each load one package for extraction of symbol/doc info (which happens right now and can get my beefy machine to stall).

So this is partly going back to the on disc cache that @ZacLN originally had, but the difference is that the only process that interacts with that on disc cache is the symbol server, so we get around the race condition that I was worried about.

Mainly I've come to terms that we need an on disc cache if we want decent performance and not long waits. Also, @ZacLN you mentioned that retrieving symbol/doc info takes even longer with triangular dispatch, right? So this would be even more important. This comment suggests that other language server implementations have used a similar pattern before.

Finally, this still allows us to completely isolate the language server and the symbol server from user code, i.e. only the julia processes that are spawned by the symbol server would actually run/import packages. I still think that that is super valuable in terms of robustness of the whole language server.

I have a prototype implementation of the singleton server story that works on Windows. I'll test it on Linux next, and then we could think about incorporating this.

The main question for @ZacLN: would this enable a scenario where you don't have to eval imports of packages in the language server process?

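The single dedicated extraction task described above can be sketched with a Channel-backed queue: one worker task drains the queue, so at most one package is being extracted at a time instead of one spawned process per package. `extract` is a hypothetical stand-in for spawning the julia subprocess that loads a package.

```julia
# One queue of package names still waiting to be cached...
queue = Channel{String}(32)
results = String[]

# ...a stand-in for "spawn a julia process, load the package, return info".
extract(pkg) = "cached:" * pkg

# ...and a single worker task that drains the queue sequentially.
worker = @async for pkg in queue
    push!(results, extract(pkg))
end

put!(queue, "JSON")
put!(queue, "DataFrames")
close(queue)   # lets the worker's loop terminate once the queue is empty
wait(worker)
```

Callers on other tasks just `put!` package names and go on; backpressure comes for free from the channel's capacity.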

davidanthoff commented on May 18, 2024

Closing, as we now load this in a separate process.

