Comments (7)

honnibal avatar honnibal commented on April 27, 2024

This should be fixed in version 0.63. Are you on that version yet? I ran your code on 0.63 and it works, but I can reproduce your error on the previous version.

Here's what's going on.

Before the latest version, Token was supposed to be a weak reference. The underlying data is a C array of structs. This data is owned by the Tokens object. Once the Tokens object passes out of scope, the underlying C data is due for collection. If you're still holding references to Token objects, and you access a property that's proxied to that C data, all bets are off.
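As a rough illustration of that failure mode, here is a pure-Python analogy. It is only a sketch: `Doc` and `TokenView` are hypothetical stand-ins for Tokens and Token, not spaCy classes, and a weak reference plays the role of the pointer into the C array. It also assumes CPython, where the owner is freed as soon as its last reference is dropped.

```python
import weakref


class Doc:
    """Hypothetical stand-in for the Tokens object: it owns the data."""

    def __init__(self, words):
        self.words = list(words)

    def __getitem__(self, i):
        return TokenView(self, i)


class TokenView:
    """Hypothetical stand-in for Token: a view that does not keep its owner alive."""

    def __init__(self, doc, i):
        self._doc = weakref.ref(doc)
        self.i = i

    @property
    def text(self):
        doc = self._doc()
        if doc is None:
            # In C this would be a read from freed memory; here we fail loudly.
            raise ReferenceError("owner was collected")
        return doc.words[self.i]


doc = Doc(["Hello", "world"])
tok = doc[1]
alive_text = tok.text  # fine while the owner exists
del doc                # on CPython the owner is freed here

try:
    tok.text
    dangling = False
except ReferenceError:
    dangling = True
```

The Python version raises a clean exception; the C-backed version has no such safety net, which is why the real symptom can differ from machine to machine.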

The recent update tries to check whether any Token objects will outlive the Tokens object. If so, it gives them a copy of the C data. I didn't document any of this properly, and I'm sorry that this seems to have wasted some of your time. I hope to have less broken release processes soon.
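The idea of handing surviving tokens a copy can be sketched in pure Python as follows. This is an analogy under stated assumptions, not the actual Cython code: `Doc`, `TokenView`, `rows`, and `_own_row` are hypothetical names, a `WeakSet` stands in for the "will this token outlive me?" check, and the prompt `__del__` call relies on CPython reference counting.

```python
import weakref


class Doc:
    """Hypothetical owner of the data (stand-in for Tokens)."""

    def __init__(self, words):
        self.rows = [{"text": w} for w in words]
        # Views handed out so far; dead ones drop out automatically.
        self._views = weakref.WeakSet()

    def __getitem__(self, i):
        view = TokenView(self, i)
        self._views.add(view)
        return view

    def __del__(self):
        # Any view still in the set is outliving the owner: give it a copy.
        for view in list(self._views):
            view._own_row = dict(self.rows[view.i])


class TokenView:
    """Hypothetical stand-in for Token."""

    def __init__(self, doc, i):
        self._doc = weakref.ref(doc)
        self.i = i
        self._own_row = None

    @property
    def text(self):
        doc = self._doc()
        row = doc.rows[self.i] if doc is not None else self._own_row
        return row["text"]


doc = Doc(["Hello", "world"])
kept = [doc[0], doc[1]]
del doc  # owner goes away; the kept views now read from their own copies
texts = [t.text for t in kept]
```

Views that nobody kept never receive a copy, so the common case (iterate, then drop everything) pays nothing extra.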

But, aside from this: what you're doing is supposed to be a rare edge case. My intention is for users to maintain a reference to the Tokens object, and access the Token objects through it. Is there a reason you want to create your own list? It might indicate a weakness in my API.
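One way to follow that pattern is to return the parent together with the positions of interest, rather than a list of detached tokens. The sketch below assumes only that the parent supports iteration and integer indexing; `Selection` and `select` are hypothetical helpers, and a plain list of strings stands in for the Tokens object.

```python
from typing import List, NamedTuple, Sequence


class Selection(NamedTuple):
    """Hypothetical container: the parent plus the positions of interest."""

    doc: Sequence
    indices: List[int]

    def tokens(self):
        # Tokens are looked up through the parent each time they are needed.
        return [self.doc[i] for i in self.indices]


def select(doc, predicate):
    """Keep the parent alive and remember which positions matched."""
    return Selection(doc, [i for i, tok in enumerate(doc) if predicate(tok)])


doc = ["the", "Cat", "sat"]  # plain list standing in for a Tokens object
picked = select(doc, str.islower)
```

Because the `Selection` holds the parent, everything reachable through it stays valid for as long as the result itself is kept.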

from spacy.

NSchrading avatar NSchrading commented on April 27, 2024

I am running this on version 0.63; sorry I didn't mention that. Strange that it works on your machine but not mine. But now that I understand you have to keep a reference to the Tokens object, I can do what I wanted to earlier without the error. It might be good to look into this a bit more to make sure you really are preventing those tokens from losing the necessary data. I'm positive I'm on 0.63: I even checked tokens.pxd, and it has this change: "cdef int take_ownership_of_c_data(self) except -1".

The reason I was doing that is that I have a large set of tweets, each of which I was turning into a Tokens object by doing nlp(tweet_text). I didn't know you needed to keep a reference to it, so I was just passing it into a function that does the analysis I want. In this case I was using the dependency parse to extract Subject, Verb, Object structures. In that function, I was finding the subject, verb, and object tokens and appending them to another list that I return at the end, so that I could examine the returned tokens further down the line. I figured it might be useful to have all of the information, like tok.pos_, tok.dep_, etc., which is why I was adding the token itself to the list. To work around the error, I simply appended the tok.lower_ string instead. So far that is fine, but I may want more info later.
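If more than the lowercase string is needed later, one option is to copy the plain values out of each token while the Tokens object is still alive. A minimal sketch, assuming only the attribute names mentioned above (`lower_`, `pos_`, `dep_`); `TokenInfo` and `snapshot` are hypothetical helpers, and the `SimpleNamespace` with made-up values stands in for a real token so the example runs on its own.

```python
from types import SimpleNamespace
from typing import NamedTuple


class TokenInfo(NamedTuple):
    """Plain-Python copy of the fields we care about; owns its own data."""

    lower: str
    pos: str
    dep: str


def snapshot(tok):
    """Copy the string attributes out of a token-like object."""
    return TokenInfo(lower=tok.lower_, pos=tok.pos_, dep=tok.dep_)


# Stand-in for a real token, with illustrative values.
fake_tok = SimpleNamespace(lower_="cat", pos_="NOUN", dep_="nsubj")
info = snapshot(fake_tok)
```

The snapshot holds ordinary Python strings, so it stays valid no matter what happens to the object it was taken from.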

Don't worry about the bugs, I understand this is the early stage of development. I do want to commend you on your work so far, though. The dependency parsing works pretty darn well, even on informal text from the internet. And it is by far the fastest and easiest to use in Python. I've tried TurboParser and TweeboParser and both of those are very difficult to work with in Python. It's so simple to traverse the tree using rights and lefts as well. Really well done!

honnibal avatar honnibal commented on April 27, 2024

Working on this. I think I have it replicating on my server, but not on my laptop. Memory errors like this are difficult, because tests can pass accidentally, depending on whether the memory was overwritten.

honnibal avatar honnibal commented on April 27, 2024

Okay, try v0.65. This is working on my laptop, server, and on Travis.

Please watch out for, and report, memory leaks in the new implementation. There's a reference cycle between Tokens and Token. I've run the parser over lots of documents, but the problems might only arise when the Token objects are used in some non-trivial way.
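For anyone checking for leaks, the sketch below shows why a parent/child reference cycle matters in CPython: reference counting alone never frees the pair, so memory is only reclaimed when the cyclic garbage collector runs. `Doc` and `Tok` are hypothetical pure-Python stand-ins, not the real extension types, whose behaviour under the collector may differ.

```python
import gc
import weakref


class Doc:
    """Hypothetical parent that holds strong references to its children."""

    def __init__(self):
        self.tokens = []


class Tok:
    """Hypothetical child that holds a strong reference back to its parent."""

    def __init__(self, doc):
        self.doc = doc           # child -> parent
        doc.tokens.append(self)  # parent -> child, which closes the cycle


gc.disable()  # make the demonstration deterministic
try:
    doc = Doc()
    Tok(doc)
    probe = weakref.ref(doc)  # lets us observe whether the parent is still alive

    del doc
    alive_after_del = probe() is not None      # the cycle keeps it alive

    gc.collect()
    alive_after_collect = probe() is not None  # the cycle collector frees it
finally:
    gc.enable()
```

A steadily growing process size across many documents, or objects that survive `gc.collect()`, would be the kind of symptom worth reporting.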

NSchrading avatar NSchrading commented on April 27, 2024

It seems to be working on my end now with v0.65!

honnibal avatar honnibal commented on April 27, 2024

Great!

lock avatar lock commented on April 27, 2024

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
