adrianeboyd commented on May 20, 2024

So, to get back to the original question: doc._.trf_data.last_hidden_layer_state is a Ragged object that you can index with a spaCy token index to get the tensor data for that token, without doing any additional alignment on your side.

The data for each token is also a Ragged object:

```python
import spacy

nlp = spacy.load("en_core_web_trf")
doc = nlp("DocTransformerOutput.last_hidden_layer_state is a Ragged object")

# for the tensors corresponding to "DocTransformerOutput.last_hidden_layer_state"
# (token index 0), you can access doc._.trf_data.last_hidden_layer_state[0].data
# shape is (number of pieces for this token, hidden width)
assert doc._.trf_data.last_hidden_layer_state[0].data.shape == (12, 768)
```
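To make the layout concrete, here is a minimal sketch that mimics with plain NumPy what indexing a Ragged by token position amounts to. It assumes, as described in this thread, that the Ragged data holds one row per piece and that lengths[i] is the number of pieces for spaCy token i. The arrays, the piece counts and the token_rows helper are hypothetical stand-ins, not part of spaCy or thinc.

```python
import numpy as np

# Simulated stand-in for a thinc Ragged (assumption, for illustration only):
# a flat 2D array with one row per piece, plus the number of pieces per token.
hidden_width = 768
lengths = np.array([12, 1, 1, 2], dtype="int32")  # hypothetical piece counts
rng = np.random.default_rng(0)
data = rng.standard_normal((int(lengths.sum()), hidden_width)).astype("float32")


def token_rows(data: np.ndarray, lengths: np.ndarray, token_index: int) -> np.ndarray:
    """Return the piece rows for one token, analogous to ragged[i].data."""
    # Each token's rows start where the previous tokens' rows end.
    starts = np.concatenate([[0], np.cumsum(lengths)[:-1]])
    start = int(starts[token_index])
    return data[start : start + int(lengths[token_index])]


first = token_rows(data, lengths, 0)
print(first.shape)  # (12, 768)
```

With the real pipeline this helper is unnecessary: doc._.trf_data.last_hidden_layer_state[i].data already returns the corresponding slice.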

ahalterman commented on May 20, 2024

Great! That answers my question, and that's a very intuitive way to access the tensor by token index.

danieldk commented on May 20, 2024

In spaCy 3.7, the trained transformer pipelines switched to the Curated Transformers library (spacy-curated-transformers). The DocTransformerOutput class is documented here:

https://spacy.io/api/curatedtransformer#doctransformeroutput

The last_hidden_layer_state property provides the hidden representations from the last transformer layer for each token in the document.
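Because a single token can span several pieces, a common follow-up is reducing each token's rows to one vector. The sketch below does this with mean pooling, which is our own illustrative choice rather than something this thread prescribes. The flat data array and lengths are hypothetical stand-ins for the Ragged's data and lengths, and mean_pool_per_token is a hypothetical helper.

```python
import numpy as np

# Hypothetical stand-in for last_hidden_layer_state: flat piece vectors
# (one row per piece) plus the number of pieces per spaCy token.
lengths = np.array([3, 1, 2], dtype="int32")
data = np.arange(6 * 4, dtype="float32").reshape(6, 4)  # 6 pieces, width 4


def mean_pool_per_token(data: np.ndarray, lengths: np.ndarray) -> np.ndarray:
    """Average the piece rows of each token, giving one row per token."""
    # Boundaries of each token's rows in the flat piece array.
    ends = np.cumsum(lengths)
    starts = ends - lengths
    rows = []
    for s, e in zip(starts, ends):
        if e > s:
            rows.append(data[s:e].mean(axis=0))
        else:
            # A token with no pieces gets a zero vector (an arbitrary choice).
            rows.append(np.zeros(data.shape[1], dtype=data.dtype))
    return np.stack(rows)


token_vectors = mean_pool_per_token(data, lengths)
print(token_vectors.shape)  # (3, 4)
```

On CPU, the same function could be applied as mean_pool_per_token(ragged.data, ragged.lengths) with ragged = doc._.trf_data.last_hidden_layer_state; on GPU the arrays would be CuPy arrays and this NumPy sketch would need adapting.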

adrianeboyd commented on May 20, 2024

Ah, this probably should have been documented better as part of the release.

At first glance, DocTransformerOutput seems to contain quite a bit less information than the spacy-transformers ModelOutput. In particular, I don't see enough information to align the tensors with anything in the doc, but maybe I'm mistaken?

adrianeboyd commented on May 20, 2024

Ah, the Ragged lengths align to spacy tokens? (I admit that I hadn't looked too closely at the details here before, which is part of why this was missed in the release notes.)

danieldk commented on May 20, 2024

Yeah, they do. spacy-curated-transformers splits each spaCy token into pieces separately, so it doesn't need the alignment step that spacy-transformers performs (modulo whitespace tokens).
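The point about per-token piecing can be illustrated with a toy sketch: if every token is split into pieces independently, then concatenating the pieces and recording a per-token count is enough to map token i back to its pieces, with no separate alignment step. The character-chunking toy_piece function and the piece_tokens helper are hypothetical, chosen only to keep the example self-contained; real pipelines use a trained subword tokenizer instead.

```python
from typing import List, Tuple


def toy_piece(token: str, max_len: int = 4) -> List[str]:
    """Hypothetical piecer: chop a token into chunks of at most max_len chars."""
    return [token[i : i + max_len] for i in range(0, len(token), max_len)]


def piece_tokens(tokens: List[str]) -> Tuple[List[str], List[int]]:
    """Piece each token on its own and record how many pieces it produced."""
    pieces: List[str] = []
    lengths: List[int] = []
    for token in tokens:
        token_pieces = toy_piece(token)
        pieces.extend(token_pieces)
        lengths.append(len(token_pieces))
    return pieces, lengths


tokens = ["Ragged", "is", "tokenization"]
pieces, lengths = piece_tokens(tokens)
print(lengths)  # [2, 1, 3]
```

Because no piece ever crosses a token boundary, the lengths list plays the same role as the Ragged lengths: the pieces for token i start at sum(lengths[:i]).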

github-actions commented on May 20, 2024

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
