hanxiao commented on May 12, 2024

You can navigate a DocumentArray with the path selector da['@c'] to work with the chunks of every Document it contains (see the docarray docs on access paths).

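The following example should answer your question. It uses the classic docarray API (Document, DocumentArray and the '@c' selector). DocArray v2, released as docarray 0.30, replaced that API, so if pip install -U docarray gives you v2, install an earlier release instead, e.g. pip install "docarray<0.30".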

from typing import List

from docarray import Document, DocumentArray


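def dummy_encode(sentences: List[str]):
    # stand-in encoder: one fresh 3-d vector per sentence
    # (a list comprehension avoids sharing a single inner list across rows)
    return [[1, 2, 3] for _ in sentences]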


c1 = Document(text='hello')
c2 = Document(text='world')
d1 = Document(text='hello, world blah blah!', chunks=[c1, c2])
d1.display()

c3 = Document(text='hallo')
c4 = Document(text='welt')
d2 = Document(text='hallo, welt ja ja!', chunks=[c3, c4])
d2.display()

da = DocumentArray([d1, d2])
da.summary()

# embed on the "root" document
da.embeddings = dummy_encode(da.texts)
print(da.embeddings)

# let's reset the root
da.embeddings = None
# embed on the "chunk" document
da['@c'].embeddings = dummy_encode(da['@c'].texts)

print(da.embeddings)  # -> None, as we reset it, and we did not embed it again
print(da['@c'].embeddings)  # there we go!

# you can also read each chunk's embedding directly; c1 and c2 were updated in place
print(c1.embedding)
print(c2.embedding)
Output:

 <Document ('id', 'text', 'chunks') at 051c13a3cdb825644962bd214da539f8>
    └─ chunks
          ├─ <Document ('id', 'parent_id', 'granularity', 'text') at f7a9989942f5c00cc3680e846dbce361>
          └─ <Document ('id', 'parent_id', 'granularity', 'text') at ff102adb6538cce3020105ef8599b3ff>
 <Document ('id', 'text', 'chunks') at 7f255c2481e33d9ab9c7f2eef8052963>
    └─ chunks
          ├─ <Document ('id', 'parent_id', 'granularity', 'text') at d4bf332de54be178224ab2b8fb3a5d44>
          └─ <Document ('id', 'parent_id', 'granularity', 'text') at f6691a41caa4986420d2ac01238759d8>
                  Documents Summary                   
                                                      
  Length                    2                         
  Homogenous Documents      True                      
  Has nested Documents in   ('chunks',)               
  Common Attributes         ('id', 'text', 'chunks')  
                                                      
                        Attributes Summary                        
                                                                  
  Attribute   Data type         #Unique values   Has empty value  
 ──────────────────────────────────────────────────────────────── 
  chunks      ('ChunkArray',)   2                False            
  id          ('str',)          2                False            
  text        ('str',)          2                False            
                                                                  
[[1, 2, 3], [1, 2, 3]]
None
[[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]]
[1, 2, 3]
[1, 2, 3]
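To see why assigning to da['@c'].embeddings also fills in c1 and c2, here is a minimal pure-Python sketch of the idea. It does not use docarray at all: Doc, select_chunks and set_embeddings are hypothetical names, and the sketch assumes (consistent with the output above) that the selector hands back the same chunk objects rather than copies. It is an illustration, not the library's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical stand-in for a Document: NOT the docarray class, just enough
# structure to show how chunk-level selection and assignment work.
@dataclass
class Doc:
    text: str
    chunks: List["Doc"] = field(default_factory=list)
    embedding: Optional[List[int]] = None


def dummy_encode(sentences: List[str]) -> List[List[int]]:
    # one fresh vector per sentence (no shared inner lists)
    return [[1, 2, 3] for _ in sentences]


def select_chunks(docs: List[Doc]) -> List[Doc]:
    # Hypothetical helper mimicking the idea behind da['@c']: gather the
    # first-level chunks of every root Doc, in order. The returned list holds
    # references to the same chunk objects, not copies.
    return [chunk for doc in docs for chunk in doc.chunks]


def set_embeddings(docs: List[Doc], vectors: List[List[int]]) -> None:
    # write one vector back onto each Doc, pairing them by position
    if len(docs) != len(vectors):
        raise ValueError("need exactly one vector per Doc")
    for doc, vec in zip(docs, vectors):
        doc.embedding = vec


c1, c2 = Doc("hello"), Doc("world")
d1 = Doc("hello, world blah blah!", chunks=[c1, c2])
c3, c4 = Doc("hallo"), Doc("welt")
d2 = Doc("hallo, welt ja ja!", chunks=[c3, c4])
roots = [d1, d2]

chunks = select_chunks(roots)
set_embeddings(chunks, dummy_encode([c.text for c in chunks]))

print([d.embedding for d in roots])   # [None, None]: roots were never embedded
print([c.embedding for c in chunks])  # four [1, 2, 3] vectors
print(c1.embedding)                   # [1, 2, 3]: the original chunk object was updated
```

The key design point is that the selection holds references rather than copies, so writing through it mutates the chunks in place; if it returned copies, c1.embedding would still be None afterwards.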

hanxiao commented on May 12, 2024

Closing, as there was no reply.
