Comments (1)
I am trying to construct a text map using Atlas. It works when I sample a small number of records, but when I use all the data I get this error:
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
Cell In[8], line 17
     13 print(data[0])
     15 max_documents = 500000
---> 17 project = atlas.map_text(data=data,
     18                          indexed_field='dialogue',
     19                          name='UltraChat',
     20                          id_field='id',
     21                          description='Large-scale, high-quality, and diverse muli-round dialogue data.',
     22                          )

File ~/opt/anaconda3/envs/dn/lib/python3.8/site-packages/nomic/atlas.py:226, in map_text(data, indexed_field, id_field, name, description, build_topic_model, multilingual, is_public, colorable_fields, num_workers, organization_name, reset_project_if_exists, add_datums_if_exists, shard_size, projection_n_neighbors, projection_epochs, projection_spread)
    224 logger.info(f"{project.name}: Deleting project due to failure in initial upload.")
    225 project.delete()
--> 226 raise e
    228 logger.info("Text upload succeeded.")
    230 # make a new index if there were no datums in the project before

File ~/opt/anaconda3/envs/dn/lib/python3.8/site-packages/nomic/atlas.py:218, in map_text(data, indexed_field, id_field, name, description, build_topic_model, multilingual, is_public, colorable_fields, num_workers, organization_name, reset_project_if_exists, add_datums_if_exists, shard_size, projection_n_neighbors, projection_epochs, projection_spread)
    216 logger.warning("Passing 'num_workers' is deprecated and will be removed in a future release.")
    217 try:
--> 218 project.add_text(
    219 data,
    220 shard_size=None,
    221 )
    222 except BaseException as e:
    223 if number_of_datums_before_upload == 0:

File ~/opt/anaconda3/envs/dn/lib/python3.8/site-packages/nomic/project.py:1387, in AtlasProject.add_text(self, data, pbar, shard_size, num_workers)
   1385 data = pa.Table.from_pandas(data)
   1386 elif isinstance(data, list):
-> 1387 data = pa.Table.from_pylist(data)
   1388 elif not isinstance(data, pa.Table):
   1389 raise ValueError("Data must be a pandas DataFrame, list of dictionaries, or a pyarrow Table.")

File ~/opt/anaconda3/envs/dn/lib/python3.8/site-packages/pyarrow/table.pxi:3705, in pyarrow.lib.Table.from_pylist()
File ~/opt/anaconda3/envs/dn/lib/python3.8/site-packages/pyarrow/table.pxi:5226, in pyarrow.lib._from_pylist()
File ~/opt/anaconda3/envs/dn/lib/python3.8/site-packages/pyarrow/table.pxi:3580, in pyarrow.lib.Table.from_arrays()
File ~/opt/anaconda3/envs/dn/lib/python3.8/site-packages/pyarrow/table.pxi:1391, in pyarrow.lib._sanitize_arrays()
File ~/opt/anaconda3/envs/dn/lib/python3.8/site-packages/pyarrow/table.pxi:1372, in pyarrow.lib._schema_from_arrays()
File ~/opt/anaconda3/envs/dn/lib/python3.8/site-packages/pyarrow/array.pxi:317, in pyarrow.lib.array()
File ~/opt/anaconda3/envs/dn/lib/python3.8/site-packages/pyarrow/array.pxi:39, in pyarrow.lib._sequence_to_array()
File ~/opt/anaconda3/envs/dn/lib/python3.8/site-packages/pyarrow/error.pxi:144, in pyarrow.lib.pyarrow_internal_check_status()

UnicodeEncodeError: 'utf-8' codec can't encode characters in position 2350-2351: surrogates not allowed
I have already removed all the non-ASCII data but still get the error. Any ideas?
Fixed. There were still some weird characters in the data.
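For anyone hitting the same "surrogates not allowed" error, here is a minimal sketch of one way to clean the text before upload. It is not necessarily the fix the reporter used. It assumes the data is a list of dictionaries with the text in a `dialogue` field, as in the call above; `strip_surrogates` and `clean_records` are hypothetical helper names, not part of nomic or pyarrow. The approach relies on Python's standard `errors="ignore"` handler, which drops lone surrogate code points (U+D800 to U+DFFF) that UTF-8 cannot encode.

```python
def strip_surrogates(text: str) -> str:
    """Drop lone surrogate code points, which UTF-8 refuses to encode."""
    return text.encode("utf-8", errors="ignore").decode("utf-8")


def clean_records(records, field):
    """Return copies of the records with `field` sanitized; other keys are untouched."""
    cleaned = []
    for record in records:
        record = dict(record)  # shallow copy so the input list is not mutated
        value = record.get(field)
        if isinstance(value, str):
            record[field] = strip_surrogates(value)
        cleaned.append(record)
    return cleaned


# Illustrative data only: the first record contains a lone surrogate.
data = [
    {"id": "0", "dialogue": "hello \ud83d world"},
    {"id": "1", "dialogue": "caf\u00e9"},
]
cleaned = clean_records(data, "dialogue")

# Every cleaned string can now be encoded as UTF-8 without raising.
for record in cleaned:
    record["dialogue"].encode("utf-8")
```

The cleaned list can then be passed as `data` to `atlas.map_text` in place of the original. Legitimate non-ASCII text such as "café" is preserved; only the unencodable surrogates are removed.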