elastic / enterprise-search-network-drive-connector
Official Enterprise Search | Workplace Search - Network Drives Connector
License: Other
The Kibana UI of 8.3.1 has an incorrect link to:
https://github.com/elastic/enterprise-search-network-drive-connector
It should link to:
https://github.com/elastic/enterprise-search-network-drives-connector/
I am testing this with a MacBook running macOS Monterey 12.3.1 as the network drive, with SMB File Sharing enabled.
The connector runs on a second machine with macOS Monterey 12.4 and Python 3.9.13.
I enabled logging of SMB.SMBConnection in the Network Drive client and see the following error:
INFO SMB.SMBConnection:base.py:98 Authentication with remote machine "MYSMB.FRITZ.BOX" for user "spacetime" will be using NTLM v2 authentication (with extended security)
INFO SMB.SMBConnection:base.py:124 Now switching over to SMB2 protocol communication
DEBUG SMB.SMBConnection:base.py:141 Received SMB2 message "SMB2_COM_NEGOTIATE" (command:0x0000 flags:0x0001)
INFO SMB.SMBConnection:base.py:248 SMB2 dialect negotiation successful
DEBUG SMB.SMBConnection:base.py:141 Received SMB2 message "SMB2_COM_SESSION_SETUP" (command:0x0001 flags:0x0001)
INFO SMB.SMBConnection:base.py:360 Performing NTLMv2 authentication (on SMB2) with server challenge "b'474eec6eedacc2be'"
INFO SMB.SMBConnection:base.py:363 Performing NTLMv2 authentication (on SMB2) with server challenge "b'474eec6eedacc2be'"
DEBUG SMB.SMBConnection:base.py:383 NT challenge response is "b'7bfc190cfe1527def5f7ed29de81659801010000000000000000000000000000ec4a27540fbaacb80000000001001200440041004e00410053002d004d004200500002001200440041004e00410053002d004d004200500003002600440061006e00610073002d004d00420050002e0066007200690074007a002e0062006f0078000400120066007200690074007a002e0062006f0078000700080000a705a12670d80100000000'" (168 bytes)
DEBUG SMB.SMBConnection:base.py:384 LM challenge response is "b'3a2143d16dcc1f6e29a48364471f49fdec4a27540fbaacb8'" (24 bytes)
INFO SMB.SMBConnection:base.py:390 Server requires all SMB messages to be signed
INFO SMB.SMBConnection:base.py:399 SMB signing activated. All SMB messages will be signed.
INFO SMB.SMBConnection:base.py:402 SMB signing key is b'364d753757413464474941544f304f59'
DEBUG SMB.SMBConnection:base.py:141 Received SMB2 message "SMB2_COM_SESSION_SETUP" (command:0x0001 flags:0x0001)
INFO SMB.SMBConnection:base.py:292 Authentication (on SMB2) failed. Please check username and password.
The username and password are correct: I can connect to the drive via Finder and from the command line using mount, so the issue is not on the drive side.
I have already tried alternatives, such as disabling direct TCP and using port 139, and NTLM v1, without success.
This seems to be a configuration issue that others have also hit while using this library; see miketeo/pysmb#193.
Slack chat related to this: https://elastic.slack.com/archives/C02AV5C8ZDE/p1653404652854019
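For anyone reproducing the log above: the lines are emitted through Python's standard logging module under the logger name shown in each line (SMB.SMBConnection). The following is a minimal sketch of raising that logger to DEBUG, assuming nothing else in the process has already configured logging. The helper name enable_smb_debug_logging is hypothetical and is not part of the connector or of pysmb.

```python
import logging


def enable_smb_debug_logging(level=logging.DEBUG):
    """Hypothetical helper: turn on verbose output for the SMB.SMBConnection logger."""
    # basicConfig is a no-op if the root logger already has handlers.
    logging.basicConfig(
        format="%(levelname)s %(name)s:%(filename)s:%(lineno)d %(message)s"
    )
    logger = logging.getLogger("SMB.SMBConnection")
    logger.setLevel(level)
    return logger


smb_logger = enable_smb_debug_logging()
```

Calling this before the connection is opened should surface the negotiation and session-setup messages; note that the output includes challenge responses and the signing key, so it is best not shared unredacted.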
I have a local docker instance running to test workplace search with the following containers:
I get the following error when trying to perform a sync operation:
root@a4b8ddf1f9d3:/app# ees_network_drive -c network_drive_connector.yml full-sync
Indexing started at: 2023-10-05T08:39:32Z
Error while Fetching from the Network drive. Checkpoint not saved
Traceback (most recent call last):
  File "/root/.local/bin/ees_network_drive", line 8, in <module>
    sys.exit(main())
  File "/root/.local/lib/python3.8/site-packages/ees_network_drive/cli.py", line 92, in main
    run(args)
  File "/root/.local/lib/python3.8/site-packages/ees_network_drive/cli.py", line 100, in run
    commands[args.cmd](args).execute()
  File "/root/.local/lib/python3.8/site-packages/ees_network_drive/full_sync_command.py", line 92, in execute
    self.start_producer(queue, time_range)
  File "/root/.local/lib/python3.8/site-packages/ees_network_drive/full_sync_command.py", line 63, in start_producer
    raise exception
  File "/root/.local/lib/python3.8/site-packages/ees_network_drive/full_sync_command.py", line 47, in start_producer
    store = sync_network_drives.connect_and_get_all_folders()
  File "/root/.local/lib/python3.8/site-packages/ees_network_drive/sync_network_drives.py", line 63, in connect_and_get_all_folders
    path=os.path.join(*self.drive_path.parts[1:]),
TypeError: join() missing 1 required positional argument: 'a'
The connectivity tests for ent-search and the network drive share all pass:
root@a4b8ddf1f9d3:/app# make test_connectivity
venv/bin/pytest ees_network_drive/test_connectivity.py
================================================================================================ test session starts =================================================================================================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.11.0, pluggy-1.3.0
rootdir: /app, configfile: pytest.ini
plugins: custom-exit-code-0.3.0, cov-3.0.0
collected 3 items
ees_network_drive/test_connectivity.py ... [100%]
================================================================================================= 3 passed in 0.46s ==================================================================================================
root@a4b8ddf1f9d3:/app#
Furthermore, a manual connection test to the samba docker container is successful:
root@56d4e7a06b85:/app# smbclient -L //samba/
Password for [WORKGROUP\root]:
    Sharename       Type      Comment
    ---------       ----      -------
    share1          Disk
    IPC$            IPC       IPC Service (Docker Samba Server)
SMB1 disabled -- no workgroup available
Here is my network drive connector yml file:
#Configurations for the Network Drive Connector
# ------------------------------- Network Drive configuration settings -------------------------------
#The domain name of the Network Drive server for NTLM authentication
network_drive.domain: "WORKGROUP"
#The username used to login to Network Drive server
network_drive.username: "root"
#The password used to login to Network Drive server
network_drive.password: "bar"
#The relative path of the Network Drive.
network_drive.path: "share1"
# The name of the server hosting the Network Drive
network_drive.server_name: "Samba"
# The IP address of the server hosting the Network Drive
network_drive.server_ip: "samba"
#The name of the machine where the connector will run
client_machine.name: "network-drive-connector"
# ------------------------------- Workplace Search configuration settings -------------------------------
#Access token for Workplace search authentication
enterprise_search.api_key: "256781639e2785ac2b8c7be1005f56f0bc14cc99a2953d1b230a16041cf44a6a"
#Source identifier for the custom source created on the workplace search server
enterprise_search.source_id: "651be140a03b1a898b9598b9"
#Workplace search server address Example: http://es-host:3002
enterprise_search.host_url: "http://ent-search:3002/"
# ------------------------------- Connector specific configuration settings -------------------------------
#Specifies the objects to be fetched and indexed in the WorkPlace search along with fields that needs to be included/excluded. The list of the objects with a pattern to be included/excluded is provided. By default all the objects are fetched
include:
  size:
  path_template: ["**/*.txt", "**/*.contact", "**/*.docx", "**/*.json", "**/*.png", "**/*.jpg", "**/*.jpeg", "**/*.py", "**/*.yml", "**/*.md", "**/*.ini", "**/*.sh", "**/*.rst", "**/*.pdf", "**/*.rtf", "**/*.ppt", "**/*.file"]
exclude:
  size: [">10000000"]
  path_template:
#The timestamp after which all the objects that are modified or created are fetched from the Network Drive. By default, all the objects present in the Network Drive till the end_time are fetched
start_time :
#The timestamp before which all the updated objects need to be fetched i.e. the connector won't fetch any object updated/created after the end_time. By default, all the objects updated/added till the current time are fetched
end_time :
#The level of the logs the user wants to use in the log files. The possible values include: DEBUG, INFO, WARN, ERROR. By default, the level is INFO
log_level: INFO
#The number of retries to perform in case of server error. The connector will use exponential back-off for retry mechanism
retry_count: 3
#Number of threads to be used in multithreading for the Network Drive sync.
network_drives_sync_thread_count: 5
#Number of threads to be used in multithreading for the enterprise search sync.
enterprise_search_sync_thread_count: 5
#Denotes whether document permission will be enabled or not
enable_document_permission: Yes
#The path of csv file containing mapping of Network Drive user ID to Workplace user ID
network_drive_enterprise_search.user_mapping: ""
After looking into #25 and changing network_drive.path to
network_drive.path: "samba/share1"
I get the following error:
root@a4b8ddf1f9d3:/app# ees_network_drive -c network_drive_connector.yml full-sync
Indexing started at: 2023-10-05T09:04:32Z
Unknown error while fetching files Failed to list share1 on samba: Unable to connect to shared device
==================== SMB Message 0 ====================
SMB Header:
-----------
Command: 0x03 (SMB2_COM_TREE_CONNECT)
Status: 0x00000000
Flags: 0x00
PID: 76
MID: 3
TID: 0
Data: 34 bytes
b'0900000048001a005c005c00530041004d00420041005c00730061006d0062006100'
SMB Data Packet (hex):
----------------------
b'fe534d42400000000000000003000000000000000000000003000000000000004c00000000000000546e357100000000000000000000000000000000000000000900000048001a005c005c00530041004d00420041005c00730061006d0062006100'
==================== SMB Message 1 ====================
SMB Header:
-----------
Command: 0x03 (SMB2_COM_TREE_CONNECT)
Status: 0xC00000CC
Flags: 0x01
PID: 76
MID: 3
TID: 0
Data: 9 bytes
b'090000000000000000'
SMB Data Packet (hex):
----------------------
b'fe534d4240000000cc0000c003000100010000000000000003000000000000004c00000000000000546e35710000000000000000000000000000000000000000090000000000000000'
Traceback (most recent call last):
  File "/root/.local/lib/python3.8/site-packages/ees_network_drive/files.py", line 93, in recursive_fetch
    file_list = smb_connection.listPath(service_name, rf'{path}', search=16)
  File "/root/.local/lib/python3.8/site-packages/smb/SMBConnection.py", line 210, in listPath
    self._pollForNetBIOSPacket(timeout)
  File "/root/.local/lib/python3.8/site-packages/smb/SMBConnection.py", line 649, in _pollForNetBIOSPacket
    self.feedData(data)
  File "/root/.local/lib/python3.8/site-packages/nmb/base.py", line 54, in feedData
    self._processNMBSessionPacket(self.data_nmb)
  File "/root/.local/lib/python3.8/site-packages/nmb/base.py", line 75, in _processNMBSessionPacket
    self.onNMBSessionMessage(packet.flags, packet.data)
  File "/root/.local/lib/python3.8/site-packages/smb/base.py", line 150, in onNMBSessionMessage
    if self._updateState(self.smb_message):
  File "/root/.local/lib/python3.8/site-packages/smb/base.py", line 344, in _updateState_SMB2
    req.callback(message, **req.kwargs)
  File "/root/.local/lib/python3.8/site-packages/smb/base.py", line 736, in connectCB
    errback(OperationFailure('Failed to list %s on %s: Unable to connect to shared device' % ( path, service_name ), messages_history))
  File "/root/.local/lib/python3.8/site-packages/smb/SMBConnection.py", line 204, in eb
    raise failure
smb.smb_structs.OperationFailure: Failed to list share1 on samba: Unable to connect to shared device
==================== SMB Message 0 ====================
SMB Header:
-----------
Command: 0x03 (SMB2_COM_TREE_CONNECT)
Status: 0x00000000
Flags: 0x00
PID: 76
MID: 3
TID: 0
Data: 34 bytes
b'0900000048001a005c005c00530041004d00420041005c00730061006d0062006100'
SMB Data Packet (hex):
----------------------
b'fe534d42400000000000000003000000000000000000000003000000000000004c00000000000000546e357100000000000000000000000000000000000000000900000048001a005c005c00530041004d00420041005c00730061006d0062006100'
==================== SMB Message 1 ====================
SMB Header:
-----------
Command: 0x03 (SMB2_COM_TREE_CONNECT)
Status: 0xC00000CC
Flags: 0x01
PID: 76
MID: 3
TID: 0
Data: 9 bytes
b'090000000000000000'
SMB Data Packet (hex):
----------------------
b'fe534d4240000000cc0000c003000100010000000000000003000000000000004c00000000000000546e35710000000000000000000000000000000000000000090000000000000000'
Found an end signal in the queue. Closing Thread ID 281473285788128
Thread ID: 281473285788128 Total 0 documents indexed out of: 0 till now..
Found an end signal in the queue. Closing Thread ID 281473285788128
Thread ID: 281473285788128 Total 0 documents indexed out of: 0 till now..
Found an end signal in the queue. Closing Thread ID 281473285788128
Thread ID: 281473285788128 Total 0 documents indexed out of: 0 till now..
Found an end signal in the queue. Closing Thread ID 281473285788128
Thread ID: 281473285788128 Total 0 documents indexed out of: 0 till now..
Found an end signal in the queue. Closing Thread ID 281473285788128
Thread ID: 281473285788128 Total 0 documents indexed out of: 0 till now..
Successfully saved the checkpoint
Indexing ended at: 2023-10-05T09:04:32Z
root@a4b8ddf1f9d3:/app#
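The hex dump in the failed request above can be decoded to see which share the client actually asked for. This is a minimal sketch, assuming the standard SMB2 TREE_CONNECT request layout (a 64-byte SMB2 header followed by an 8-byte fixed part, with the path offset counted from the start of the header). The function name decode_tree_connect_path is hypothetical and exists only for this illustration.

```python
import struct

SMB2_HEADER_SIZE = 64  # fixed size of the SMB2 header that precedes the data


def decode_tree_connect_path(data_hex):
    """Hypothetical helper: extract the UNC path from a TREE_CONNECT request body."""
    data = bytes.fromhex(data_hex)
    # StructureSize, Reserved/Flags, PathOffset, PathLength (little-endian uint16 each)
    _size, _reserved, path_offset, path_length = struct.unpack_from("<HHHH", data, 0)
    # PathOffset is relative to the start of the SMB2 header, not to the data.
    start = path_offset - SMB2_HEADER_SIZE
    return data[start:start + path_length].decode("utf-16-le")


# "Data" field of SMB Message 0 in the dump above
requested = decode_tree_connect_path(
    "0900000048001a005c005c00530041004d00420041005c00730061006d0062006100"
)
print(requested)
```

The decoded path is \\SAMBA\samba, so with network_drive.path set to "samba/share1" the request targets a share named "samba" rather than "share1". The response status 0xC00000CC is STATUS_BAD_NETWORK_NAME, which is consistent with that share not existing on the server. That the connector takes the first path component as the share name is an inference from this log, not something confirmed by its documentation here.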
I get this error when network_drive_connector.yml has network_drive.path: "public".
  "sync_network_drives.py", line 63, in connect_and_get_all_folders
    path=os.path.join(*self.drive_path.parts[1:]),
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: join() missing 1 required positional argument: 'path'
It seems that network_drive.path must have at least two components, such as "public/dummy", and cannot be just "public".
Is this the expected behaviour?
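The TypeError follows directly from the line in the traceback: when the configured path has a single component, parts[1:] is empty and os.path.join() is called with no arguments. Below is a minimal sketch that reproduces this and shows one possible guard, assuming drive_path is a pathlib path built from network_drive.path (the connector's own construction is not shown in this report). split_share_and_path is a hypothetical helper, not part of the connector, and whether an empty sub-path is accepted further downstream is also an assumption.

```python
import os
from pathlib import PurePosixPath


def split_share_and_path(configured_path):
    """Hypothetical helper: split 'share/sub/dir' into ('share', 'sub/dir')."""
    parts = PurePosixPath(configured_path).parts
    if not parts:
        raise ValueError("network_drive.path must name at least a share")
    share, rest = parts[0], parts[1:]
    # os.path.join(*rest) raises TypeError when rest is empty,
    # so fall back to the share root explicitly.
    sub_path = os.path.join(*rest) if rest else ""
    return share, sub_path


# Reproduce the failure from the traceback: a single-component path
# leaves nothing after parts[1:], so join() receives no arguments.
try:
    os.path.join(*PurePosixPath("public").parts[1:])
    reproduced = False
except TypeError:
    reproduced = True
```

With this guard, "public" would map to the share "public" with an empty sub-path, while "public/dummy" keeps its current meaning.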
Add the network drive connector to https://github.com/elastic/connectors-python
ℹ️ This connector package requires a compatible Elastic subscription level. Refer to the Elastic subscriptions pages for Elastic Cloud and self-managed deployments.
We need this in our organisation for millions of documents stored on a shared drive, so we would like to know the current estimated timelines for this.
If the timelines are not suitable, are there any other compatible commercial or open source connectors that we could look into?
It would be helpful to document the types of network drive paths we support, e.g. whether it will work against any network drive regardless of whether the path is a local path, a UNC path, an SMB path, a Windows mapped drive, etc.