enterprise-search-network-drive-connector's Issues

Can't connect SMB drive on MacOS

Testing this using a MacBook with Monterey 12.3.1 as a network drive, with SMB File Sharing enabled.
And a runtime environment with MacOS Monterey 12.4, Python 3.9.13.

I enabled logging of SMB.SMBConnection in the Network Drive client and see the following error:

INFO SMB.SMBConnection:base.py:98 Authentication with remote machine "MYSMB.FRITZ.BOX" for user "spacetime" will be using NTLM v2 authentication (with extended security)
INFO SMB.SMBConnection:base.py:124 Now switching over to SMB2 protocol communication
DEBUG SMB.SMBConnection:base.py:141 Received SMB2 message "SMB2_COM_NEGOTIATE" (command:0x0000 flags:0x0001)
INFO SMB.SMBConnection:base.py:248 SMB2 dialect negotiation successful
DEBUG SMB.SMBConnection:base.py:141 Received SMB2 message "SMB2_COM_SESSION_SETUP" (command:0x0001 flags:0x0001)
INFO SMB.SMBConnection:base.py:360 Performing NTLMv2 authentication (on SMB2) with server challenge "b'474eec6eedacc2be'"
INFO SMB.SMBConnection:base.py:363 Performing NTLMv2 authentication (on SMB2) with server challenge "b'474eec6eedacc2be'"
DEBUG SMB.SMBConnection:base.py:383 NT challenge response is "b'7bfc190cfe1527def5f7ed29de81659801010000000000000000000000000000ec4a27540fbaacb80000000001001200440041004e00410053002d004d004200500002001200440041004e00410053002d004d004200500003002600440061006e00610073002d004d00420050002e0066007200690074007a002e0062006f0078000400120066007200690074007a002e0062006f0078000700080000a705a12670d80100000000'" (168 bytes)
DEBUG SMB.SMBConnection:base.py:384 LM challenge response is "b'3a2143d16dcc1f6e29a48364471f49fdec4a27540fbaacb8'" (24 bytes)
INFO SMB.SMBConnection:base.py:390 Server requires all SMB messages to be signed
INFO SMB.SMBConnection:base.py:399 SMB signing activated. All SMB messages will be signed.
INFO SMB.SMBConnection:base.py:402 SMB signing key is b'364d753757413464474941544f304f59'
DEBUG SMB.SMBConnection:base.py:141 Received SMB2 message "SMB2_COM_SESSION_SETUP" (command:0x0001 flags:0x0001)
INFO SMB.SMBConnection:base.py:292 Authentication (on SMB2) failed. Please check username and password.

Username and password are correct, and I'm able to connect to the drive via Finder and via the command line using mount, so the issue is not on the drive side.

I already tested alternatives, such as disabling direct TCP and using port 139, and NTLM v1, but without success.
This seems to be a config issue that others have also experienced while using this library: miketeo/pysmb#193.

Slack chat related to this: https://elastic.slack.com/archives/C02AV5C8ZDE/p1653404652854019
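For anyone trying to reproduce this, here is a minimal sketch of how the logging and the alternatives mentioned above could be set up. The logger name "SMB.SMBConnection" is taken from the log output; the helper names, the log format, and the use of port 445 for direct TCP are assumptions for illustration and not part of the connector. `try_variant` needs pysmb installed and a reachable server.

```python
import logging
from itertools import product


def enable_pysmb_debug_logging() -> logging.Logger:
    """Turn on DEBUG output for the logger name seen in the report above."""
    logger = logging.getLogger("SMB.SMBConnection")
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:
        handler = logging.StreamHandler()
        # Assumed format, chosen to resemble the lines in the report.
        handler.setFormatter(logging.Formatter(
            "%(levelname)s %(name)s:%(filename)s:%(lineno)d %(message)s"))
        logger.addHandler(handler)
    return logger


def connection_variants():
    """Yield the transport/authentication combinations worth trying.

    Assumption: direct TCP uses port 445, NetBIOS session transport uses 139.
    """
    transports = [(True, 445), (False, 139)]
    for (is_direct_tcp, port), use_ntlm_v2 in product(transports, [True, False]):
        yield {"is_direct_tcp": is_direct_tcp, "port": port,
               "use_ntlm_v2": use_ntlm_v2}


def try_variant(variant, username, password, client_name, server_name,
                server_ip, domain=""):
    """Attempt one connection with pysmb; returns True if authentication succeeds."""
    from smb.SMBConnection import SMBConnection  # requires pysmb

    conn = SMBConnection(username, password, client_name, server_name,
                         domain=domain,
                         use_ntlm_v2=variant["use_ntlm_v2"],
                         is_direct_tcp=variant["is_direct_tcp"])
    try:
        return conn.connect(server_ip, variant["port"])
    finally:
        conn.close()
```

Calling `enable_pysmb_debug_logging()` once and then looping over `connection_variants()` with `try_variant(...)` would show which combination, if any, gets past SESSION_SETUP.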

Error while Fetching from the Network drive. Checkpoint not saved

I have a local docker instance running to test workplace search with the following containers:

  • elasticsearch
  • ent-search
  • kibana
  • samba - (network share)

I get the following error when trying to perform a sync operation:

root@a4b8ddf1f9d3:/app# ees_network_drive -c network_drive_connector.yml full-sync
Indexing started at: 2023-10-05T08:39:32Z
Error while Fetching from the Network drive. Checkpoint not saved
Traceback (most recent call last):
  File "/root/.local/bin/ees_network_drive", line 8, in <module>
    sys.exit(main())
  File "/root/.local/lib/python3.8/site-packages/ees_network_drive/cli.py", line 92, in main
    run(args)
  File "/root/.local/lib/python3.8/site-packages/ees_network_drive/cli.py", line 100, in run
    commands[args.cmd](args).execute()
  File "/root/.local/lib/python3.8/site-packages/ees_network_drive/full_sync_command.py", line 92, in execute
    self.start_producer(queue, time_range)
  File "/root/.local/lib/python3.8/site-packages/ees_network_drive/full_sync_command.py", line 63, in start_producer
    raise exception
  File "/root/.local/lib/python3.8/site-packages/ees_network_drive/full_sync_command.py", line 47, in start_producer
    store = sync_network_drives.connect_and_get_all_folders()
  File "/root/.local/lib/python3.8/site-packages/ees_network_drive/sync_network_drives.py", line 63, in connect_and_get_all_folders
    path=os.path.join(*self.drive_path.parts[1:]),
TypeError: join() missing 1 required positional argument: 'a'

Troubleshooting steps I carried out

The connectivity tests for ent-search and the network drive share all pass:

root@a4b8ddf1f9d3:/app# make test_connectivity
venv/bin/pytest ees_network_drive/test_connectivity.py
================================================================================================ test session starts =================================================================================================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.11.0, pluggy-1.3.0
rootdir: /app, configfile: pytest.ini
plugins: custom-exit-code-0.3.0, cov-3.0.0
collected 3 items

ees_network_drive/test_connectivity.py ...                                                                                                                                                                     [100%]

================================================================================================= 3 passed in 0.46s ==================================================================================================
root@a4b8ddf1f9d3:/app#

Furthermore, a manual connection test to the samba docker container is successful:

root@56d4e7a06b85:/app# smbclient -L //samba/
Password for [WORKGROUP\root]:

	Sharename       Type      Comment
	---------       ----      -------
	share1          Disk
	IPC$            IPC       IPC Service (Docker Samba Server)
SMB1 disabled -- no workgroup available

Here is my network drive connector yml file:

#Configurations for the Network Drive Connector

# ------------------------------- Network Drive configuration settings -------------------------------
#The domain name of the Network Drive server for NTLM authentication
network_drive.domain: "WORKGROUP"
#The username used to login to Network Drive server
network_drive.username: "root"
#The password used to login to Network Drive server
network_drive.password: "bar"
#The relative path of the Network Drive.
network_drive.path: "share1"
# The name of the server hosting the Network Drive
network_drive.server_name: "Samba"
# The IP address of the server hosting the Network Drive
network_drive.server_ip: "samba"
#The name of the machine where the connector will run
client_machine.name: "network-drive-connector"
# ------------------------------- Workplace Search configuration settings -------------------------------
#Access token for Workplace search authentication
enterprise_search.api_key: "256781639e2785ac2b8c7be1005f56f0bc14cc99a2953d1b230a16041cf44a6a"
#Source identifier for the custom source created on the workplace search server
enterprise_search.source_id: "651be140a03b1a898b9598b9"
#Workplace search server address Example: http://es-host:3002 
enterprise_search.host_url: "http://ent-search:3002/"
# ------------------------------- Connector specific configuration settings -------------------------------
#Specifies the objects to be fetched and indexed in the WorkPlace search along with fields that needs to be included/excluded. The list of the objects with a pattern to be included/excluded is provided. By default all the objects are fetched
include:
   size:
   path_template: ["**/*.txt", "**/*.contact", "**/*.docx", "**/*.json", "**/*.png", "**/*.jpg", "**/*.jpeg", "**/*.py", "**/*.yml", "**/*.md", "**/*.ini", "**/*.sh", "**/*.rst", "**/*.pdf", "**/*.rtf", "**/*.ppt", "**/*.file"]
exclude:
  size: [">10000000"]
  path_template:
#The timestamp after which all the objects that are modified or created are fetched from the Network Drive. By default, all the objects present in the Network Drive till the end_time are fetched
start_time : 
#The timestamp before which all the updated objects need to be fetched i.e. the connector won't fetch any object updated/created after the end_time. By default, all the objects updated/added till the current time are fetched
end_time : 
#The level of the logs the user wants to use in the log files. The possible values include: DEBUG, INFO, WARN, ERROR. By default, the level is INFO
log_level: INFO
#The number of retries to perform in case of server error. The connector will use exponential back-off for retry mechanism
retry_count: 3
#Number of threads to be used in multithreading for the Network Drive sync.
network_drives_sync_thread_count: 5
#Number of threads to be used in multithreading for the enterprise search sync.
enterprise_search_sync_thread_count: 5
#Denotes whether document permission will be enabled or not
enable_document_permission: Yes
#The path of csv file containing mapping of Network Drive user ID to Workplace user ID
network_drive_enterprise_search.user_mapping: ""

After looking into #25 and changing network_drive.path to

network_drive.path: "samba/share1"

... I get the following error:

root@a4b8ddf1f9d3:/app# ees_network_drive -c network_drive_connector.yml full-sync
Indexing started at: 2023-10-05T09:04:32Z
Unknown error while fetching files Failed to list share1 on samba: Unable to connect to shared device
==================== SMB Message 0 ====================
SMB Header:
-----------
Command: 0x03 (SMB2_COM_TREE_CONNECT)
Status: 0x00000000
Flags: 0x00
PID: 76
MID: 3
TID: 0
Data: 34 bytes
b'0900000048001a005c005c00530041004d00420041005c00730061006d0062006100'
SMB Data Packet (hex):
----------------------
b'fe534d42400000000000000003000000000000000000000003000000000000004c00000000000000546e357100000000000000000000000000000000000000000900000048001a005c005c00530041004d00420041005c00730061006d0062006100'
==================== SMB Message 1 ====================
SMB Header:
-----------
Command: 0x03 (SMB2_COM_TREE_CONNECT)
Status: 0xC00000CC
Flags: 0x01
PID: 76
MID: 3
TID: 0
Data: 9 bytes
b'090000000000000000'
SMB Data Packet (hex):
----------------------
b'fe534d4240000000cc0000c003000100010000000000000003000000000000004c00000000000000546e35710000000000000000000000000000000000000000090000000000000000'
Traceback (most recent call last):
  File "/root/.local/lib/python3.8/site-packages/ees_network_drive/files.py", line 93, in recursive_fetch
    file_list = smb_connection.listPath(service_name, rf'{path}', search=16)
  File "/root/.local/lib/python3.8/site-packages/smb/SMBConnection.py", line 210, in listPath
    self._pollForNetBIOSPacket(timeout)
  File "/root/.local/lib/python3.8/site-packages/smb/SMBConnection.py", line 649, in _pollForNetBIOSPacket
    self.feedData(data)
  File "/root/.local/lib/python3.8/site-packages/nmb/base.py", line 54, in feedData
    self._processNMBSessionPacket(self.data_nmb)
  File "/root/.local/lib/python3.8/site-packages/nmb/base.py", line 75, in _processNMBSessionPacket
    self.onNMBSessionMessage(packet.flags, packet.data)
  File "/root/.local/lib/python3.8/site-packages/smb/base.py", line 150, in onNMBSessionMessage
    if self._updateState(self.smb_message):
  File "/root/.local/lib/python3.8/site-packages/smb/base.py", line 344, in _updateState_SMB2
    req.callback(message, **req.kwargs)
  File "/root/.local/lib/python3.8/site-packages/smb/base.py", line 736, in connectCB
    errback(OperationFailure('Failed to list %s on %s: Unable to connect to shared device' % ( path, service_name ), messages_history))
  File "/root/.local/lib/python3.8/site-packages/smb/SMBConnection.py", line 204, in eb
    raise failure
smb.smb_structs.OperationFailure: Failed to list share1 on samba: Unable to connect to shared device
==================== SMB Message 0 ====================
SMB Header:
-----------
Command: 0x03 (SMB2_COM_TREE_CONNECT)
Status: 0x00000000
Flags: 0x00
PID: 76
MID: 3
TID: 0
Data: 34 bytes
b'0900000048001a005c005c00530041004d00420041005c00730061006d0062006100'
SMB Data Packet (hex):
----------------------
b'fe534d42400000000000000003000000000000000000000003000000000000004c00000000000000546e357100000000000000000000000000000000000000000900000048001a005c005c00530041004d00420041005c00730061006d0062006100'
==================== SMB Message 1 ====================
SMB Header:
-----------
Command: 0x03 (SMB2_COM_TREE_CONNECT)
Status: 0xC00000CC
Flags: 0x01
PID: 76
MID: 3
TID: 0
Data: 9 bytes
b'090000000000000000'
SMB Data Packet (hex):
----------------------
b'fe534d4240000000cc0000c003000100010000000000000003000000000000004c00000000000000546e35710000000000000000000000000000000000000000090000000000000000'

Found an end signal in the queue. Closing Thread ID 281473285788128
Thread ID: 281473285788128 Total 0 documents             indexed out of: 0 till now..
Found an end signal in the queue. Closing Thread ID 281473285788128
Thread ID: 281473285788128 Total 0 documents             indexed out of: 0 till now..
Found an end signal in the queue. Closing Thread ID 281473285788128
Thread ID: 281473285788128 Total 0 documents             indexed out of: 0 till now..
Found an end signal in the queue. Closing Thread ID 281473285788128
Thread ID: 281473285788128 Total 0 documents             indexed out of: 0 till now..
Found an end signal in the queue. Closing Thread ID 281473285788128
Thread ID: 281473285788128 Total 0 documents             indexed out of: 0 till now..
Successfully saved the checkpoint
Indexing ended at: 2023-10-05T09:04:32Z
root@a4b8ddf1f9d3:/app#
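The hex dump above can be decoded to see which share the client actually requested. The following is a small sketch, assuming the standard SMB2 TREE_CONNECT request layout (StructureSize, Reserved/Flags, PathOffset, PathLength, then a UTF-16LE path) and that PathOffset is counted from the start of the 64-byte SMB2 header. The function names are hypothetical helpers, not part of pysmb or the connector.

```python
import struct

SMB2_HEADER_SIZE = 64  # PathOffset is relative to the start of the SMB2 header

# Only the status code seen in this log; not a complete table.
NT_STATUS_NAMES = {0xC00000CC: "STATUS_BAD_NETWORK_NAME"}

# "Data" field of SMB Message 0 (the TREE_CONNECT request) from the log above.
REQUEST_PAYLOAD = (
    "0900000048001a005c005c00530041004d00420041005c00730061006d0062006100"
)


def decode_tree_connect_path(hex_payload: str) -> str:
    """Extract the UNC path from an SMB2 TREE_CONNECT request body."""
    data = bytes.fromhex(hex_payload)
    structure_size, _flags, path_offset, path_length = struct.unpack_from(
        "<HHHH", data, 0)
    if structure_size != 9:
        raise ValueError("not an SMB2 TREE_CONNECT request body")
    start = path_offset - SMB2_HEADER_SIZE
    return data[start:start + path_length].decode("utf-16-le")


def describe_status(status: int) -> str:
    """Map an NT status code to its name, falling back to hex."""
    return NT_STATUS_NAMES.get(status, hex(status))


print(decode_tree_connect_path(REQUEST_PAYLOAD))
print(describe_status(0xC00000CC))
```

Under those assumptions the request targets `\\SAMBA\samba`, and the server answers with STATUS_BAD_NETWORK_NAME. This suggests that with network_drive.path set to "samba/share1" the first path component ("samba") is sent as the share name, while "share1" is treated as a folder inside it.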

Unexpected behaviour of connect_and_get_all_folders method in sync_network_drives.py

I get this error when network_drive_connector.yml has network_drive.path: "public".

"sync_network_drives.py", line 63, in connect_and_get_all_folders
    path=os.path.join(*self.drive_path.parts[1:]),
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: join() missing 1 required positional argument: 'path'

It seems that network_drive.path must be of the form "public/dummy" and cannot be just "public".
Is this expected behaviour?
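The TypeError can be reproduced outside the connector: when the path has a single component, `parts[1:]` is empty, so `os.path.join()` is called with no arguments. Below is a minimal sketch of the failure and one possible guard. `split_drive_path` and `naive_sub_path` are hypothetical helpers, and returning an empty sub-path for the share root is an assumption, not the connector's actual behaviour or fix.

```python
import os
from pathlib import PurePosixPath


def naive_sub_path(raw_path: str) -> str:
    """Mirrors the failing expression from the traceback."""
    parts = PurePosixPath(raw_path).parts
    return os.path.join(*parts[1:])  # TypeError when parts[1:] is empty


def split_drive_path(raw_path: str):
    """Split a configured path into (first component, remaining sub-path).

    Assumption: the first component is the share name and the rest is a
    folder inside the share; a bare share name maps to an empty sub-path.
    """
    parts = PurePosixPath(raw_path).parts
    if not parts:
        raise ValueError("network_drive.path must not be empty")
    share_name = parts[0]
    sub_path = os.path.join(*parts[1:]) if len(parts) > 1 else ""
    return share_name, sub_path


try:
    naive_sub_path("public")
except TypeError as exc:
    print(f"naive version fails: {exc}")

print(split_drive_path("public"))
print(split_drive_path("public/dummy"))
```

Whether an empty sub-path is what the rest of the connector expects for the share root would need to be checked against the code that calls listPath.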

When will this be out of beta and into GA?

⚠️ This connector package is a beta feature. Beta features are subject to change and are not covered by the support SLA of generally available (GA) features. Elastic plans to promote this feature to GA in a future release.
ℹ️ This connector package requires a compatible Elastic subscription level. Refer to the Elastic subscriptions pages for Elastic Cloud and self-managed deployments.

We need this in our organisation for millions of documents stored on a shared drive, so we would like to know the current estimated timeline for this.

If the timeline is not suitable, are there any other compatible commercial or open source connectors that we could look into?

Doc: What kind of network paths do we support?

It would be helpful to document the types of network drive paths we support, e.g. will it work against any network drive as long as the network path uses a local path, UNC, SMB, Windows mapped drives, etc.?
