(Pdb) c
2023-07-20 10:17:54,985| WAR | MainThrea/1032@sshtunnel | Could not read SSH configuration file: ~/.ssh/config
WARNING | 2023-07-20 10:17:54,985 | Could not read SSH configuration file: ~/.ssh/config
2023-07-20 10:17:54,987| INF | MainThrea/1060@sshtunnel | 1 keys loaded from agent
INFO | 2023-07-20 10:17:54,987 | 1 keys loaded from agent
2023-07-20 10:17:54,988| INF | MainThrea/1117@sshtunnel | 1 key(s) loaded
INFO | 2023-07-20 10:17:54,988 | 1 key(s) loaded
2023-07-20 10:17:54,988| ERR | MainThrea/1314@sshtunnel | Password is required for key /export/lab/.ssh/mlw01.key
ERROR | 2023-07-20 10:17:54,988 | Password is required for key /export/lab/.ssh/mlw01.key
2023-07-20 10:17:54,988| INF | MainThrea/0978@sshtunnel | Connecting to gateway: xx.x.xxx.x:22 as user 'lab'
INFO | 2023-07-20 10:17:54,988 | Connecting to gateway: 172.17.10.110:22 as user 'lab'
2023-07-20 10:17:54,988| DEB | MainThrea/0983@sshtunnel | Concurrent connections allowed: True
2023-07-20 10:17:54,989| DEB | MainThrea/1400@sshtunnel | Trying to log in with key: b'asdWEQWEQWe'
2023-07-20 10:17:55,012| DEB | MainThrea/1204@sshtunnel | Transport socket info: (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 0), timeout=0.1
2023-07-20 10:17:55,043| INF | Thread-1/1893@transport | Connected (version 2.0, client OpenSSH_7.6p1)
INFO | 2023-07-20 10:17:55,043 | Connected (version 2.0, client OpenSSH_7.6p1)
2023-07-20 10:17:55,278| INF | Thread-1/1893@transport | Authentication (publickey) successful!
INFO | 2023-07-20 10:17:55,278 | Authentication (publickey) successful!
2023-07-20 10:17:55,279| ERR | MainThrea/1230@sshtunnel | Problem setting SSH Forwarder up: Couldn't open tunnel :50052 <> 127.0.0.1:50052 might be in use or destination not reachable
ERROR | 2023-07-20 10:17:55,279 | Problem setting SSH Forwarder up: Couldn't open tunnel :50052 <> 127.0.0.1:50052 might be in use or destination not reachable
2023-07-20 10:17:55,280| WAR | MainThrea/1032@sshtunnel | Could not read SSH configuration file: ~/.ssh/config
WARNING | 2023-07-20 10:17:55,280 | Could not read SSH configuration file: ~/.ssh/config
2023-07-20 10:17:55,282| INF | MainThrea/1060@sshtunnel | 1 keys loaded from agent
INFO | 2023-07-20 10:17:55,282 | 1 keys loaded from agent
2023-07-20 10:17:55,282| INF | MainThrea/1117@sshtunnel | 1 key(s) loaded
INFO | 2023-07-20 10:17:55,282 | 1 key(s) loaded
2023-07-20 10:17:55,283| ERR | MainThrea/1314@sshtunnel | Password is required for key /export/lab/.ssh/mlw01.key
ERROR | 2023-07-20 10:17:55,283 | Password is required for key /export/lab/.ssh/mlw01.key
2023-07-20 10:17:55,283| INF | MainThrea/0978@sshtunnel | Connecting to gateway: 172.17.10.110:22 as user 'lab'
INFO | 2023-07-20 10:17:55,283 | Connecting to gateway: 172.17.10.110:22 as user 'lab'
2023-07-20 10:17:55,283| DEB | MainThrea/0983@sshtunnel | Concurrent connections allowed: True
2023-07-20 10:17:55,283| WAR | MainThrea/1618@sshtunnel | It looks like you didn't call the .stop() before the SSHTunnelForwarder obj was collected by the garbage collector! Running .stop(force=True)
WARNING | 2023-07-20 10:17:55,283 | It looks like you didn't call the .stop() before the SSHTunnelForwarder obj was collected by the garbage collector! Running .stop(force=True)
2023-07-20 10:17:55,284| INF | MainThrea/1374@sshtunnel | Closing all open connections...
INFO | 2023-07-20 10:17:55,284 | Closing all open connections...
2023-07-20 10:17:55,284| DEB | MainThrea/1378@sshtunnel | Listening tunnels: None
2023-07-20 10:17:55,284| WAR | MainThrea/1450@sshtunnel | Tunnels are not started. Please .start() first!
WARNING | 2023-07-20 10:17:55,284 | Tunnels are not started. Please .start() first!
2023-07-20 10:17:55,284| INF | MainThrea/1453@sshtunnel | Closing ssh transport
INFO | 2023-07-20 10:17:55,284 | Closing ssh transport
2023-07-20 10:17:55,284| DEB | MainThrea/1477@sshtunnel | Transport is closed
2023-07-20 10:17:55,285| DEB | MainThrea/1400@sshtunnel | Trying to log in with key: b'463095aa1803da78647cd548f37173ef'
2023-07-20 10:17:55,305| DEB | MainThrea/1204@sshtunnel | Transport socket info: (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 0), timeout=0.1
2023-07-20 10:17:55,334| INF | Thread-3/1893@transport | Connected (version 2.0, client OpenSSH_7.6p1)
INFO | 2023-07-20 10:17:55,334 | Connected (version 2.0, client OpenSSH_7.6p1)
2023-07-20 10:17:55,578| INF | Thread-3/1893@transport | Authentication (publickey) successful!
INFO | 2023-07-20 10:17:55,578 | Authentication (publickey) successful!
2023-07-20 10:17:55,579| INF | Srv-50053/1433@sshtunnel | Opening tunnel: 0.0.0.0:50053 <> 127.0.0.1:50052
INFO | 2023-07-20 10:17:55,579 | Opening tunnel: 0.0.0.0:50053 <> 127.0.0.1:50052
INFO | 2023-07-20 10:17:55,580 | Checking server mlw-cluster
2023-07-20 10:17:55,814| TRA | Thread-5 /0360@sshtunnel | #1 <-- ('127.0.0.1', 44364) connected
2023-07-20 10:17:55,815| TRA | Thread-5 /0316@sshtunnel | >>> OUT #1 <-- ('127.0.0.1', 44364) send to ('127.0.0.1', 50052): b'504f5354202f636865636b2f20485454502f312e310d0a486f73743a203132372e302e302e313a35303035330d0a557365722d4167656e743a20707974686f6e2d72657175657374732f322e33312e300d0a4163636570742d456e636f64696e673a20677a69702c206465666c6174650d0a4163636570743a202a2f2a0d0a436f6e6e656374696f6e3a206b6565702d616c6976650d0a436f6e74656e742d4c656e6774683a203330300d0a436f6e74656e742d547970653a206170706c69636174696f6e2f6a736f6e0d0a0d0a7b2264617461223a20227b5c6e202020205c226e616d655c223a205c227e2f6d6c772d636c75737465725c222c5c6e202020205c227265736f757263655f747970655c223a205c22636c75737465725c222c5c6e202020205c227265736f757263655f737562747970655c223a205c22436c75737465725c222c5c6e202020205c226970735c223a205b5c6e20202020202020205c223137322e31372e31302e3131305c225c6e202020205d2c5c6e202020205c227373685f63726564735c223a207b5c6e20202020202020205c227373685f757365725c223a205c226c61625c222c5c6e20202020202020205c227373685f707269766174655f6b65795c223a205c222f6578706f72742f6c61622f2e7373682f6d6c7730312e6b65795c225c6e202020207d5c6e7d227d' >>>
2023-07-20 10:17:55,816| TRA | Thread-5 /0333@sshtunnel | <<< IN #1 <-- ('127.0.0.1', 44364) recv: b'5353482d322e302d4f70656e5353485f372e367031205562756e74752d347562756e7475302e350d0a' <<<
INFO | 2023-07-20 10:17:55,816 | Server mlw-cluster is up, but the HTTP server may not be up.
INFO | 2023-07-20 10:17:55,817 | Restarting HTTP server on mlw-cluster.
INFO | 2023-07-20 10:17:55,817 | Running command on mlw-cluster: pkill -f "python -m runhouse.servers.http.http_server"
2023-07-20 10:17:55,817| TRA | Thread-5 /0311@sshtunnel | >>> OUT #1 <-- ('127.0.0.1', 44364) recv empty data >>>
2023-07-20 10:17:55,820| TRA | Thread-5 /0375@sshtunnel | #1 <-- ('127.0.0.1', 44364) connection closed.
INFO | 2023-07-20 10:17:56,571 | Running command on mlw-cluster: screen -dm bash -c 'python -m runhouse.servers.http.http_server |& tee -a ~/.rh/cluster_server_mlw-cluster.log 2>&1'
INFO | 2023-07-20 10:18:02,291 | Checking server mlw-cluster again.
2023-07-20 10:18:02,318| ERR | Thread-3/1893@transport | Secsh channel 1 open FAILED: Connection refused: Connect failed
ERROR | 2023-07-20 10:18:02,318 | Secsh channel 1 open FAILED: Connection refused: Connect failed
2023-07-20 10:18:02,318| TRA | Thread-14/0357@sshtunnel | #2 <-- ('127.0.0.1', 47456) open new channel ssh error: ChannelException(2, 'Connect failed')
2023-07-20 10:18:02,318| ERR | Thread-14/0394@sshtunnel | Could not establish connection from local ('127.0.0.1', 50053) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
ERROR | 2023-07-20 10:18:02,318 | Could not establish connection from local ('127.0.0.1', 50053) to remote ('127.0.0.1', 50052) side of the tunnel: open new channel ssh error: ChannelException(2, 'Connect failed')
Traceback (most recent call last):
  File "/export/lab/anaconda3/envs/runhouse/lib/python3.10/site-packages/urllib3/connectionpool.py", line 714, in urlopen
    httplib_response = self._make_request(
  File "/export/lab/anaconda3/envs/runhouse/lib/python3.10/site-packages/urllib3/connectionpool.py", line 466, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/export/lab/anaconda3/envs/runhouse/lib/python3.10/site-packages/urllib3/connectionpool.py", line 461, in _make_request
    httplib_response = conn.getresponse()
  File "/export/lab/anaconda3/envs/runhouse/lib/python3.10/http/client.py", line 1375, in getresponse
    response.begin()
  File "/export/lab/anaconda3/envs/runhouse/lib/python3.10/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "/export/lab/anaconda3/envs/runhouse/lib/python3.10/http/client.py", line 287, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/export/lab/anaconda3/envs/runhouse/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/export/lab/anaconda3/envs/runhouse/lib/python3.10/site-packages/urllib3/connectionpool.py", line 798, in urlopen
    retries = retries.increment(
  File "/export/lab/anaconda3/envs/runhouse/lib/python3.10/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/export/lab/anaconda3/envs/runhouse/lib/python3.10/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/export/lab/anaconda3/envs/runhouse/lib/python3.10/site-packages/urllib3/connectionpool.py", line 714, in urlopen
    httplib_response = self._make_request(
  File "/export/lab/anaconda3/envs/runhouse/lib/python3.10/site-packages/urllib3/connectionpool.py", line 466, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/export/lab/anaconda3/envs/runhouse/lib/python3.10/site-packages/urllib3/connectionpool.py", line 461, in _make_request
    httplib_response = conn.getresponse()
  File "/export/lab/anaconda3/envs/runhouse/lib/python3.10/http/client.py", line 1375, in getresponse
    response.begin()
  File "/export/lab/anaconda3/envs/runhouse/lib/python3.10/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "/export/lab/anaconda3/envs/runhouse/lib/python3.10/http/client.py", line 287, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/export/lab/work/learn_runhouse/testmlw01.py", line 4, in <module>
    cluster = rh.cluster(
  File "/export/lab/anaconda3/envs/runhouse/lib/python3.10/site-packages/runhouse/rns/hardware/cluster_factory.py", line 59, in cluster
    return Cluster(ips=ips, ssh_creds=ssh_creds, name=name, dryrun=dryrun)
  File "/export/lab/anaconda3/envs/runhouse/lib/python3.10/site-packages/runhouse/rns/hardware/cluster.py", line 60, in __init__
    self.check_server()
  File "/export/lab/anaconda3/envs/runhouse/lib/python3.10/site-packages/runhouse/rns/hardware/cluster.py", line 381, in check_server
    self.client.check_server(cluster_config=cluster_config)
  File "/export/lab/anaconda3/envs/runhouse/lib/python3.10/site-packages/runhouse/servers/http/http_client.py", line 48, in check_server
    self.request(
  File "/export/lab/anaconda3/envs/runhouse/lib/python3.10/site-packages/runhouse/servers/http/http_client.py", line 35, in request
    response = req_fn(
  File "/export/lab/anaconda3/envs/runhouse/lib/python3.10/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
  File "/export/lab/anaconda3/envs/runhouse/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/export/lab/anaconda3/envs/runhouse/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/export/lab/anaconda3/envs/runhouse/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/export/lab/anaconda3/envs/runhouse/lib/python3.10/site-packages/requests/adapters.py", line 501, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
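
The sshtunnel TRACE lines above record the forwarded bytes as hex strings inside `b'...'`. Below is a minimal standard-library sketch for turning those payloads back into readable text. It assumes each payload is plain hex inside `b'...'`, as in the `send to` and `recv:` lines. The helper name `decode_trace_payload` is hypothetical and is not part of sshtunnel or runhouse.

```python
import re

# Matches the hex payload that sshtunnel prints inside b'...' on TRACE lines.
_PAYLOAD_RE = re.compile(r"b'([0-9a-fA-F]+)'")


def decode_trace_payload(line: str) -> str:
    """Return the first hex payload in a TRACE log line, decoded as text.

    Hypothetical helper: assumes the payload is plain hex inside b'...'.
    Bytes that are not valid UTF-8 are replaced instead of raising.
    """
    match = _PAYLOAD_RE.search(line)
    if match is None:
        raise ValueError("no hex payload found in line")
    return bytes.fromhex(match.group(1)).decode("utf-8", errors="replace")


if __name__ == "__main__":
    # The payload from the "<<< IN #1 ... recv:" trace line in the log.
    recv_line = (
        "<<< IN #1 <-- ('127.0.0.1', 44364) recv: "
        "b'5353482d322e302d4f70656e5353485f372e367031205562756e74752d347562756e7475302e350d0a' <<<"
    )
    print(repr(decode_trace_payload(recv_line)))
```

Applied to the log, the `send to ('127.0.0.1', 50052)` payload decodes to an HTTP request beginning `POST /check/ HTTP/1.1`, while the `recv:` payload decodes to `SSH-2.0-OpenSSH_7.6p1 Ubuntu-4ubuntu0.5`, an SSH identification banner rather than an HTTP response.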
wget https://raw.githubusercontent.com/run-house/runhouse/main/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
Python Platform: Linux-5.19.0-46-generic-x86_64-with-glibc2.35
Python Version: 3.10.12 (main, Jul 5 2023, 18:54:27) [GCC 11.2.0]
Relevant packages:
boto3==1.28.6
fastapi==0.99.0
fsspec==2023.6.0
pyarrow==12.0.1
pycryptodome==3.12.0
rich==13.4.2
runhouse==0.0.9
skypilot==0.3.3
sshfs==2023.7.0
sshtunnel==0.4.0
typer==0.9.0
uvicorn==0.23.1
wheel==0.38.4
SkyPilot collects usage data to improve its services. `setup` and `run` commands are not collected to ensure privacy.
Usage logging can be disabled by setting the environment variable SKYPILOT_DISABLE_USAGE_COLLECTION=1.
Checking credentials to enable clouds for SkyPilot.
  AWS: disabled
    Reason: AWS credentials are not set. Run the following commands:
      $ pip install boto3
      $ aws configure
      $ aws configure list # Ensure that this shows identity is set.
    For more info: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html
    Details: `aws sts get-caller-identity` failed with error: [botocore.exceptions.NoCredentialsError] Unable to locate credentials.
  Azure: disabled
    Reason: ~/.azure/msal_token_cache.json does not exist. Run the following commands:
      $ az login
      $ az account set -s <subscription_id>
    For more info: https://docs.microsoft.com/en-us/cli/azure/get-started-with-azure-cli
  GCP: disabled
    Reason: GCP tools are not installed. Run the following commands:
      $ pip install google-api-python-client
      $ conda install -c conda-forge google-cloud-sdk -y
    Credentials may also need to be set. Run the following commands:
      $ gcloud init
      $ gcloud auth application-default login
    For more info: https://skypilot.readthedocs.io/en/latest/getting-started/installation.html#google-cloud-platform-gcp
    Details: [builtins.ModuleNotFoundError] No module named 'googleapiclient'
  Lambda: disabled
    Reason: Failed to access Lambda Cloud with credentials. To configure credentials, go to:
      https://cloud.lambdalabs.com/api-keys
    to generate API key and add the line
      api_key = [YOUR API KEY]
    to ~/.lambda_cloud/lambda_keys
  IBM: disabled
    Reason: Missing credential file at /export/lab/.ibm/credentials.yaml.
    Store your API key and Resource Group id in ~/.ibm/credentials.yaml in the following format:
      iam_api_key: <IAM_API_KEY>
      resource_group_id: <RESOURCE_GROUP_ID>
  SCP: disabled
    Reason: Failed to access SCP with credentials. To configure credentials, see: https://cloud.samsungsds.com/openapiguide
    Generate API key and add the following line to ~/.scp/scp_credential:
      access_key = [YOUR API ACCESS KEY]
      secret_key = [YOUR API SECRET KEY]
      project_id = [YOUR PROJECT ID]
  OCI: disabled
    Reason: `oci` is not installed. Install it with: pip install oci
    For more details, refer to: https://skypilot.readthedocs.io/en/latest/getting-started/installation.html#oracle-cloud-infrastructure-oci
  Cloudflare (for R2 object store): disabled
    Reason: [r2] profile is not set in ~/.cloudflare/r2.credentials. Additionally, Account ID from R2 dashboard is not set. Run the following commands:
      $ pip install boto3
      $ AWS_SHARED_CREDENTIALS_FILE=~/.cloudflare/r2.credentials aws configure --profile r2
      $ mkdir -p ~/.cloudflare
      $ echo <YOUR_ACCOUNT_ID_HERE> > ~/.cloudflare/accountid
    For more info: https://skypilot.readthedocs.io/en/latest/getting-started/installation.html#cloudflare-r2

SkyPilot will use only the enabled clouds to run tasks. To change this, configure cloud credentials, and run sky check.
If any problems remain, please file an issue at https://github.com/skypilot-org/skypilot/issues/new
Clusters
No existing clusters.
Managed spot jobs
No in progress jobs. (See: sky spot -h)
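
The first forwarder error in the log (`Couldn't open tunnel :50052 <> 127.0.0.1:50052 might be in use or destination not reachable`) does not say whether the local port was actually taken. Below is a minimal standard-library sketch for checking that on Linux, assuming a TCP port and the `0.0.0.0` bind address seen in the later `Opening tunnel: 0.0.0.0:50053` line. The helper name `local_port_in_use` is hypothetical and is not part of runhouse or sshtunnel.

```python
import errno
import socket


def local_port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket cannot be bound to (host, port).

    Hypothetical helper. It binds a throwaway socket without SO_REUSEADDR,
    so on Linux a port with lingering TIME_WAIT connections also reports
    True; treat True as "possibly busy" rather than proof of a listener.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # other errors (e.g. permission denied) are not "in use"
    return False


if __name__ == "__main__":
    # Ports taken from the log: 50052 failed, 50053 was used on retry.
    for port in (50052, 50053):
        print(port, "in use" if local_port_in_use(port) else "free")
```

The retry in the log bound the tunnel on `0.0.0.0:50053` successfully, so a check like this mainly helps confirm whether something else was holding 50052 on the local machine. It says nothing about the remote side of the tunnel.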