microsoft / hcsshim
Windows - Host Compute Service Shim
License: MIT License
This issue is probably not so much about Docker, so please bear with me for a moment instead of insta-closing.
I uninstalled Docker for Windows but can't seem to delete the files it leaves in the C:\ProgramData\Docker\windowsfilter folder. The advice is to use a "dangerous" utility that makes use of hcsshim to handle the deletion.
This readme states that hcsshim is used by Docker (mostly*), so it seems reasonable that it is what Docker used to create the files I can't delete, even after taking ownership and trying to replace child permissions.
icacls "C:\ProgramData\Docker\" /T /C /grant Administrators:F
...
Successfully processed 212988 files; Failed processing 111741 files
Is this module creating files in such a way that they cannot be normally removed by a machine administrator?
If so, these files are holding 20 GB of my disk hostage that I would like to free up using conventional (and safe) windows deletion means.
When an HNS NAT network exists on a Windows Server 1709 machine, the link-local 169.254.169.253 address cannot be used to resolve DNS names. In AWS there is a configuration option to add this address to the list of DNS resolvers for the public ethernet interface. When the HNS NAT network is removed (via the Remove-HNSNetwork PowerShell command), that link-local IP address can be used to successfully resolve DNS names.
Run nslookup google.com 169.254.169.253. This fails to resolve the google.com DNS name. See below for what this failure looks like.
PS C:\Users\Administrator> nslookup google.com 169.254.169.253
Server: UnKnown
Address: 169.254.169.253
*** UnKnown can't find google.com: No response from server
If you remove the HNS NAT network with the following Go program (or the Remove-HNSNetwork PowerShell command), the nslookup google.com 169.254.169.253 command will succeed.
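package main

import (
	"fmt"
	"os"

	"github.com/Microsoft/hcsshim"
)

func main() {
	// List every HNS network on the host.
	nets, err := hcsshim.HNSListNetworkRequest("GET", "", "")
	if err != nil {
		fmt.Println(err.Error())
		os.Exit(1)
	}
	for _, n := range nets {
		fmt.Printf("deleting: %s\n", n.Name)
		// Report a failed delete instead of silently ignoring it.
		if _, err := n.Delete(); err != nil {
			fmt.Println(err.Error())
		}
	}
}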
To reproduce:
hcsshim.QosPolicy with MaximumOutgoingBandwidthInBytes: 1024
The upload will take < 1 second, as opposed to the expected 100 seconds
Windows Kernel version: 10.0 16299 (16299.15.amd64fre.rs3_release.170928-1534)
Starting a docker container with a simple command, docker run --rm microsoft/nanoserver, starts two vmwp.exe instances. One of them is killed when the container is stopped. The other one can only be stopped by killing it from the task manager.
One more thing, the unnecessary vmwp.exe instance creates two vmmem instances, while the correct one creates only one instance.
This causes some weird issues as some file handles are still used by the zombie instance and docker cannot get a handle to these files.
docker info: (I have debugged and found that the hcsshim.CreateContainer call is the one that creates two instances)
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 7
Server Version: 18.03.0-ce
Storage Driver: windowsfilter (windows) lcow (linux)
Windows:
LCOW:
Logging Driver: json-file
Plugins:
Volume: local
Network: ics l2bridge l2tunnel nat null overlay transparent
Log: awslogs etwlogs fluentd gelf json-file logentries splunk syslog
Swarm: inactive
Default Isolation: hyperv
Kernel Version: 10.0 17134 (17134.1.amd64fre.rs4_release.180410-1804)
Operating System: Windows 10 Enterprise
OSType: windows
Architecture: x86_64
CPUs: 8
Total Memory: 15.89GiB
Name: YUSUFG-PC
ID: 7BCW:GTLJ:2FQB:KV3J:NRRE:IHTW:6C3B:YOCE:65GK:NKHJ:IS3D:3LNJ
Docker Root Dir: C:\ProgramData\Docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: -1
Goroutines: 21
System Time: 2018-07-02T17:17:15.1751126+03:00
EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
Experimental: true
Insecure Registries:
172.17.5.200:5000
192.168.10.61:5000
127.0.0.0/8
Live Restore Enabled: false
When trying to run my Rabbit image from my compose file I get this error below.
ERROR: for bin_Rabbit_1 Cannot start service Rabbit: container 4c49c5ce1c9be3f3deca474403b7a9df44ac09151bae5126c60768cf01767428 encountered an error during CreateContainer: hcsshim: timeout waiting for notification extra info: {"SystemType":"Container","Name":"4c49c5ce1c9be3f3deca474403b7a9df44ac09151bae5126c60768cf01767428","Owner":"docker","IgnoreFlushesDuringBoot":true,"LayerFolderPath":"C:\ProgramData\Docker\windowsfilter\4c49c5ce1c9be3f3deca474403b7a9df44ac09151bae5126c60768cf01767428","Layers":[{"ID":"dee86350-459f-580c-ae1e-fcc1bee0baa2","Path":"C:\ProgramData\Docker\windowsfilter\c056770fbd90992091b4d16fe9c2b608d739689b9b8d6f8b24edc6ecd36cfb3c"},{"ID":"cf555c82-e82c-5103-9328-02e79a453583","Path":"C:\ProgramData\Docker\windowsfilter\a1a2460d2ca841aac7e92a84fba5af4fa61274d960eff41ac1d5384bfff30efd"},{"ID":"d21d7c62-9717-5f8c-b40b-2e20f2c99b04","Path":"C:\ProgramData\Docker\windowsfilter\80fbafed497f150bbbdec469621cdb01494ac302443ee66be797ec167a88607c"},{"ID":"34db3c60-39de-51dc-bfbf-e9c907c6e86b","Path":"C:\ProgramData\Docker\windowsfilter\a10b48318221d634f2df7d4f6bbd8c4c24170cf92c8b54eaf15d21c6d12efe45"},{"ID":"5fb20f07-e3dc-5034-86a3-f1103c3377c3","Path":"C:\ProgramData\Docker\windowsfilter\4cfafd1cab11aa92b130ef6c3ed9a0f41d89ae500255424901c50408edbeb45b"},{"ID":"dbe867e1-5401-5747-9edc-02780de37593","Path":"C:\ProgramData\Docker\windowsfilter\802c32841d34c97eb63462666065b9fec57a4ea2aa9373bffe1425790078f4d1"},{"ID":"12b89f02-8d16-598e-86b0-f7c17f82612e","Path":"C:\ProgramData\Docker\windowsfilter\53801ceea5ae8088b78b6af799956087740807560aece1f504f1bb3c40efdee6"},{"ID":"28fb85e9-113c-5072-9ca5-d7de54103a5c","Path":"C:\ProgramData\Docker\windowsfilter\f6b75b2ad9713292ba588c5cd81a1efa8aadbebd6f23811cdd13327f0504d1fe"},{"ID":"2dea95ad-c46c-5ab3-a9eb-cac8df2c1451","Path":"C:\ProgramData\Docker\windowsfilter\33dcdab745abced6c32b832722b708284da1cc5ab049fe418af8c5ae42659670"},{"ID":"0df81b3f-652b-596c-8223-acaab6087dad","Path":"C:\ProgramData\Docker\windowsfilter\2
c2d427c6268e0520729be4107b6c3839fd63ffdd05236a5cc9cbdc6b3ce7190"},{"ID":"de2d5623-089a-5d2c-944e-9246b670b4e6","Path":"C:\ProgramData\Docker\windowsfilter\16f6cc2dd45bfbd1be8b3255612f7740731744ee6f6dbb3f049eefd535df962f"},{"ID":"fd01999c-0b74-515c-ad9b-71b4236015eb","Path":"C:\ProgramData\Docker\windowsfilter\8b8f0948e6aa5ad08b3042a03c0bbd5ffd971d9f7e26d052d5afde4abb1837ad"},{"ID":"fc9ade98-724f-5fbb-8363-1ba433028c3d","Path":"C:\ProgramData\Docker\windowsfilter\abe09c74ed9ef55cde9a138b9b1cba2a3d987a43d2a3d492018a9e8b2d2bd94e"},{"ID":"60162656-a118-5f4a-a081-7114cce85437","Path":"C:\ProgramData\Docker\windowsfilter\1d6314999ada0560529b2fbbb14d4f35341cd2911959c0fe9be85d736ff3ca29"},{"ID":"fa0bbe42-6d85-531a-bfcb-1822906ff2c3","Path":"C:\ProgramData\Docker\windowsfilter\beb26da51fdda5d9d72ba60069d9b65fe35052013fac5f765775fc6e9224bf6b"},{"ID":"1d9b3c2c-68e4-5e56-82a9-3073faa6b72a","Path":"C:\ProgramData\Docker\windowsfilter\f420d7b6053c27051b688473386b8b621cf2a6f3ecca9f2600dfda0f2de20a92"}],"MemoryMaximumInMB":3072,"HostName":"4c49c5ce1c9b","HvPartition":true,"EndpointList":["221d7a6a-4b70-4f8f-bccd-afa1b6deb906"],"HvRuntime":{"ImagePath":"C:\ProgramData\Docker\windowsfilter\beb26da51fdda5d9d72ba60069d9b65fe35052013fac5f765775fc6e9224bf6b\UtilityVM"},"AllowUnqualifiedDNSQuery":true}
If I run the rabbit image using the docker run command, the image works fine. It only gives this error through compose. I understand that it's timing out waiting for a notification. I just don't know how exactly to go about fixing this or what to look at.
legacyLayerReader.Next has code that specifically ignores wcidirs files that aren't under "Files" hierarchy. Now that a wcidirs file for the Files folder itself is produced, we can eventually use that to restore permissions on the Files folder that correspond to permissions that were set by the user on the container root volume. For this to work, the reader needs to preserve the file content from Files.$wcidirs$ instead of ignoring it.
This change should be verified on the downlevel hosts (RS1) to ensure that it doesn't interfere with their functionality.
When:
Then extracting the first new layer (layer D in this example) will fail.
To reproduce, run in one terminal:
PS C:\Users\vagrant> docker pull microsoft/windowsservercore:1709_KB4054517
1709_KB4054517: Pulling from microsoft/windowsservercore
5847a47b8593: Already exists
e50cc21fbc56: Already exists
Digest: sha256:65a11ae1d7096b850c02184cfe30b7ef5665357472a34098afdcda94546b91b8
Status: Downloaded newer image for microsoft/windowsservercore:1709_KB4054517
PS C:\Users\vagrant> docker run -it microsoft/windowsservercore:1709_KB4054517 cmd.exe
and then in a second terminal, pull a new version of the container image:
PS C:\Users\vagrant> docker pull microsoft/windowsservercore:1709_KB4056892
1709_KB4056892: Pulling from microsoft/windowsservercore
5847a47b8593: Already exists
9f887ccb8077: Extracting [==================================================>] 689.7MB/689.7MB
failed to register layer: re-exec error: exit status 1: output: remove \\?\C:\ProgramData\docker\windowsfilter\572e75dfac18f1e5a7cf134a584e3376fa75115d17e8508240d2f71a0cb9fa14\UtilityVM\Files\Windows\WinSxS\amd64_microsoft-windows-workstationservice_31bf3856ad364e35_10.0.16299.15_none_ef2643e047f6349e\wkssvc.dll: Access is denied.
From docker info:
Default Isolation: process
Kernel Version: 10.0 16299 (16299.15.amd64fre.rs3_release.170928-1534)
Runc spec generates a boilerplate OCI config.json that can be modified for a particular use case. Defaults to current directory when a bundle isn't specified.
For the time being, could you publish a new release with runhcs as a zip/tar file that can be downloaded on the release page as an easy way for users to get the binary to use?
The syscallWatcher today will effectively stack goroutines over time. At the time of this post the defaultTimeout is 4 minutes, which means that every syscall (even a completed one) keeps a goroutine open (although sleeping) for 4 minutes. This pattern should support context.Context cancellation after a return from a syscall, as we no longer need to monitor for a hung state.
It would likely look something like:
ctx, cancel := context.WithTimeout(context.Background(), defaultTimeout)
defer cancel()
go syscallWatcher(ctx, ...)
// make syscall
return result
So if the syscall returns we cancel the syscallWatcher, and if it times out before returning we get the appropriate syscall hung state as expected.
I am new to go, but I think I am doing this right and it should work. The following uses docker to reproduce the problem to ensure that the issues aren't related to my environment.
$ docker run -it golang:1.7.4
root@1880646cda2e:/go# export GOOS=windows
root@1880646cda2e:/go# export GOARCH=386
root@1880646cda2e:/go# go get github.com/microsoft/hcsshim
# github.com/microsoft/hcsshim
src/github.com/microsoft/hcsshim/hcsshim.go:151: type [1073741824]uint16 too large
Just wondering if this works on standard Windows 10 or only on Windows Server.
I need a way for docker containers to just work on Windows, without VirtualBox.
Was added for LCOW. Think we should have a generic "Properties" method and let the caller work out what schema is passed/returned.
Hcsshim is incorrectly holding a lock while calling HcsUnregisterComputeSystemNotification. A race condition exists such that a notification callback completion is waited on while holding the lock necessary to complete it.
Hello,
I'm not sure if this is the right place to ask, but:
we're trying to do some docker-related networking using a Hyper-V Switch. We need a way to associate a specific container with a Port when it's connected to the switch. We noticed that the Port/NIC "Friendly name" seen in the Hyper-V Switch is the same as the id in the "EndpointList" returned by HNS. Is that the way to go?
Also, is there a way to specify an existing vSwitch in an optional parameter of a network driver? This is what happens when we try it ("vEthernet (DS)" is our switch):
`docker network create -d transparent -o com.docker.network.windowsshim.interface="vEthernet (DS)" mynetwork`
`Error response from daemon: HNS failed with error : The parameter is incorrect.`
time="2016-11-29T06:13:18.184073000-08:00" level=debug msg="Calling GET /_ping"
time="2016-11-29T06:13:18.190170400-08:00" level=debug msg="Calling POST /v1.26/networks/create"
time="2016-11-29T06:13:18.192964600-08:00" level=debug msg="form data: {\"Attachable\":false,\"CheckDuplicate\":true,\"D
river\":\"transparent\",\"EnableIPv6\":false,\"IPAM\":{\"Config\":[],\"Driver\":\"default\",\"Options\":{}},\"Internal\"
:false,\"Labels\":{},\"Name\":\"mynetwork\",\"Options\":{\"com.docker.network.windowsshim.interface\":\"vEthernet (DS)\"
}}"
time="2016-11-29T06:13:18.194740200-08:00" level=debug msg="Allocating IPv4 pools for network mynetwork (035650ac76f2b63
2265f368911e4eb3349bb98b2fb132e83af2ba6ab71946361)"
time="2016-11-29T06:13:18.196157400-08:00" level=debug msg="RequestPool(LocalDefault, , , map[], false)"
time="2016-11-29T06:13:18.197505500-08:00" level=debug msg="RequestAddress(0.0.0.0/0, <nil>, map[RequestAddressType:com.
docker.network.gateway])"
time="2016-11-29T06:13:18.197505500-08:00" level=debug msg="HNSNetwork Request ={\"Name\":\"035650ac76f2b632265f368911e4
eb3349bb98b2fb132e83af2ba6ab71946361\",\"Type\":\"transparent\",\"NetworkAdapterName\":\"vEthernet (DS)\",\"Subnets\":[{
\"AddressPrefix\":\"0.0.0.0/0\",\"GatewayAddress\":\"0.0.0.0\"}]} Address Space=[{0.0.0.0/0 0.0.0.0 []}]"
time="2016-11-29T06:13:19.550560300-08:00" level=debug msg="releasing IPv4 pools from network mynetwork (035650ac76f2b63
2265f368911e4eb3349bb98b2fb132e83af2ba6ab71946361)"
time="2016-11-29T06:13:19.550560300-08:00" level=debug msg="ReleaseAddress(0.0.0.0/0, 0.0.0.0)"
time="2016-11-29T06:13:19.552067700-08:00" level=debug msg="ReleasePool(0.0.0.0/0)"
time="2016-11-29T06:13:19.554412900-08:00" level=error msg="Handler for POST /v1.26/networks/create returned error: HNS
failed with error : The parameter is incorrect. "
In layer.go, the CreateSandboxLayer and CreateScratchLayer functions both have a parameter named parentId. It is natural to think that this is the parent layer of the layer we want to create, but from line 34 of create.go we can deduce that parentId is actually the base layer ID, because the list of paths to read-only parent layers given to wclayer create as arguments should end with the base layer. Interestingly, the parentId parameter is ignored inside the CreateSandboxLayer and CreateScratchLayer functions, so it does not make any difference in the execution.
I suggest we delete that parameter. In addition, for arguments specifying paths to parent layers, as in create, import, export, and mount, we probably need better documentation on the order and on how the list should be given. The list should end with the base layer instead of starting with it, and every element should be preceded by -l, as in -l layer_3 -l layer_2 -l layer_1.
Tested on Kernel Version: 10.0 17093 (17093.1000.amd64fre.rs_prerelease.180202-1400)
If you attempt to modify a created HNS Endpoint to add ACL Policies, the update fails with HNS failed with error : The parameter is incorrect. Code to reproduce can be found here. To run:
$env:ROOTFS_PATH=(docker inspect microsoft/windowsservercore-insider:10.0.17093.1000 | ConvertFrom-Json).GraphDriver.Data.Dir
$env:NETWORK_NAME="nat"
.\acl-repro.exe
Container ID: 1521223469702047900
2018/03/16 11:04:32 HNS failed with error : The parameter is incorrect.
This is a regression from 1709:
$env:ROOTFS_PATH=(docker inspect microsoft/windowsservercore:1709 | ConvertFrom-Json).GraphDriver.Data.Dir
$env:NETWORK_NAME="nat"
.\acl-repro.exe
Container ID: 1521223272627319900
added acl to the endpoint
Hello,
I already created an issue in the moby repository, moby/moby#37395, but have not received any feedback yet.
We see this error on a few of our Windows Server 2016 servers. When it occurs, docker ps usually hangs (if Docker is not upgraded to 18.03-ee) or docker run gets stuck.
What could be the reason for this error? What do you recommend to debug it and prevent it in the future?
Thanks!
Deleting a HNS network and then creating another one immediately after doesn't work. Here's the error message:
Expected error:
<*errors.errorString | 0xc0422f18b0>: {
s: "HNS failed with error : Element not found. ",
}
HNS failed with error : Element not found.
not to have occurred
Here's a test case:
subnets1 := []hcsshim.Subnet{
	{
		AddressPrefix:  "172.100.0.0/20",
		GatewayAddress: "172.100.0.1",
	},
}
configuration1 := &hcsshim.HNSNetwork{
	Name:    "TestNetworkName1",
	Type:    "transparent",
	Subnets: subnets1,
}
subnets2 := []hcsshim.Subnet{
	{
		AddressPrefix:  "172.200.0.0/20",
		GatewayAddress: "172.200.0.1",
	},
}
configuration2 := &hcsshim.HNSNetwork{
	Name:    "TestNetworkName2",
	Type:    "transparent",
	Subnets: subnets2,
}
It("doesn't work if there's no delay after DELETE", func() {
	configBytes1, err := json.Marshal(configuration1)
	Expect(err).ToNot(HaveOccurred())
	response, err := hcsshim.HNSNetworkRequest("POST", "", string(configBytes1))
	Expect(err).ToNot(HaveOccurred())
	hnsID := response.Id
	_, err = hcsshim.HNSNetworkRequest("DELETE", hnsID, "")
	Expect(err).ToNot(HaveOccurred())
	//time.Sleep(time.Second * 20) // 20 second timeout "fixes" the issue
	configBytes2, err := json.Marshal(configuration2)
	Expect(err).ToNot(HaveOccurred())
	response, err = hcsshim.HNSNetworkRequest("POST", "", string(configBytes2))
	Expect(err).To(HaveOccurred()) // !!! ERROR
	_, err = hcsshim.HNSNetworkRequest("GET", hnsID, "")
	Expect(err).To(HaveOccurred()) // but can't GET the deleted network either
})
Note that sleeping for 20 seconds after deleting a network seems to "fix" the issue. A 10 second sleep is not enough.
Update: I repeated this in PowerShell, so this may be a problem with HNS. I posted an issue here: MicrosoftDocs/Virtualization-Documentation#516
Tested on Kernel Version: 10.0 17093 (17093.1000.amd64fre.rs_prerelease.180202-1400)
If the source of a bind mount is a symlink, container creation will succeed, but the directory will not be accessible inside the container. This is a regression from 1709. To reproduce:
$dockerImage = "microsoft/windowsservercore-insider:10.0.17093.1000"
$mountDir = "$env:TEMP\mountdir"
mkdir $mountDir
echo hello > "$mountDir\hello.txt"
$symlink = "$env:TEMP\symlink"
cmd.exe /c "mklink /D $symlink $mountDir"
docker run -v"$symlink":c:\containerDir $dockerImage cmd.exe /C "type c:\containerDir\hello.txt"
cmd.exe /c "rmdir $symlink"
rm -r -force $mountDir
The output of this script:
Directory: C:\Windows\TEMP
Mode LastWriteTime Length Name
---- ------------- ------ ----
d----- 3/9/2018 9:09 AM mountdir
symbolic link created for C:\Windows\TEMP\symlink <<===>> C:\Windows\TEMP\mountdir
The create operation failed because the name contained at least one mount point which resolves to a volume to which the specified device object is not attached.
On 1709 (with the container image changed appropriately), running the script shows that the container mounts the symlinked directory correctly:
Directory: C:\Windows\TEMP
Mode LastWriteTime Length Name
---- ------------- ------ ----
d----- 3/9/2018 9:07 AM mountdir
symbolic link created for C:\Windows\TEMP\symlink <<===>> C:\Windows\TEMP\mountdir
hello
I somehow got into the following situation
> Get-ContainerNetwork
Name Id Subnets Mode SourceMac DNSServers DNSSuffix
---- -- ------- ---- --------- ---------- ---------
testNet 70c6f3b7-f5fe-462d-94f8-ecc73f83c2c3 {} Transparent
> Get-ContainerNetwork | Remove-ContainerNetwork
Confirm
Remove-ContainerNetwork will remove the container network "".
[Y] Yes [A] Yes to All [N] No [L] No to All [S] Suspend [?] Help (default is "Y"):
Remove-ContainerNetwork : The parameter is incorrect.
At line:1 char:24
+ Get-ContainerNetwork | Remove-ContainerNetwork
+ ~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [Remove-ContainerNetwork], VirtualizationException
+ FullyQualifiedErrorId : OperationFailed,Microsoft.Containers.PowerShell.Cmdlets.RemoveContainerNetwork
> Get-ContainerNetwork
Name Id Subnets Mode SourceMac DNSServers DNSSuffix
---- -- ------- ---- --------- ---------- ---------
testNet 70c6f3b7-f5fe-462d-94f8-ecc73f83c2c3 {} Transparent
> Get-VMSwitch
> Get-NetAdapter
Name InterfaceDescription ifIndex Status MacAddress LinkSpeed
---- -------------------- ------- ------ ---------- ---------
Ethernet0 vmxnet3 Ethernet Adapter 3 Up 00-0C-29-1A-52-31 10 Gbps
Notice how there is no HNS vSwitch
I managed to reproduce the above scenario by:
> New-ContainerNetwork -Name reproNet -Mode transparent -NetworkAdapterName Ethernet0
Name Id Subnets Mode SourceMac DNSServers DNSSuffix
---- -- ------- ---- --------- ---------- ---------
reproNet 731b3b26-3d42-44af-b5ca-fe4253127900 {} Transparent
> Get-VMSwitch
Name SwitchType NetAdapterInterfaceDescription
---- ---------- ------------------------------
New HNS Switch External vmxnet3 Ethernet Adapter
> Get-VMSwitch | Remove-VMSwitch
Confirm
Are you sure you want to remove the virtual switch "New HNS Switch"?
[Y] Yes [A] Yes to All [N] No [L] No to All [S] Suspend [?] Help (default is "Y"):
Remove-VMSwitch : Failed while removing virtual Ethernet switch.
Switch delete failed, switch = 'bc7bce24-6256-4691-9c71-5aab5534531e': General access denied error (0x80070005).
At line:1 char:16
+ Get-VMSwitch | Remove-VMSwitch
+ ~~~~~~~~~~~~~~~
+ CategoryInfo : PermissionDenied: (:) [Remove-VMSwitch], VirtualizationException
+ FullyQualifiedErrorId : AccessDenied,Microsoft.HyperV.PowerShell.Commands.RemoveVMSwitch
> Get-VMSwitch
Name SwitchType NetAdapterInterfaceDescription
---- ---------- ------------------------------
New HNS Switch Private
Notice how vSwitch changed type to Private
> net stop hns
The Host Network Service service is stopping.
The Host Network Service service was stopped successfully.
> Get-VMSwitch
Name SwitchType NetAdapterInterfaceDescription
---- ---------- ------------------------------
New HNS Switch Private
> Get-VMSwitch | Remove-VMSwitch
Confirm
Are you sure you want to remove the virtual switch "New HNS Switch"?
[Y] Yes [A] Yes to All [N] No [L] No to All [S] Suspend [?] Help (default is "Y"):
Remove-VMSwitch : Failed while removing virtual Ethernet switch.
Switch delete failed, switch = 'bc7bce24-6256-4691-9c71-5aab5534531e': General access denied error (0x80070005).
At line:1 char:16
+ Get-VMSwitch | Remove-VMSwitch
+ ~~~~~~~~~~~~~~~
+ CategoryInfo : PermissionDenied: (:) [Remove-VMSwitch], VirtualizationException
+ FullyQualifiedErrorId : AccessDenied,Microsoft.HyperV.PowerShell.Commands.RemoveVMSwitch
> Get-ContainerNetwork
Name Id Subnets Mode SourceMac DNSServers DNSSuffix
---- -- ------- ---- --------- ---------- ---------
reproNet 731b3b26-3d42-44af-b5ca-fe4253127900 {} Transparent
> Get-ContainerNetwork | Remove-ContainerNetwork
Confirm
Remove-ContainerNetwork will remove the container network "".
[Y] Yes [A] Yes to All [N] No [L] No to All [S] Suspend [?] Help (default is "Y"):
No error encountered, but you can still see the container network:
> Get-ContainerNetwork
Name Id Subnets Mode SourceMac DNSServers DNSSuffix
---- -- ------- ---- --------- ---------- ---------
reproNet 731b3b26-3d42-44af-b5ca-fe4253127900 {} Transparent
And now it can't be removed:
> Get-ContainerNetwork | Remove-ContainerNetwork
Confirm
Remove-ContainerNetwork will remove the container network "".
[Y] Yes [A] Yes to All [N] No [L] No to All [S] Suspend [?] Help (default is "Y"):
Remove-ContainerNetwork : The parameter is incorrect.
At line:1 char:24
+ Get-ContainerNetwork | Remove-ContainerNetwork
+ ~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [Remove-ContainerNetwork], VirtualizationException
+ FullyQualifiedErrorId : OperationFailed,Microsoft.Containers.PowerShell.Cmdlets.RemoveContainerNetwork
Also, AFAIK calling *-ContainerNetwork automatically starts hns service:
> net start hns
The requested service has already been started.
More help is available by typing NET HELPMSG 2182.
> NET HELPMSG 2182
The requested service has already been started. // haha thx
Now, after computer restart, I CAN remove "repro":
> Get-ContainerNetwork | Remove-ContainerNetwork
Confirm
Remove-ContainerNetwork will remove the container network "".
[Y] Yes [A] Yes to All [N] No [L] No to All [S] Suspend [?] Help (default is "Y"):
> Get-ContainerNetwork
>
But, restarting DOES NOT HELP with removing the "original" testNet network, so reproduction isn't perfect - it works only until restart.
Question is, how can I remove testNet network? I suspect it has something to do with vswitch not existing, as in my reproduction, but also something else. Is there a way to "factory-reset" hns?
I was trying to create a writable scratch layer based on two read-only layers from microsoft/nanoserver:latest. After I imported the two layers, when I tried to run .\wclayer.exe create -l l_1 -l l_2 l_3 (l_1 and l_2 are the read-only layers, l_3 is the scratch layer I want to create), it failed with:
ERRO[0000] hcsshim::CreateScratchLayer failed in Win32: The system cannot find the path specified. (0x3) path=C:\Users\t-liazha\Desktop\test\l_3
hcsshim::CreateScratchLayer failed in Win32: The system cannot find the path specified. (0x3) path=C:\Users\t-liazha\Desktop\test\l_3
Does anyone have any idea about this?
After running the latest Windows Insider build for 2019 (10.0.17733), we noticed that the DNSServerList for HNSEndpoint is ignored.
Steps to reproduce this error:
package main

import (
	"fmt"
	"os"

	"github.com/Microsoft/hcsshim"
)

func main() {
	containerName := os.Args[1]
	networkName := os.Args[2]
	endpoint := &hcsshim.HNSEndpoint{
		VirtualNetworkName: networkName,
		Name:               containerName,
	}
	endpoint.DNSServerList = "222.111.111.222,123.123.123.123"
	newEndpoint, err := endpoint.Create()
	if err != nil {
		fmt.Printf("Endpoint creation failed- %s\n", err.Error())
		// Stop here: newEndpoint is not usable after a failed create.
		os.Exit(1)
	}
	err = hcsshim.HotAttachEndpoint(containerName, newEndpoint.Id)
	if err != nil {
		fmt.Printf("Attaching endpoint failed\n")
	}
}
We expect (Get-DNSClientServerAddress).ServerAddresses to contain 222.111.111.222 and 123.123.123.123 in the container, but we observe that it is blank. This behavior works as expected in 1709 and 1803 builds.
docker info:
Containers: 2
Running: 0
Paused: 0
Stopped: 2
Images: 12
Server Version: master-dockerproject-2018-08-15
Storage Driver: windowsfilter
Windows:
Logging Driver: json-file
Plugins:
Volume: local
Network: ics l2bridge l2tunnel nat null overlay transparent
Log: awslogs etwlogs fluentd gelf json-file logentries splunk syslog
Swarm: inactive
Default Isolation: process
Kernel Version: 10.0 17733 (17733.1000.amd64fre.rs5_release.180803-1525)
Operating System: Windows Server Datacenter Version 1803 (OS Build 17733.1000)
OSType: windows
Architecture: x86_64
CPUs: 4
Total Memory: 32GiB
Name: WIN-8SUSKTQISJR
ID: BZ5Q:F572:PJZR:BLWE:TRIN:JWCP:FPSK:Q7ZH:UZCY:53NM:VCTY:HT5X
Docker Root Dir: C:\ProgramData\docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
As mentioned in docker/for-win#698 Docker for Windows (including Windows Server / Docker EE) can experience slow network performance with WinNAT due to receive segment coalescing (RSC).
I was able to reproduce the issue by having a long-running container that regularly does a git clone of the curl repository. After disabling RSC on all available network adapters, I no longer experienced slow network performance. Even just a regular curl download of large files (>1 GB) caused this issue.
Therefore I would like to suggest disabling RSC via hcsshim on the host adapter every time a virtual network switch is created for use with Docker.
We noticed that the value of LayerFolderPath seems to not matter at all. When we set it to any non-empty string, all our integration tests pass (note that we are only creating shared-kernel containers, not Hyper-V).
Does this value actually need to be set to a specific value? Is there some side effect we are not seeing that setting this property affects?
Also, please let us know if there is a better medium to ask this question in, as it is a very general question.
Windows kernel version: 10.0 16299 (16299.15.amd64fre.rs3_release.170928-1534)
There was recently a commit, c898547, that added another return value to the create process func. It is used for type checking the type of error.
This pkg is pretty small right now, so I think we can make this even better. There are many different ways of handling typed errors in Go. We can do something like the syscall package does with its syscall.Errno, which is a uintptr. Or we can define an error type in the package with codes, so that we don't need multiple return values to find out the type of the error in a type-safe way.
I suggest doing something like this but I wanted to get your input on what you think before I open a PR.
Option 1:
var (
	WaitErrExecFailed = errors.New("hcsshim: wait exec failed")
	// Known Win32 RC values which should be trapped
	Win32PipeHasBeenEnded                 = errors.New("hcsshim: The pipe has been ended")
	Win32SystemShutdownIsInProgress       = errors.New("hcsshim: A system shutdown is in progress")
	Win32SpecifiedPathInvalid             = errors.New("hcsshim: The specified path is invalid")
	Win32SystemCannotFindThePathSpecified = errors.New("hcsshim: The system cannot find the path specified")
	Win32InvalidArgument                  = errors.New("hcsshim: An invalid argument was supplied")
)
By doing this we have typed errors that are easily comparable by the consumer.
_, _, _, _, err := CreateProcessInComputeSystem(id, true, ...)
if err != nil {
	if err == hcsshim.Win32InvalidArgument {
		// do specific stuff
	}
	return err
}
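Option 2: if you want to keep the codes from the types above, then you can do something like the syscall pkg.
type Errno uint32

// Win32InvalidArgument and the other codes would be declared as Errno constants.
func (e Errno) Error() string {
	switch e {
	case Win32InvalidArgument:
		return "hcsshim: An invalid argument was supplied"
	}
	// Fallback so that every code path returns a message.
	return "hcsshim: unknown error"
}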
What do you all think?
Using Windows Server 1709 or 1803 we are attempting to use IPSec encryption along with Windows Containers using NAT. For example:
Working:
Client --(unencrypted TCP)--> Container Host --> NAT --> Container
Not working:
Client --(encrypted with IPSec)--> Container Host --> NAT --> Container
IPSec is being enabled via standard WFP configuration with:
New-NetIPsecRule -LocalAddress [local] -RemoteAddress [remote] -InboundSecurity Require -OutboundSecurity Require
We can reproduce this issue with Cloud Foundry which uses hcsshim as part of the https://github.com/cloudfoundry/winc component and we also see the same behavior using Docker, such as:
docker run -d -p 8080:80 --name aspnet microsoft/aspnet
It appears that this is a fundamental limitation with WinNAT / HNS / WFP but we aren't sure if some combination of settings can make this work.
The test:
https://github.com/Microsoft/hcsshim/blob/master/functional/uvm_scsi_test.go#L20
Ends up hanging in the UVM remove path when it tries to remove SCSI /0 for the 2nd time. The symptoms seem to be less frequent if you lower the test case size, but this is a bug somewhere.
To repro, just run the test; it seems that it's the 2nd add/remove that causes the issue.
I am having casing issues while building a project that uses hcsshim when the build is done on Linux.
For WCOW and LCOW we should actually make the SignalProcess call when a signal comes in. On Windows we can do this if the capability supports: SignalProcessSupported = true
I was using wclayer to import layers. It seems that if the target directory has a long path, wclayer does not work properly.
I was importing the base layer from microsoft/nanoserver. The layer can be downloaded from https://az896309.vo.msecnd.net/containers/microsoft/nanoserver:10.0.14393.447_en-us_full_spx2q06478MkUJl5hOPmCKDgYiXoSLt3. When I imported it with
.\wclayer.exe import -i .\nanoserver_10.0.14393.447_en-us_full_spx2q06478MkUJl5hOPmCKDgYiXoSLt3 C:\temp_short
the extracted folder has 1.13 GB. However, if I imported it with
.\wclayer.exe import -i .\nanoserver_10.0.14393.447_en-us_full_spx2q06478MkUJl5hOPmCKDgYiXoSLt3 C:\temp_long\abcdef\ghijklmn\opqrst\uvwxyz\abcd\efghijk\lmnopq\rst\uvwxyz\abcdefghijklmn\opqrstuvwxyz
the extracted folder has only 81.3 MB.
I have also tried to add long path prefix:
.\wclayer.exe import -i .\nanoserver_10.0.14393.447_en-us_full_spx2q06478MkUJl5hOPmCKDgYiXoSLt3 \\?\C:\temp_long\abcdef\ghijklmn\opqrst\uvwxyz\abcd\efghijk\lmnopq\rst\uvwxyz\abcdefghijklmn\opqrstuvwxyz
wclayer failed with mkdir \\?: The filename, directory name, or volume label syntax is incorrect., so I created the directory first and ran the command again. The extracted folder also has only 81.3 MB. Therefore, I suspect that if the target path is too long, some files get ignored during the import.
DiskNumber: 0 and UEFIDevice: VMBFS are common to the Windows and Linux paths and should be combined. See uvm./create.go L186
New code is being added that uses the term “sandbox” to refer to the top layer of a layer-based storage system. This term is not used outside of Microsoft and may be confused with a Kubernetes sandbox. We should replace this term with upper or writable layer.
Issue seems similar to #95
There are two test cases and two different errors, but I think the underlying root cause may be the same, that's why I report both errors in the same issue.
Here's the first test case:
for i := 0; i < numTries; i++ {
	subnets := []hcsshim.Subnet{
		{
			AddressPrefix:  "10.0.0.0/24",
			GatewayAddress: "10.0.0.1",
		},
	}
	configuration := &hcsshim.HNSNetwork{
		Type:               "transparent",
		NetworkAdapterName: "Ethernet0",
		Subnets:            subnets,
	}
	configBytes, _ := json.Marshal(configuration)
	resp, err := hcsshim.HNSNetworkRequest("POST", "", string(configBytes))
	Expect(err).ToNot(HaveOccurred())
	_, err = net.Dial("tcp", "10.7.0.54:8082")
	Expect(err).ToNot(HaveOccurred())
	// sometimes errors:
	// `dial tcp 10.7.0.54:8082: connectex: A socket operation was attempted to an
	// unreachable network.`
	hcsshim.HNSNetworkRequest("DELETE", resp.Id, "")
}
net.Dial sometimes fails with:
dial tcp 10.7.0.54:8082: connectex: A socket operation was attempted to an unreachable network.
Powershell script that kinda replicates it (but prints a different error):
1..50 | % { New-ContainerNetwork -Mode transparent -Name net1 -SubnetPrefix 10.0.0.0/24 -NetworkAdaptername Ethernet0; curl 10.7.0.54:8082; Remove-ContainerNetwork -Name net1 -Force; }
I sometimes get error:
curl : Unable to connect to remote server
which I believe to be a coarse error message encompassing the aforementioned connectex... error.
Here's the second test case, which is very similar to the first one but we don't specify a subnet when creating the HNS network:
for i := 0; i < numTries; i++ {
	configuration := &hcsshim.HNSNetwork{
		Type:               "transparent",
		NetworkAdapterName: "Ethernet0",
	}
	configBytes, _ := json.Marshal(configuration)
	resp, err := hcsshim.HNSNetworkRequest("POST", "", string(configBytes))
	Expect(err).ToNot(HaveOccurred())
	// sometimes errors:
	// `HNS failed with error : Unspecified error`
	hcsshim.HNSNetworkRequest("DELETE", resp.Id, "")
}
This time, we get an HNS error when invoking the POST request on HNS:
HNS failed with error : Unspecified error
To replicate via powershell:
1..50 | % { New-ContainerNetwork -Mode transparent -Name net1 -NetworkAdaptername Ethernet0; Remove-ContainerNetwork -Name net1 -Force; }
Which sometimes returns:
New-ContainerNetwork : Unspecified Error
We've noticed that when we are creating many containers and attaching endpoints to them in parallel, we get an error from HotAttachEndpoint. The message is just "Element not found." It's not clear whether that is the container, the endpoint, or something else.
We haven't been able to find a way to recover from this error without destroying the container and endpoint and trying again. The state of an endpoint before attach appears identical whether the attach fails or not. So is the state after the attach (even if it fails). Retrying the attach does not work, nor does attempting to use the endpoint.
Here is a reproduction of the issue. For us, running this with 50 container creates in parallel usually fails on the first or second try. It's worth noting that we very rarely see this issue if we are only doing a few creates in parallel.
Should probably use the binary name instead.
binaryName = filepath.Base(os.Args[0]) // With extension
binaryName = strings.TrimSuffix(binaryName, filepath.Ext(binaryName)) // Without extension
This field is a string only documented as Credentials information.
What is this field supposed to signify? How do you populate it?
Can we use this field to set up credentials for the users inside a container?
Hi,
When HNS cannot create a network, the error it gives is not very detailed:
Error response from daemon: HNS failed with error : Failed to create network
There can be many reasons (only one NAT network can exist, a problem with the network adapter, etc.).
Is it possible to provide more details in the error? It would be helpful, especially for users who are just starting with Docker.
Right now we have both properties, but SandboxPath is deprecated in favor of LayerPath. It's time to remove SandboxPath (and then update Docker on revendor) so that we don't keep using it.
Could nil dereference if passed options is nil.
This is just a thought that I'm posting as an issue to keep track of it.
Currently, in order to determine if an error is caused by a missing resource in hcsshim, clients need to take a dependency on hcsshim and use hcsshim.IsNotFound(). This is better than raw errors and error strings, but I think it could be done in a more scalable way.
I wonder if we could do this in a more generic way in the future, such that clients who want to determine if an error is caused by a resource not existing do not need to take a dependency on hcsshim (assuming the hcsshim error filters a few layers, or even projects, above where it is read).
I think this would be best done with something like https://github.com/pkg/errors for Cause wrapping instead of my current implementation of getInnerError and an unexported (but guaranteed stable) isNotFound interface with some method such as .NotFound() and errors which implement that method. Clients could then just define the same interface in their code and at runtime determine if the error is caused by a missing resource without needing to know which underlying package caused the error.
This would be best defined outside hcsshim so that other platform layers could implement the same interface, and it is not just Microsoft/* that uses it. I don't have the time to look into if something like this already exists in Go, but storing this to come back to it at a later date. If not, pkg/errors might want to host something like this.
I am unable to run Windows containers or Linux containers on the latest version of Windows 10 Pro. I am already logged in to Windows. The error I get when I switch Docker to use Linux containers is:
Docker hv-sock proxy (vsudd) is not reachable
at Docker.Backend.ContainerEngine.Linux.ConnectToVsud(TaskCompletionSource`1 vmId) in C:\gopath\src\github.com\docker\pinata\win\src\Docker.Backend\ContainerEngine\Linux.cs:line 293
at Docker.Backend.ContainerEngine.Linux.DoStart(Settings settings, String daemonOptions) in C:\gopath\src\github.com\docker\pinata\win\src\Docker.Backend\ContainerEngine\Linux.cs:line 260
at Docker.Backend.ContainerEngine.Linux.Start(Settings settings, String daemonOptions) in C:\gopath\src\github.com\docker\pinata\win\src\Docker.Backend\ContainerEngine\Linux.cs:line 130
at Docker.Core.Pipe.NamedPipeServer.<>c__DisplayClass9_0.b__0(Object[] parameters) in C:\gopath\src\github.com\docker\pinata\win\src\Docker.Core\pipe\NamedPipeServer.cs:line 47
at Docker.Core.Pipe.NamedPipeServer.RunAction(String action, Object[] parameters) in C:\gopath\src\github.com\docker\pinata\win\src\Docker.Core\pipe\NamedPipeServer.cs:line 145
This is re this thread: https://github.com/Microsoft/hcsshim/pull/276#issuecomment-411589251.
We have a two part change between go-winio and hcsshim.
go-winio change is here: microsoft/go-winio#88 (comment)
hcsshim change is here: #276 (comment)
Note that there are also separate changes here:
hcsshim: #277 (comment)
opengcs: microsoft/opengcs#243 (comment)
These should also be submitted before we release and revendor, so we do not have to do this twice.
We have found that DNS lookups occasionally fail when a process is executed in a newly created container. Here is a short go program which can be used to reproduce the issue:
$env:ROOTFS_PATH=(docker inspect microsoft/windowsservercore:1709 | ConvertFrom-Json).GraphDriver.Data.Dir
$env:NETWORK_NAME="nat"
for ($i=0; $i -lt 20; $i++) {./main.exe }
The DNS lookup from inside the container will fail 0-6 times out of 20.
If we use a container image that has the DNSCache service turned off, the DNS lookup always succeeds. The Dockerfile we use for this is:
FROM microsoft/windowsservercore:1709
RUN powershell.exe -command "Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\dnscache' -Name Start -Value 4"
Unfortunately I am again encountering "hcsshim::ImportLayer failed in Win32: The system cannot find the path specified. (0x3)" on Windows Server 1803 with Docker 18.03.1-ee-3:
re-exec error: exit status 1: output: time="2018-11-07T21:14:46+01:00" level=error msg="hcsshim::ImportLayer failed in Win32: The system cannot find the path specified. (0x3) layerId=\\\\?\\C:\\ProgramData\\docker\\windowsfilter\\dbb8fe7c9ff32c1c36b7efea1786dc52d1bd420849248300536adc27dd5189e8 flavour=1 folder=C:\\ProgramData\\docker\\tmp\\hcs132418239"
hcsshim::ImportLayer failed in Win32: The system cannot find the path specified. (0x3) layerId=\\?\C:\ProgramData\docker\windowsfilter\dbb8fe7c9ff32c1c36b7efea1786dc52d1bd420849248300536adc27dd5189e8 flavour=1 folder=C:\ProgramData\docker\tmp\hcs132418239
Client:
Version: 18.06.1-ce
API version: 1.37 (downgraded from 1.38)
Go version: go1.10.3
Git commit: e68fc7a
Built: Tue Aug 21 17:23:18 2018
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 18.03.1-ee-3
API version: 1.37 (minimum version 1.24)
Go version: go1.10.2
Git commit: b9a5c95
Built: Thu Aug 30 18:56:49 2018
OS/Arch: windows/amd64
Experimental: false
It happens during the process-level isolation build of this Dockerfile:
# escape=`
ARG BASE_TAG=1803
FROM microsoft/windowsservercore:${BASE_TAG}
SHELL ["powershell", "-command"]
RUN Invoke-WebRequest "https://go.microsoft.com/fwlink/p/?linkid=870807" -OutFile "C:\Windows\Temp\winsdksetup.exe"; `
Start-Process -FilePath "C:\Windows\Temp\winsdksetup.exe" -ArgumentList /Quiet, /NoRestart -NoNewWindow -PassThru -Wait; `
Remove-Item @('C:\Windows\Temp\*', 'C:\Users\*\Appdata\Local\Temp\*') -Force -Recurse; `
Write-Host 'Checking INCLUDE ...'; `
Get-Item -Path 'C:\Program Files (x86)\Windows Kits\10\Include\10.0.17134.0\shared'; `
Get-Item -Path 'C:\Program Files (x86)\Windows Kits\10\Include\10.0.17134.0\um'; `
Get-Item -Path 'C:\Program Files (x86)\Windows Kits\10\Include\10.0.17134.0\ucrt';
RUN Write-Host 'Updating INCLUDE ...'; `
$env:INCLUDE = 'C:\Program Files (x86)\Windows Kits\10\Include\10.0.17134.0\shared;' + $env:INCLUDE; `
$env:INCLUDE = 'C:\Program Files (x86)\Windows Kits\10\Include\10.0.17134.0\um;' + $env:INCLUDE; `
$env:INCLUDE = 'C:\Program Files (x86)\Windows Kits\10\Include\10.0.17134.0\ucrt;' + $env:INCLUDE; `
[Environment]::SetEnvironmentVariable('INCLUDE', $env:INCLUDE, [EnvironmentVariableTarget]::Machine);
CMD ["powershell"]
The base image is the following:
microsoft/windowsservercore 1803 1a4a9d0fd8af 4 weeks ago 4.93GB
@thaJeztah @jhowardmsft @jterry75 Would you mind taking another look at this? The previous issue was moby/moby#32838.