eksctl cannot create private worker nodes in an existing VPC
The nodegroup creation just sat waiting forever, as shown below, and I wanted to know why.
eksctl create nodegroup \
  --region ap-northeast-1 \
  --cluster ekstest \
  --name ng0 \
  --node-type t3.medium \
  --nodes 1 --nodes-min 1 --nodes-max 1 --node-ami auto --node-volume-size 10 \
  --node-private-networking \
  --node-security-groups sg-0d021a3fc762eed89 \
  --node-labels "usage=client" \
  --ssh-access \
  --ssh-public-key .ssh/mymachine.pub
[ℹ] using region ap-northeast-1
[ℹ] will use version 1.14 for new nodegroup(s) based on control plane version
[ℹ] nodegroup "ng0" will use "ami-055d09694b6e5591a" [AmazonLinux2/1.14]
[ℹ] using SSH public key ".ssh/mymachine.pem.pub" as "eksctl-ekstest-nodegroup-ng0-86:8d:7f:00:97:c2:c8:19:af:94:61:03:72:c8:31:51"
[ℹ] 1 nodegroup (ng0) was included
[ℹ] will create a CloudFormation stack for each of 1 nodegroups in cluster "ekstest"
[ℹ] 1 task: { create nodegroup "ng0" }
[ℹ] building nodegroup stack "eksctl-ekstest-nodegroup-ng0"
[ℹ] deploying stack "eksctl-ekstest-nodegroup-ng0"
[ℹ] adding role "arn:aws:iam::1234567890:role/eksctl-ekstest-nodegroup-ng0-NodeInstanceRole-1F01MBBM84FH5" to auth ConfigMap
[ℹ] nodegroup "ng0" has 0 node(s)
[ℹ] waiting for at least 1 node(s) to become ready in "ng0"
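Once it was stuck waiting, I logged in to the newly created node over ssh, poked around, and found the reason.
First, I confirmed that kubelet.service was stuck in a crash loop.
It looks like initialization of the aws cloud provider is failing.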
journalctl -u kubelet.service
Sep 24 00:41:30 ip-192-168-10-221.ap-northeast-1.compute.internal systemd[1]: Starting Kubernetes Kubelet...
Sep 24 00:41:30 ip-192-168-10-221.ap-northeast-1.compute.internal systemd[1]: Started Kubernetes Kubelet.
Sep 24 00:41:30 ip-192-168-10-221.ap-northeast-1.compute.internal kubelet[3619]: Flag --max-pods has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://ku
Sep 24 00:41:30 ip-192-168-10-221.ap-northeast-1.compute.internal kubelet[3619]: Flag --allow-privileged has been deprecated, will be removed in a future version
Sep 24 00:41:30 ip-192-168-10-221.ap-northeast-1.compute.internal kubelet[3619]: Flag --max-pods has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://ku
Sep 24 00:41:30 ip-192-168-10-221.ap-northeast-1.compute.internal kubelet[3619]: Flag --allow-privileged has been deprecated, will be removed in a future version
Sep 24 00:41:30 ip-192-168-10-221.ap-northeast-1.compute.internal kubelet[3619]: I0924 00:41:30.498199 3619 server.go:418] Version: v1.14.6-eks-5047ed
Sep 24 00:41:30 ip-192-168-10-221.ap-northeast-1.compute.internal kubelet[3619]: W0924 00:41:30.499460 3619 plugins.go:118] WARNING: aws built-in cloud provider is now deprecated. The AWS provider is deprecated and will
Sep 24 00:41:30 ip-192-168-10-221.ap-northeast-1.compute.internal kubelet[3619]: I0924 00:41:30.501778 3619 aws.go:1137] Zone not specified in configuration file; querying AWS metadata service
Sep 24 00:41:30 ip-192-168-10-221.ap-northeast-1.compute.internal kubelet[3619]: I0924 00:41:30.507431 3619 aws.go:1171] Building AWS cloudprovider
Sep 24 00:43:30 ip-192-168-10-221.ap-northeast-1.compute.internal kubelet[3619]: F0924 00:43:30.852164 3619 server.go:266] failed to run Kubelet: could not init cloud provider "aws": error finding instance i-01c729a3a7fe
Sep 24 00:43:30 ip-192-168-10-221.ap-northeast-1.compute.internal systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Sep 24 00:43:30 ip-192-168-10-221.ap-northeast-1.compute.internal systemd[1]: Unit kubelet.service entered failed state.
Sep 24 00:43:30 ip-192-168-10-221.ap-northeast-1.compute.internal systemd[1]: kubelet.service failed.
Sep 24 00:43:36 ip-192-168-10-221.ap-northeast-1.compute.internal systemd[1]: kubelet.service holdoff time over, scheduling restart.
Sep 24 00:43:36 ip-192-168-10-221.ap-northeast-1.compute.internal systemd[1]: Starting Kubernetes Kubelet...
Sep 24 00:43:36 ip-192-168-10-221.ap-northeast-1.compute.internal systemd[1]: Started Kubernetes Kubelet.
Sep 24 00:43:36 ip-192-168-10-221.ap-northeast-1.compute.internal kubelet[3836]: Flag --max-pods has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://ku
Sep 24 00:43:36 ip-192-168-10-221.ap-northeast-1.compute.internal kubelet[3836]: Flag --allow-privileged has been deprecated, will be removed in a future version
Sep 24 00:43:36 ip-192-168-10-221.ap-northeast-1.compute.internal kubelet[3836]: Flag --max-pods has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://ku
Sep 24 00:43:36 ip-192-168-10-221.ap-northeast-1.compute.internal kubelet[3836]: Flag --allow-privileged has been deprecated, will be removed in a future version
Sep 24 00:43:36 ip-192-168-10-221.ap-northeast-1.compute.internal kubelet[3836]: I0924 00:43:36.082885 3836 server.go:418] Version: v1.14.6-eks-5047ed
Sep 24 00:43:36 ip-192-168-10-221.ap-northeast-1.compute.internal kubelet[3836]: W0924 00:43:36.083072 3836 plugins.go:118] WARNING: aws built-in cloud provider is now deprecated. The AWS provider is deprecated and will
Sep 24 00:43:36 ip-192-168-10-221.ap-northeast-1.compute.internal kubelet[3836]: I0924 00:43:36.083155 3836 aws.go:1137] Zone not specified in configuration file; querying AWS metadata service
Sep 24 00:43:36 ip-192-168-10-221.ap-northeast-1.compute.internal kubelet[3836]: I0924 00:43:36.084224 3836 aws.go:1171] Building AWS cloudprovider
Sep 24 00:45:36 ip-192-168-10-221.ap-northeast-1.compute.internal kubelet[3836]: F0924 00:45:36.401506 3836 server.go:266] failed to run Kubelet: could not init cloud provider "aws": error finding instance i-01c729a3a7fe
Sep 24 00:45:36 ip-192-168-10-221.ap-northeast-1.compute.internal systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Sep 24 00:45:36 ip-192-168-10-221.ap-northeast-1.compute.internal systemd[1]: Unit kubelet.service entered failed state.
Sep 24 00:45:36 ip-192-168-10-221.ap-northeast-1.compute.internal systemd[1]: kubelet.service failed.
Sep 24 00:45:41 ip-192-168-10-221.ap-northeast-1.compute.internal systemd[1]: kubelet.service holdoff time over, scheduling restart.
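To tell a crash loop from a one-off failure, it can help to count how many times the kubelet exited with the fatal "failed to run Kubelet" message seen in the log above. The following is only a rough sketch: the function name `count_kubelet_fatals` is made up for illustration, and it simply greps journal text from stdin.

```shell
#!/bin/sh
# Hypothetical helper (not part of eksctl or the EKS AMI):
# counts fatal kubelet exits in journal text read from stdin.
count_kubelet_fatals() {
  # grep -c prints the number of matching lines; it exits non-zero when
  # there are none, so "|| true" keeps the function from failing then.
  grep -c 'failed to run Kubelet' || true
}

# Example (run on the node):
#   journalctl -u kubelet.service --no-pager | count_kubelet_fatals
```

A count that keeps growing every couple of minutes matches the restart pattern in the log.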
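I also confirmed that cloud-config.service (one of the cloud-init units) had failed.
It fails to reach the yum repository.
The cause was that I had created neither a NAT gateway nor a NAT instance, so the node in the private subnet had no route out to the internet. That presumably also explains why the kubelet's aws cloud provider could not look up the instance.
I see :thinking_face: I hadn't created one because billing starts right away...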
systemctl list-units | grep cloud-config
● cloud-config.service loaded failed failed Apply the settings specified in cloud-config
journalctl -u cloud-config.service
Sep 24 00:40:52 ip-192-168-10-221.ap-northeast-1.compute.internal systemd[1]: Starting Apply the settings specified in cloud-config...
Sep 24 00:40:52 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: Cloud-init v. 18.2-72.amzn2.0.7 running 'modules:config' at Tue, 24 Sep 2019 00:40:52 +0000. Up 12.19 seconds.
Sep 24 00:40:52 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: Loaded plugins: priorities, update-motd
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: One of the configured repositories failed (Unknown),
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: and yum doesn't have enough cached data to continue. At this point the only
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: safe thing yum can do is fail. There are a few ways to work "fix" this:
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: 1. Contact the upstream for the repository and get them to fix the problem.
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: 2. Reconfigure the baseurl/etc. for the repository, to point to a working
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: upstream. This is most often useful if you are using a newer
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: distribution release than is supported by the repository (and the
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: packages for the previous distribution release still work).
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: 3. Run the command with the repository temporarily disabled
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: yum --disablerepo=<repoid> ...
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: 4. Disable the repository permanently, so yum won't use it by default. Yum
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: will then just ignore the repository until you permanently enable it
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: again or use --enablerepo for temporary usage:
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: yum-config-manager --disable <repoid>
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: or
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: subscription-manager repos --disable=<repoid>
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: 5. Configure the failing repository to be skipped, if it is unavailable.
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: Note that yum will try to contact the repo. when it runs most commands,
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: so will have to try and fail each time (and thus. yum will be be much
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: slower). If it is a very temporary problem though, this is often a nice
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: compromise:
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: Cannot find a valid baseurl for repo: amzn2-core/2/x86_64
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal systemd[1]: cloud-config.service: main process exited, code=exited, status=1/FAILURE
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: Could not retrieve mirrorlist http://amazonlinux.ap-northeast-1.amazonaws.com/2/core/latest/x86_64/mirror.list error was
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: 12: Timeout on http://amazonlinux.ap-northeast-1.amazonaws.com/2/core/latest/x86_64/mirror.list: (28, 'Connection timed out after 5000 mill
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: Sep 24 00:41:29 cloud-init[3052]: util.py[WARNING]: Package upgrade failed
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: Sep 24 00:41:29 cloud-init[3052]: cc_package_update_upgrade_install.py[WARNING]: 1 failed with exceptions, re-raising the last one
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal cloud-init[3052]: Sep 24 00:41:29 cloud-init[3052]: util.py[WARNING]: Running module package-update-upgrade-install (<module 'cloudinit.config.cc_package_upd
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal systemd[1]: Failed to start Apply the settings specified in cloud-config.
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal systemd[1]: Unit cloud-config.service entered failed state.
Sep 24 00:41:29 ip-192-168-10-221.ap-northeast-1.compute.internal systemd[1]: cloud-config.service failed.
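Since the missing NAT turned out to be the culprit, one way to catch this before running eksctl is to check whether the private subnet's route table has a default route through a NAT gateway. This is a minimal sketch under a few assumptions: the function names `has_nat_default_route` and `subnet_routes` are invented here, the subnet ID is a placeholder, and the check only recognizes NAT gateways (a NAT instance would appear as an instance or network interface target instead). It also assumes the subnet has an explicit route table association; a subnet that falls back to the VPC's main route table would return nothing from this filter.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the route lines on stdin contain
# a default route (0.0.0.0/0) whose target is a NAT gateway.
# Expected input, one route per line: "<destination CIDR> <NAT gateway ID or None>"
has_nat_default_route() {
  awk '$1 == "0.0.0.0/0" && $2 ~ /^nat-/ { found = 1 } END { exit !found }'
}

# Hypothetical helper: prints the routes of the route table explicitly
# associated with the given subnet. Requires the AWS CLI and credentials.
subnet_routes() {
  aws ec2 describe-route-tables \
    --region ap-northeast-1 \
    --filters "Name=association.subnet-id,Values=$1" \
    --query 'RouteTables[].Routes[].[DestinationCidrBlock,NatGatewayId]' \
    --output text
}

# Example (subnet-xxxxxxxx is a placeholder):
#   subnet_routes subnet-xxxxxxxx | has_nat_default_route \
#     && echo "NAT route found" || echo "no NAT gateway route"
```

If no such route exists, nodes created with --node-private-networking have no path to the yum mirrors, which matches the timeout in the cloud-config log.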