
I have the following configuration in Alibaba ECS:

[Diagram: Public Connector and Three Test Nodes]

The Connector has network connections on the public internet and on the default VSwitch in the default VPC. The Connector was created using the ECS web interface. The testnode[0-2] machines were created by a script using the Alibaba Cloud CLI command, aliyun.

When the instances start running, the Connector can ping none of them. If I set a password on any test node and then restart it, ping starts working. The script uses a snapshot of the Connector as the image for the test nodes. The `Connector` has a randomly generated, long, and forgotten root password. Root access is via SSH with a passphrase-protected key pair, and the same applies to the non-root user that runs the test code.

What I have tried is creating test nodes with the following CreateInstance options:

  1. No --Password and no --InheritPassword options (original intent: why set a password? I have the access I need from the Connector image)

  2. --InheritPassword option (I need a root password for the private network interfaces to work; the root password in the Connector image is fine)

  3. --Password option (I need to explicitly set a root password on the test nodes)

The result is the same in every case: until I use the ECS web interface to set a password and restart a test node, the Connector cannot ping the test nodes.

What I know:

  1. This is not a problem with the default security group, VPC, or VSwitch, because I change no settings on any of them to make ping work.

  2. This is not a problem with the instance image because as soon as ping works, ssh to the test nodes works as well.

What am I doing wrong, or what am I missing? The whole purpose is to spin up instances without having to type away at the ECS web interface. I only figured out what it took to get private network traffic moving because I wanted to debug the situation on the test nodes. For that, I had to set a root password and log in from the ECS web console, which, again, defeats the purpose of scripting.

Aliyun command for creating the test nodes:

```shell
aliyun ecs CreateInstance --ImageId m-2vchb2oxldfuloh51wp9 --RegionId=cn-chengdu \
  --InstanceType=ecs.c6.xlarge --SpotStrategy SpotWithPriceLimit --SpotPriceLimit 0.25 \
  --ZoneId cn-chengdu-a --InternetChargeType PayByTraffic --InternetMaxBandwidthOut 99 \
  --InstanceName TEST_NODE-0 --HostName testnode0 --Password 'notgoingtotellyou'
```
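The script that runs this for all three nodes is not shown. A rough sketch of what such a loop might look like, reusing the flags from the command above, is below. It is an illustration only: the `DRY_RUN` switch, the `create_and_start` function, and the placeholder instance IDs are hypothetical; the password flags are left out; and the sed-based extraction assumes the CLI prints JSON containing an `"InstanceId"` field.

```shell
#!/usr/bin/env bash
# Sketch: create and start testnode0..testnode2 with CreateInstance followed
# by StartInstance. DRY_RUN=1 (the default) only prints the commands;
# DRY_RUN=0 actually calls the aliyun CLI. Password flags are omitted.
set -euo pipefail

IMAGE_ID="m-2vchb2oxldfuloh51wp9"

create_and_start() {
  local i="$1" id
  local -a create start
  create=(aliyun ecs CreateInstance
    --RegionId cn-chengdu --ZoneId cn-chengdu-a
    --ImageId "$IMAGE_ID" --InstanceType ecs.c6.xlarge
    --SpotStrategy SpotWithPriceLimit --SpotPriceLimit 0.25
    --InternetChargeType PayByTraffic --InternetMaxBandwidthOut 99
    --InstanceName "TEST_NODE-$i" --HostName "testnode$i")

  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "${create[*]}"
    id="i-placeholder$i"   # hypothetical ID, used only in dry runs
  else
    # Assumes the JSON response has an "InstanceId" field; extract it with sed.
    id=$("${create[@]}" | sed -n 's/.*"InstanceId": *"\([^"]*\)".*/\1/p')
  fi

  # A freshly created instance may still be Pending; a real script would poll
  # its status before calling StartInstance (not shown here).
  start=(aliyun ecs StartInstance --InstanceId "$id")
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "${start[*]}"
  else
    "${start[@]}"
  fi
}

for i in 0 1 2; do
  create_and_start "$i"
done
```

Run it once with the default dry run to check the generated commands before setting `DRY_RUN=0`.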

The operating system for all instances is Ubuntu 18.04.

Aliyun command version is 3.0.30.

Selaka Nanayakkara

1 Answer


I got two answers. One from a co-worker. One from Alibaba.

Co-worker's answer: The configuration failed because the Ubuntu 18.04 image that I created for the non-public test machines used a static address for the internal network interface. I changed the internal network interface (eth0) to use DHCP, and everything worked. See the netplan configuration examples for how to change the IP address assignment.
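A minimal sketch of that change follows. It assumes the interface is `eth0`, as in the answer, and that one netplan file should configure it. The path `/etc/netplan/50-cloud-init.yaml` in the usage comment is an assumption, so check what actually exists in `/etc/netplan` on the image, and remove or edit any other file there that still assigns a static address. The `write_dhcp_netplan` helper is hypothetical; it writes to whatever path you pass, so you can try it on a scratch file first.

```shell
#!/usr/bin/env bash
# Sketch: replace a static netplan configuration with one that uses DHCPv4.
set -euo pipefail

# write_dhcp_netplan FILE IFACE
# Writes a minimal netplan (version 2) config that enables DHCPv4 on IFACE.
write_dhcp_netplan() {
  local file="$1" iface="$2"
  cat > "$file" <<EOF
network:
  version: 2
  ethernets:
    ${iface}:
      dhcp4: true
EOF
}

# Usage on the machine you snapshot for the image (as root); the file name
# below is an assumption:
#   write_dhcp_netplan /etc/netplan/50-cloud-init.yaml eth0
#   netplan apply
```

Making the change on the Connector before taking the snapshot means every test node built from that image asks for its private address via DHCP instead of reusing the Connector's static one.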

Alibaba's answer: Try using `aliyun ecs RunInstances` instead of three individual `aliyun ecs CreateInstance` and `aliyun ecs StartInstance` invocations. I did not try this solution because it would have involved rewriting my scripts. Alibaba could have done more to motivate me by explaining why `RunInstances` would produce a different result than the combination of `CreateInstance` and `StartInstance`.
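For anyone who wants to try Alibaba's suggestion, here is an untested sketch of a single `RunInstances` call that replaces the separate `CreateInstance` and `StartInstance` calls. `--Amount` requests several instances at once, and `--UniqueSuffix true` is meant to append an incrementing suffix to the instance and host names (the suffix format is set by ECS, not necessarily 0 to 2). Check both flags against the RunInstances API reference, along with whether your account also needs `--SecurityGroupId` or `--VSwitchId`. The `DRY_RUN` switch and `launch_test_nodes` function are hypothetical. This call does nothing about a static address baked into the image.

```shell
#!/usr/bin/env bash
# Sketch: launch all three test nodes with one RunInstances call.
# DRY_RUN=1 (the default) only prints the command; DRY_RUN=0 runs it.
set -euo pipefail

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

launch_test_nodes() {
  # Password flags omitted; add them as in the CreateInstance command.
  run aliyun ecs RunInstances \
    --RegionId cn-chengdu --ZoneId cn-chengdu-a \
    --ImageId m-2vchb2oxldfuloh51wp9 --InstanceType ecs.c6.xlarge \
    --SpotStrategy SpotWithPriceLimit --SpotPriceLimit 0.25 \
    --InternetChargeType PayByTraffic --InternetMaxBandwidthOut 99 \
    --Amount 3 --UniqueSuffix true \
    --InstanceName TEST_NODE- --HostName testnode
}
```

Call `launch_test_nodes` once as a dry run to review the command, then again with `DRY_RUN=0`.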