After more than 7 months of development since 1.0.2 and massive architectural, design, and refactoring overhauls, we’re proud to announce the next-generation QUADS 1.1.0, codenamed gaúcho. A monumental number of enhancements, fixes and redesign efforts going back over a year form the foundation of the 1.1 series.
Changes in QUADS 1.1.0
We’ve got a slew of changes, enhancements and fixes in QUADS 1.1.0, namely the following:
- Complete move to Python 3.6+
- Move to asyncio for provisioning concurrency
- MongoDB database backend replaces flat YAML files
- MongoEngine drives object document mapping
- CherryPy Python Web Framework for API
- Enormous performance improvements (see below)
- Massive re-factoring and code structure improvements
- All shell tools rewritten in Python or made proper libraries
- Expect network automation ported to Pexpect
- Foreman provisioning is now done concurrently across all systems
- Systems/Network provisioning time improvement per system of 34%
- Lots and lots of bug fixes and other improvements
Code Changes
- 323 commits reviewed and merged since 1.0.2
- 68 issues closed, plus countless other fixes
- Over 35 bugs fixed
- More than 25 feature enhancements
Performance and Speed Improvements
With the inclusion of asyncio we’ve gained massive improvements in provisioning speed, because the QUADS move and rebuild operations now run concurrently.
Below is a comparison of QUADS 1.0.2 versus 1.1.0 when provisioning a set of three systems/networks.
NOTE: in the below graphs, a system/network unit means the following:
- QUADS cloud membership changes for 1 x bare-metal server
- Network port changes for 4 x internal interfaces for 1 x server (VLAN)
- IPMI and Foreman environment credential and RBAC changes per server
This does not include the 3 to 7 minutes it takes for all server operating systems to re-install via Foreman/kickstart and reboot.
This step now runs concurrently across all hosts thanks to asyncio, whereas in 1.0.2 it ran serially. That yields some incredible returns in efficiency the more systems you schedule (e.g. 3-7 minutes for all systems versus 3-7 minutes per system).
We have seen an average decrease in systems/network provisioning time of around 34% per host in the 1.1.0 version of QUADS compared to 1.0.2.
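The difference between serial and concurrent provisioning described above can be sketched in a few lines of asyncio. This is only an illustration, not the QUADS implementation: `provision_host` is a hypothetical stand-in that sleeps to simulate I/O wait, the host names are made up, and `asyncio.run` assumes Python 3.7 or later.

```python
import asyncio
import time


async def provision_host(host: str, delay: float = 0.1) -> str:
    # Hypothetical stand-in for a per-host move and rebuild.
    # The sleep simulates waiting on network or API calls.
    await asyncio.sleep(delay)
    return f"{host}: done"


async def provision_serially(hosts):
    # One host at a time: total time grows with the number of hosts.
    return [await provision_host(h) for h in hosts]


async def provision_concurrently(hosts):
    # All hosts are scheduled at once; gather returns results in input order.
    return await asyncio.gather(*(provision_host(h) for h in hosts))


def timed(coro_fn, hosts):
    # Run one of the coroutines above and return (results, elapsed seconds).
    start = time.perf_counter()
    results = asyncio.run(coro_fn(hosts))  # asyncio.run needs Python 3.7+
    return results, time.perf_counter() - start


if __name__ == "__main__":
    hosts = ["host01", "host02", "host03"]  # made-up host names
    _, serial_time = timed(provision_serially, hosts)
    _, concurrent_time = timed(provision_concurrently, hosts)
    print(f"serial: {serial_time:.2f}s, concurrent: {concurrent_time:.2f}s")
```

With simulated waits like these, the serial run takes roughly the sum of the per-host delays, while the concurrent run takes roughly the longest single delay.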
Faster Provisioning Across Increasing Sets of Hosts
One interesting improvement is that systems/network provisioning time per host decreases dramatically as the number of systems/networks in an assignment increases, up to a certain point. You can see the linear improvement curve, based on overall time across different sized sets of systems, below.
From our initial testing the sweet spot seems to be with assignments of around 30 systems/networks. This brings the average provisioning time down to around 30 seconds per host, down from 89 seconds in a 3 to 5 host cloud size.
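As a quick check of what those two per-host figures imply, the relative reduction can be worked out directly. The helper name below is ours, and the only inputs are the 89 and 30 second averages quoted above.

```python
def percent_reduction(before: float, after: float) -> float:
    # Relative decrease from `before` to `after`, as a percentage.
    return (before - after) / before * 100.0


# Average seconds per host quoted above.
small_cloud_seconds = 89.0  # 3 to 5 host assignments
large_cloud_seconds = 30.0  # assignments of around 30 hosts

reduction = percent_reduction(small_cloud_seconds, large_cloud_seconds)
print(f"about {reduction:.0f}% less time per host")  # about 66% less time per host
```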
We believe the efficiency curve would continue beyond 30 systems/networks, but we ran into a Foreman bug that causes API failures past this amount, forcing asyncio to requeue the failed tasks until they finish.
We have implemented a client-side workaround for now by using an asyncio semaphore to limit the number of concurrent POST/PUT API requests. Once the bug is resolved, scale-up performance on larger sets of hosts will undoubtedly be better, though the current result is still a monumental improvement.
When this is addressed we’ll do more testing (and probably just publish a new post) on further performance/scale testing of QUADS 1.1.
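For readers unfamiliar with the technique, capping concurrent write requests with an asyncio semaphore looks roughly like the sketch below. It is not the QUADS code: the limit of 5, the function names, and the simulated request are all assumptions for illustration, and no real Foreman API call is made.

```python
import asyncio

MAX_CONCURRENT_WRITES = 5  # assumed value, not the limit QUADS uses


async def limited_write(semaphore, host, tracker):
    # Only `limit` coroutines can be inside this block at the same time.
    async with semaphore:
        tracker["active"] += 1
        tracker["peak"] = max(tracker["peak"], tracker["active"])
        await asyncio.sleep(0.01)  # simulated POST/PUT request
        tracker["active"] -= 1
    return host


async def run_writes(hosts, limit=MAX_CONCURRENT_WRITES):
    # Create the semaphore inside the running event loop.
    semaphore = asyncio.Semaphore(limit)
    tracker = {"active": 0, "peak": 0}
    done = await asyncio.gather(
        *(limited_write(semaphore, h, tracker) for h in hosts)
    )
    return list(done), tracker["peak"]


if __name__ == "__main__":
    hosts = [f"host{i:02d}" for i in range(30)]  # made-up host names
    done, peak = asyncio.run(run_writes(hosts))
    print(f"{len(done)} writes completed, peak concurrency {peak}")
```

All 30 tasks are still scheduled up front, but the semaphore keeps the number of in-flight requests at or below the limit, which trades some throughput for fewer failures on the server side.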
Further Performance Benchmarks between Asyncio and Legacy
Below you can see a more detailed comparison across larger sets of hosts between the 1.0 QUADS and the asyncio-enabled 1.1 series.
Here is another view of systems/network provisioning run-times across sized groups up to 90 using QUADS 1.1 with async concurrency.
You can find a much more detailed write-up of using asyncio in Python, and our challenges and approach here.
You can find QUADS 1.1.0 release information and packages over on GitHub.