Optimizing your scheduled jobs

Introduction
Step 1: Deciding the expected volumes of data
Step 2: Functional and technical limitations
Step 3: Setting the schedules

Introduction

The majority of the integrations built using Alumio are asynchronous, meaning that data is fetched and processed in the background. For this reason, most of your integrations will require you to configure one or more scheduled jobs. Scheduled jobs instruct Alumio to execute a process on a set schedule.

Every execution of a scheduled job consumes processing time on your Alumio iPaaS environment. Depending on your plan, you might face performance or stability issues when you schedule many jobs at the same moment, or when your jobs process large amounts of data.

This requires you to be thoughtful when setting up your scheduled jobs, as you want to make sure that:

- Your integration meets the data requirements (usually driven by data volatility and/or customer needs).
- Your scheduled jobs do not consume the total available processing power of your iPaaS environment.

This article will give you some tips on how to decide on schedules that work best for your integration.

Step 1: Deciding the expected volumes of data

The first question you should be asking yourself (or your customer) is how many entities to expect on any given day. This is usually highly dependent on the area your business operates in, the entity being integrated, and many other factors such as seasonal events.

For the purpose of this article, we have decided on the following (fictional) case:

New products from ERP
- ~15 products per day
Updated products from ERP
- 200 per day
Prices from ERP
- 35000 per day
Stock from ERP
- 21000 per day
Orders from Webshop
- 180 per day

Now that we know the expected volumes for each integration, we can move on to the next step.

Step 2: Functional and technical limitations

During this step, it is important to understand the possibilities, but more importantly, the limitations of the systems being integrated.

Let's zoom in on the prices integration from our case. According to our requirements, the expected number of price changes is 35000 per day. Given that the highest frequency at which a scheduled job can run is once per minute, there are 1440 moments every day to process (a portion of) the price changes. This means that at least 25 price changes (35000 / 1440, rounded up) need to be forwarded every minute in order to meet the expected volume.

However, we also need to consider the source and destination systems in this equation. In our case, the source system is able to provide 100 price changes per minute, while the receiving end is able to process 500 price changes per minute. Taking these limitations into account, and with each execution handling at most one minute's worth of data, Alumio would need 350 scheduled job executions (35000 / 100) to retrieve all data and 70 scheduled job executions (35000 / 500) to process all data.
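The arithmetic in this step can be sketched in a few lines of Python. This is only an illustration of the calculation, not part of Alumio: the constant and function names are made up for the example, and it assumes each execution handles at most one minute's worth of what the source or destination system can handle.

```python
def ceil_div(total: int, per_unit: int) -> int:
    """Integer division rounded up, without floating-point error."""
    return -(-total // per_unit)


# Figures taken from the fictional case in this article.
DAILY_PRICE_CHANGES = 35_000     # expected price changes per day
MINUTES_PER_DAY = 24 * 60        # 1440 moments a scheduled job can run
SOURCE_LIMIT_PER_RUN = 100       # price changes the source provides per minute
DESTINATION_LIMIT_PER_RUN = 500  # price changes the destination accepts per minute

# Minimum throughput needed if a job ran every single minute.
min_changes_per_minute = ceil_div(DAILY_PRICE_CHANGES, MINUTES_PER_DAY)

# Executions needed per day given the limits of each system.
pull_executions = ceil_div(DAILY_PRICE_CHANGES, SOURCE_LIMIT_PER_RUN)
process_executions = ceil_div(DAILY_PRICE_CHANGES, DESTINATION_LIMIT_PER_RUN)

print(min_changes_per_minute, pull_executions, process_executions)  # 25 350 70
```

Rounding up matters here: rounding down would leave part of the daily volume unprocessed.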

With this information, the schedules can be defined.

Step 3: Setting the schedules

There are many ways to decide on the schedules. One of them is to start from the information at hand:

- 350 scheduled job executions are required to pull all data
- 70 scheduled job executions are required to process all data
- There are 1440 moments per day at which a scheduled job can run (please note that you may run multiple scheduled jobs at the same moment)

Ideally, you want to spread the load. Simply divide the number of scheduled job executions required by the number of hours in a day (24) and you will have a good idea of how often you need to run the scheduled job each hour. In our case, this would be 15 executions each hour for pulling the data and 3 executions each hour for processing the data, both rounded up. The next step is to divide the number of minutes in an hour (60) by that amount, which gives you the schedule:

Pulling the data
- 350 executions per day / 24 hours = 14.6, rounded up to 15 executions per hour
- 60 minutes / 15 executions per hour = 1 execution every 4 minutes
- Final schedule = */4 * * * *
Processing the data
- 70 executions per day / 24 hours = 2.9, rounded up to 3 executions per hour
- 60 minutes / 3 executions per hour = 1 execution every 20 minutes
- Final schedule = */20 * * * *
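The same derivation can be written as a small Python helper. This is a minimal sketch under the assumptions used in this article (load spread evenly over 24 hours, at most one execution per minute); the function name `cron_for_daily_executions` is hypothetical and not an Alumio API.

```python
def ceil_div(total: int, per_unit: int) -> int:
    """Integer division rounded up, without floating-point error."""
    return -(-total // per_unit)


def cron_for_daily_executions(executions_per_day: int) -> str:
    """Return a cron expression that runs at least the given number of times per day."""
    if executions_per_day < 1:
        raise ValueError("at least one execution per day is required")

    # Spread the executions evenly over the 24 hours of a day, rounding up.
    per_hour = ceil_div(executions_per_day, 24)
    if per_hour > 60:
        raise ValueError("a scheduled job cannot run more than once per minute")

    # Round the interval down so the job never runs less often than needed.
    interval = 60 // per_hour
    if interval == 60:
        return "0 * * * *"  # once per hour, on the hour
    return f"*/{interval} * * * *"


print(cron_for_daily_executions(350))  # */4 * * * *
print(cron_for_daily_executions(70))   # */20 * * * *
```

The interval is rounded down on purpose: running slightly more often than strictly needed leaves some headroom, whereas rounding up would fall short of the daily volume.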

Depending on the number of integrations in your environment, you might experience overlap in the required schedules. This could lead to performance and stability issues, as you are limited by the processing power of your environment and the amount of data going through the integrations. To prevent many scheduled jobs from running simultaneously, it is advised to introduce an offset, like so:

Scheduled job 1 at: */3 * * * *
- This would cause it to run at: 0th minute, 3rd minute, 6th minute, etc.
Scheduled job 2 at: 1-59/3 * * * *
- This would cause it to run at: 1st minute, 4th minute, 7th minute, etc.
Scheduled job 3 at: 2-59/3 * * * *
- This would cause it to run at: 2nd minute, 5th minute, 8th minute, etc.
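To check that offset schedules like these never fire in the same minute, the pattern can be generated and expanded with a short Python sketch. The helper names `staggered_schedules` and `minutes_fired` are hypothetical, and the expansion only models the minute field of a standard five-field cron expression.

```python
def staggered_schedules(job_count: int, interval: int) -> list:
    """Build one cron expression per job, each shifted by one extra minute."""
    if not 1 <= interval <= 59:
        raise ValueError("interval must be between 1 and 59 minutes")
    if not 1 <= job_count <= interval:
        raise ValueError("more jobs than minutes in the interval would overlap")

    schedules = []
    for offset in range(job_count):
        # Offset 0 is the plain step form; later jobs start at minute `offset`.
        minute_field = f"*/{interval}" if offset == 0 else f"{offset}-59/{interval}"
        schedules.append(f"{minute_field} * * * *")
    return schedules


def minutes_fired(offset: int, interval: int) -> list:
    """Minutes within the hour at which a job with this offset runs."""
    return list(range(offset, 60, interval))


for offset, schedule in enumerate(staggered_schedules(3, 3)):
    print(schedule, "->", minutes_fired(offset, 3)[:3])
```

With three jobs on a 3-minute interval, every minute of the hour is used by exactly one job, so no two of them start at the same moment.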
