We are facing slow IO processing on an Amazon EC2 server when dealing with very large data files.
The goal is to improve IO performance on the Amazon EC2 server so that these large data files can be processed faster.
The following processes affect IO processing:
- When a file data source is built or refreshed from a large CSV or Excel file, and especially when the corresponding dataset is created or updated. This generates a lot of IO on the server.
- When a dataset or an existing front-end object such as a report is read or analysed, or when new front-end objects are created. Many IO threads run while files are read and updated in the data folder.
- A high number of concurrent users using the system
In such scenarios, higher IO throughput is needed. The following is a proposed solution when Smarten is hosted on an AWS instance.
In a typical setup, the Smarten data folder on the Amazon EC2 server is mounted on Amazon EFS (Elastic File System).
As Smarten is used over time, especially when new datasets are created or new data is added periodically, more storage is consumed, data volumes grow, and data rendering takes progressively longer.
The problem arises when the available resources are fixed, which is the default for a newly set up Amazon EC2 instance and its file system. The solution is to change the throughput mode of the EFS file system that holds the Smarten data folder (throughput mode is an EFS setting, not an EC2 instance setting).
What is Throughput?
Throughput is the amount of data that can be transmitted over a network or processed by a device within a certain amount of time. In the context of input/output (IO), throughput refers to the speed at which data can be read from or written to a storage device.
In general, higher throughput means that IO operations can be completed more quickly, as more data can be transferred in a given amount of time. This can lead to improved performance for applications that rely heavily on IO, such as databases, file servers, video editing software and Smarten.
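As a rough illustration of what throughput means for a large file, transfer time is simply size divided by throughput. The sketch below assumes a steady rate and ignores latency and contention; the 10 GiB file size and the two rates are illustrative values, not Smarten measurements, and `transfer_seconds` is a hypothetical helper name.

```python
def transfer_seconds(size_gib: float, throughput_mib_s: float) -> float:
    """Seconds needed to read or write size_gib of data at a steady throughput.

    Simplification: assumes a constant rate and ignores latency and contention.
    """
    if throughput_mib_s <= 0:
        raise ValueError("throughput must be positive")
    return size_gib * 1024 / throughput_mib_s  # 1 GiB = 1024 MiB


# Illustrative values: a 10 GiB CSV file at two different throughput levels.
for rate in (50, 200):
    print(f"10 GiB at {rate} MiB/s: {transfer_seconds(10, rate):.1f} s")
```

Quadrupling the throughput cuts the transfer time by the same factor (204.8 s versus 51.2 s).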
- Factors that affect the impact of throughput on IO:
- Disk performance: The performance of the storage device itself can impact the maximum achievable throughput. If the disk is slow, it may not be able to keep up with high levels of IO traffic, even if the network or device has high throughput capabilities.
- Network latency: In some cases, high throughput may not translate to faster IO if there is significant latency in the network. Latency is the amount of time it takes for data to travel from one point to another, and high latency can lead to delays in IO operations.
- Congestion: When multiple devices or applications are competing for the same IO resources, throughput can be reduced due to congestion. This can lead to slower IO operations, even if the network or device has high throughput capabilities.
Overall, while higher throughput generally means faster IO, it's important to consider other factors that can impact IO performance.
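To see how latency and congestion can erase the benefit of high throughput, here is a deliberately simplified toy model. It assumes that requests are issued one after another, each paying a fixed latency, and that concurrent clients share the throughput equally. This is not how EFS actually schedules IO; the function name and numbers are hypothetical.

```python
def io_time_seconds(
    total_mib: float,
    throughput_mib_s: float,
    requests: int,
    latency_ms: float,
    concurrent_clients: int = 1,
) -> float:
    """Toy model of IO time (not how EFS actually schedules IO).

    - Congestion: the available throughput is split evenly across clients.
    - Latency: each request waits latency_ms, and requests are issued one
      after another (no parallelism).
    """
    if throughput_mib_s <= 0 or concurrent_clients < 1:
        raise ValueError("throughput must be positive and clients >= 1")
    effective_mib_s = throughput_mib_s / concurrent_clients
    transfer = total_mib / effective_mib_s
    waiting = requests * latency_ms / 1000
    return transfer + waiting


# The same 1 GiB read with one client vs. four clients sharing the throughput.
print(io_time_seconds(1024, 100, requests=1000, latency_ms=1))
print(io_time_seconds(1024, 100, requests=1000, latency_ms=1, concurrent_clients=4))
```

In this model the fixed 1 s of latency is unaffected by throughput, while sharing the same throughput among four clients makes the transfer portion four times longer.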
When the volume of data we deal with is variable and may grow rapidly, we can solve the problem by choosing a different EFS throughput mode.
- Modes of EFS Throughput
There are three modes for EFS throughput:
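- Bursting mode scales throughput with the amount of data stored in the file system. The file system earns burst credits while it runs below its baseline rate and spends them to burst above that baseline; file systems of 1 TiB or more can burst for up to 50% of the time, that is, about 12 hours per day.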
- Pros of using Bursting mode:
- Provides bursts of high throughput while burst credits are available
- Best for short periods of heavy IO, with no throughput charges beyond storage
- Ideal for workloads with unpredictable or infrequent spikes
- Maintains a good balance between cost and performance
- Cons of using Bursting mode:
- Baseline throughput is 50 KiB/s per GiB of data stored (50 MiB/s per TiB), which may not be sufficient for some workloads.
- Burst throughput is limited to 100 MiB/s for file systems smaller than 1 TiB, or 100 MiB/s per TiB for larger ones.
- Not suitable for workloads that require sustained high throughput for extended periods of time.
- Once burst credits are exhausted, throughput drops back to the baseline rate, which shows up as slowness.
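Because Bursting-mode throughput depends on how much data is stored, a small calculation shows why a file system holding only a few hundred GiB can feel slow. This sketch uses the published EFS Bursting rates (baseline of 50 KiB/s per GiB stored; burst of 100 MiB/s, or 100 MiB/s per TiB above 1 TiB); verify them against current AWS documentation. It ignores details such as storage classes and any minimum baseline AWS may apply, and `bursting_limits` is a hypothetical helper.

```python
def bursting_limits(stored_gib: float) -> tuple:
    """Return (baseline, burst) throughput in MiB/s for EFS Bursting mode.

    Assumed rates: baseline = 50 KiB/s per GiB stored (50 MiB/s per TiB);
    burst = 100 MiB/s, or 100 MiB/s per TiB once the file system exceeds 1 TiB.
    """
    stored_tib = stored_gib / 1024
    baseline_mib_s = stored_tib * 50
    burst_mib_s = max(100.0, stored_tib * 100)
    return baseline_mib_s, burst_mib_s


for size_gib in (100, 1024, 4096):
    baseline, burst = bursting_limits(size_gib)
    print(f"{size_gib} GiB: baseline {baseline:.1f} MiB/s, burst {burst:.1f} MiB/s")
```

A 100 GiB file system gets a baseline of under 5 MiB/s, so a heavy dataset refresh quickly drains its burst credits.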
- Elastic mode automatically scales throughput up and down with the workload's activity, within the limits for the given Region, while IO is in progress. Charges are based on utilization, that is, on the amount of data read and written.
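- Pros of using Elastic Mode:
- High performance that scales automatically with demand, with no throughput to provision and no burst credits to manage
- You pay only for the throughput you actually use
- Cons of using Elastic Mode:
- Charges grow with every GB read and written, so sustained heavy workloads can cost more than in Provisioned mode
- Monthly costs are harder to predict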
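Because Elastic mode bills for the data actually read and written, its throughput cost can be estimated from daily IO volume. In this sketch the workload volumes and the per-GB prices are placeholders, not actual AWS rates (take real prices from the AWS pricing page for your Region), storage charges are excluded, and the function name is hypothetical.

```python
def elastic_monthly_cost(
    read_gb_per_day: float,
    write_gb_per_day: float,
    price_read_per_gb: float,
    price_write_per_gb: float,
    days: int = 30,
) -> float:
    """Estimate Elastic-mode throughput charges for one month.

    Prices are inputs taken from the AWS pricing page; storage is not included.
    """
    reads = read_gb_per_day * days * price_read_per_gb
    writes = write_gb_per_day * days * price_write_per_gb
    return reads + writes


# Hypothetical workload and placeholder prices (USD per GB), not real quotes.
print(f"{elastic_monthly_cost(100, 20, 0.03, 0.06):.2f}")
```

Running the same estimate for a busier month shows how quickly Elastic charges rise with sustained IO, which is the main trade-off against a fixed Provisioned level.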
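- Provisioned mode lets you set a fixed throughput level that is independent of the amount of data stored; you are billed for that level (above what your stored data would provide in Bursting mode) whether or not it is used. Because EFS meters reads at one-third the rate of writes, the file system can read at up to three times its provisioned throughput. Throughput can be provisioned up to the per-file-system maximum (3 GiB/s for reads and 1 GiB/s for writes in many Regions; check current quotas), giving us ENHANCED performance.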
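- Pros of using Provisioned Mode:
- Increased performance: throughput no longer depends on how much data is stored
- Predictable performance: the provisioned level is available continuously, with no burst credits to run out
- Cost savings compared with storing extra data only to raise the Bursting-mode baseline
- Cons of using Provisioned Mode:
- Higher cost: the provisioned level is billed even when the workload is idle
- Limited scalability: throughput does not grow automatically when demand exceeds the provisioned level
- Limited flexibility: AWS restricts how often provisioned throughput can be decreased or the throughput mode changed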
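The "three times" read figure comes from how EFS meters throughput: in Bursting and Provisioned modes, reads are metered at one-third the rate of writes. A minimal sketch, assuming that metering rule and a steady read/write mix, checks whether a workload fits a given provisioned level; the function names and figures are illustrative.

```python
def metered_mib_s(read_mib_s: float, write_mib_s: float) -> float:
    """Metered throughput when reads count at one-third the rate of writes."""
    return read_mib_s / 3 + write_mib_s


def fits_provisioned(read_mib_s: float, write_mib_s: float, provisioned_mib_s: float) -> bool:
    """True if the workload's metered throughput stays within the provisioned level."""
    return metered_mib_s(read_mib_s, write_mib_s) <= provisioned_mib_s


# With 100 MiB/s provisioned, a read-only workload can reach 300 MiB/s...
print(fits_provisioned(300, 0, 100))
# ...but adding 10 MiB/s of writes pushes the metered rate over the limit.
print(fits_provisioned(300, 10, 100))
```

The first call prints `True` and the second `False`, which is why write-heavy dataset builds need more provisioned headroom than read-heavy report viewing.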
For example, if the file system is in Bursting mode and its throughput usage exceeds 80% of the permitted throughput, or it has used all of its burst credits, data reading becomes slow.
To resolve this, we can change the throughput mode, which improves data rendering performance.
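The two warning signs described above can be checked from Amazon CloudWatch metrics in the `AWS/EFS` namespace: `MeteredIOBytes`, `PermittedThroughput`, and `BurstCreditBalance`. The sketch below assumes you have already fetched those values (for example with the CloudWatch `get_metric_statistics` API), using the Sum of `MeteredIOBytes` over one period and `PermittedThroughput` in bytes per second; that choice of statistics, the helper names, and the sample numbers are assumptions.

```python
BYTES_PER_MIB = 1024 * 1024


def throughput_utilization_pct(
    metered_io_bytes_sum: float, period_s: float, permitted_throughput_bytes_s: float
) -> float:
    """Percent of permitted throughput used over one CloudWatch period.

    metered_io_bytes_sum: Sum of MeteredIOBytes over the period.
    permitted_throughput_bytes_s: PermittedThroughput value (bytes per second).
    """
    used_bytes_s = metered_io_bytes_sum / period_s
    return used_bytes_s * 100 / permitted_throughput_bytes_s


def is_throttling_risk(
    utilization_pct: float, burst_credit_balance: float, threshold: float = 80.0
) -> bool:
    """Flag high utilization (above threshold) or an exhausted burst credit balance."""
    return utilization_pct > threshold or burst_credit_balance <= 0


# Hypothetical 60-second sample: 85 MiB/s metered against 100 MiB/s permitted.
util = throughput_utilization_pct(85 * BYTES_PER_MIB * 60, 60, 100 * BYTES_PER_MIB)
print(f"{util:.1f}% used, risk={is_throttling_risk(util, burst_credit_balance=5e11)}")
```

If this check fires repeatedly during dataset refreshes, that is the signal to consider switching the throughput mode.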
How to Change the Throughput Mode of an EFS File System?
Refer to the following screenshot of the EFS file system settings to change the throughput mode to Enhanced (Provisioned) throughput mode. (In this case, our default mode is Bursting mode.)
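The same change can also be scripted, which is convenient when switching modes around a scheduled heavy-load job. This sketch assumes the AWS SDK for Python (boto3) and its EFS `update_file_system` call with the `ThroughputMode` and `ProvisionedThroughputInMibps` parameters; note that the API uses the value `provisioned` rather than the console label "Enhanced". The helper names, the file system ID, and the 256 MiB/s figure are hypothetical.

```python
from typing import Optional

VALID_MODES = {"bursting", "provisioned", "elastic"}


def build_update_request(
    file_system_id: str, mode: str, provisioned_mib_s: Optional[float] = None
) -> dict:
    """Build keyword arguments for the EFS UpdateFileSystem call."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown throughput mode: {mode!r}")
    request = {"FileSystemId": file_system_id, "ThroughputMode": mode}
    if mode == "provisioned":
        if provisioned_mib_s is None or provisioned_mib_s <= 0:
            raise ValueError("provisioned mode needs a positive throughput in MiB/s")
        request["ProvisionedThroughputInMibps"] = float(provisioned_mib_s)
    return request


def set_throughput_mode(
    efs_client, file_system_id: str, mode: str, provisioned_mib_s: Optional[float] = None
) -> dict:
    """Switch the file system's throughput mode using a boto3 EFS client."""
    return efs_client.update_file_system(
        **build_update_request(file_system_id, mode, provisioned_mib_s)
    )


# Example usage (requires AWS credentials; the file system ID is a placeholder):
# import boto3
# efs = boto3.client("efs")
# set_throughput_mode(efs, "fs-0123456789abcdef0", "provisioned", 256)  # before heavy load
# set_throughput_mode(efs, "fs-0123456789abcdef0", "bursting")          # afterwards
```

Keeping the request builder separate from the API call makes it easy to validate the parameters without touching AWS.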
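Once the high-workload process is complete, you can switch from Enhanced mode back to Bursting mode as required. Note that AWS limits how often you can decrease provisioned throughput or change the throughput mode (historically, only after 24 hours have passed since the last such change), so plan the switch accordingly.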
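When planning the switch back, it helps to check whether the cool-down since the last mode change has passed. This sketch assumes a 24-hour cool-down (confirm the current rule in the EFS documentation) and that you record the time of your last mode change yourself, since the helper does not query AWS; the function name and timeline are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed cool-down between a throughput decrease or mode change and the next one.
COOLDOWN = timedelta(hours=24)


def can_switch_back(last_change: datetime, now: Optional[datetime] = None) -> bool:
    """True once more than COOLDOWN has passed since the last decrease or mode change.

    last_change must be timezone-aware when now is omitted (now defaults to UTC).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_change > COOLDOWN


# Hypothetical timeline: mode switched to Provisioned at 09:00 UTC on 1 March.
switched = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)
print(can_switch_back(switched, switched + timedelta(hours=6)))   # too soon
print(can_switch_back(switched, switched + timedelta(hours=25)))  # allowed
```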
Note: For detailed information, refer to the following links: