PerformERL: a performance testing framework for erlang

Walter Cazzola; Francesco Cesarini; Luca Tansini

doi:10.1007/s00446-022-00429-7

PerformERL: a performance testing framework for erlang

Cazzola, Walter; Cesarini, Francesco; Tansini, Luca 2022-10-01 00:00:00 The Erlang programming language is used to build concurrent, distributed, scalable and resilient systems. Every component of these systems has to be thoroughly tested not only for correctness, but also for performance. Performance analysis tools in the Erlang ecosystem, however, do not provide a sufﬁcient level of automation and insight needed to be integrated in modern tool chains. In this paper, we present PerformERL: an extendable performance testing framework that combines the repeatability of load testing tools with the details on how the resources are internally used typical of the performance monitoring tools. These features allow PerformERL to be integrated in the early stages of testing pipelines, providing users with a systematic approach to identifying performance issues. This paper introduces the PerformERL framework, focusing on its features, design and imposed monitoring overhead measured through both theoretical estimates and trial runs on systems in production. The uniqueness of the features offered by PerformERL, together with its usability and contained overhead prove that the framework can be a valuable resource in the development and maintenance of Erlang applications. Keywords Erlang · Distributed systems · Performance testing · Load testing · Performance monitoring 1 Introduction As discussed by Jiang and Hassan [21], ﬁxing these issues when in production becomes complicated and expensive. Erlang offers a set of features—such as share-nothing Several language agnostic tools are available to measure lightweight processes and asynchronous communication the throughput and latency of a system under test (SUT) by through message passing—making it the ideal program- simulating different loads and monitoring response times. ming language for building massively concurrent systems These tools—dubbed load testing tools—provide system us- [10]. Applications running inside the Erlang virtual machine ability metrics and enable the repeatability of the trial runs. (called the BEAM) use this concurrency model for distribu- But as they use an external observation point (black-box ap- tion, resilience and scalability [11]. But with the advent of proach), they are not informative on how resources are used new technologies—such as cloud computing, containeriza- inside the SUT. This testing approach can help detecting per- tion and orchestration—developers are not encouraged to be formance degradation, but provides little information over resource savvy in order to satisfy their scalability require- which component of the SUT is causing the degradation. ments. This approach implies that performance issues and Performance monitoring tools, on the other hand, can gather bottlenecks often go undetected during the development pro- detailed metrics about the resources used by the SUT, such as cess, only to be identiﬁed when the system is in production. memory consumption, CPU usage and I/O operations. Unfor- tunately, they do not provide an interface to generate load, as they are meant for the inspection of live production systems B Walter Cazzola and are manually added at a later stage of the development. cazzola@di.unimi.it This lack of support for writing automated and repeatable Francesco Cesarini performance tests means that performance monitoring tools francesco@erlang-solutions.com cannot easily be included as part of the testing pipeline in the Luca Tansini development stages. luca.tansini@studenti.unimi.it This paper proposes PerformERL,a performance test- Department of Computer Science, Università degli Studi di ing framework for Erlang that combines the two approaches. Milano, Milan, Italy The performance testing terminology and the distinction be- Erlang Solutions, London, United Kingdom 123 440 W. Cazzola et al. tween load testing and performance monitoring was ﬁrst outlined by Gheorghiu [14] and later reﬁned by Jiang and Hassan in their survey [21]. PerformERL enables program- mers to write a systematically repeatable suite of tests that stress test the SUT in the early stages of development and keep track of the performance of every component—in terms of resource utilization—as the codebase grows. PerformERL builds on top of the Erlang BEAM, copes with Erlang ecosystem and exploits the BEAM tracing in- frastructure. Its main contribution is to deﬁne an architecture and a methodology to enable the performance testing in the Erlang ecosystem. To the best of our knowledge, Per- formERL is the ﬁrst framework in the Erlang ecosystem that permits to programmatically exercise a SUT and gather detailed metrics about the performance of the SUT, how the resources are used by the SUT components and which component and/or resource usage is responsible of the per- formance degradation of the SUT. Such a contribution is achieved through the design of a speciﬁc architecture (de- tails in Sects. 2 and 3) and in some extensions to the tracing infrastructure in order to improve its applicability and perfor- mance (details in Sects. 3.4.1 and 3.4.2). Also the proposed architecture is general enough to be implemented in different ecosystems as explained in Sect. 3.6. The rest of this paper is organized as follows. Section 2 provides an overview of the main concepts and terminology of PerformERL. Section 3 describes the internal architec- ture of the framework and how it can be realized in the JVM ecosystem. Section 4 shows how PerformERL can be em- ployed and extended with some examples. In Sect. 5,some theoretical measurements and tests to study PerformERL overhead and performance are presented and their results are discussed. Sections 6 and 7 conclude the paper reviewing with related work and presenting our conclusions. 2 Overview PerformERL is a performance testing framework. Accord- ing to Jiang and Hassan [21], it is neither a load testing nor a performance monitoring tool, but a bit of both. It combines the repeatability of load testing with the visibility offered by a performance monitor. PerformERL should be used as any other testing tool: by writing a test suite dedicated, this time, to performance evaluation. The test ﬁles (sometimes Fig. 1 PerformERL test execution ﬂow also referred to as load generator ﬁles) written by the users implement callbacks (deﬁned in the load generator behavior, see Listing 1, details in Sect. 3.1), used by PerformERL 1 The target function patterns—a set of MFAs —identify to (i) exercise a speciﬁc execution of the SUT in which the the group of functions of the SUT that the user is inter- performance measurements will be gathered, (ii) generate the target function patterns, and (iii) set other conﬁguration An MFA is a tuple uniquely identifying an Erlang function through a parameters, such as size, name and duration of the test. module, a name and an arity. 123 PerformERL: a performance testing framework… 441 -module(performerl_load_generator). ested in monitoring for a speciﬁc test case. These will be -type run_info() :: term(). used as a starting point for the performance analysis made -type test_info() :: term(). by PerformERL. By exploiting Erlang tracing infrastruc- -type test_size() :: non_neg_integer(). -type trace_pattern() :: ture , PerformERL gathers data about the target functions { module()|'_', atom()|'_', non_neg_integer()|'_' }. themselves, most notably, the number of times they are called -callback get_test_name() -> string(). and their execution time. PerformERL also discovers any -callback test_setup() -> {ok, test_info()}. process in the SUT that makes use of the target functions and -callback setup_run(Size::test_size()) -> {run_started,[node()]}. gathers metrics on those processes, including memory usage -callback start_load( and reduction count. TestNodes::[node()], Size::test_size()) -> A PerformERL test starts when the user invokes the {load_started, run_info()} | {already_started, pid()}. framework providing a load generator ﬁle. Since the goal of -callback get_load_duration() -> pos_integer(). the performance test is to provide insights into the scalability -callback get_test_sizes() -> {atom(), [test_size()]}. of the SUT, every test is composed of multiple runs. Runs -callback stop_load(RunInfo::run_info()) -> are successive calls to the same load generation functions, {load_stopped, run_info()} | but with different values for the size parameter. The core task {error, not_started}. of each run is to exercise the SUT by generating a computa- -callback teardown_run(RunInfo::run_info()) -> run_ended. -callback test_teardown(TestInfo::test_info()) -> ok. tion load—called workload—for the monitored application -callback get_trace_patterns() -> [trace_pattern()]. proportional to the given size parameter. Finally, when all the test runs have been completed, PerformERL produces its output as a collection of HTML ﬁles with charts and ta- Listing 1: The load generator behavior bles presenting the gathered results. Note that PerformERL does not target any speciﬁc scalability dimension but it aims test ﬁles (comp. ➊)usedby PerformERL must implement to be ﬂexible enough to allow the monitoring of any of them. the performerl_load_generator behavior shown The meaning of the size parameter depends on what the users in Listing 1. would like to measure. For example, size can be the number To have different setup and tear down functions per test of requests if we are interested in how the response time of a and per run enables the user to have more control over the web server scales with the growth of the number of requests generation of the test environment. The test_setup func- or it can be the number of entries in a database when we are tion is only called once at the beginning of the test (step ➁). interested in how the database size scales with the growth of It can be used to start external services that are not directly the number of its entries. Figure 1 summarizes the details of involved in the performance test but are needed during the a test execution ﬂow in the PerformERL framework. load generation steps or to perform operations that only need to be executed once during the test. In the run setup (step ➂) and tear down (step ➈), on the other hand, the user should 3 PerformErl under the hood take care of the actions that have to be done before and after each run, typically starting and stopping the SUT, so that ev- In this section, the different components of the framework ery run will begin with the SUT in the same fresh state. The will be described. Fig. 2 shows the components of Per- return value of the setup_run function must include the formERL and how they interact with the test ﬁle provided by identiﬁers of the nodes in which the SUT is running; these the user. In the following sections, white circled numbers— nodes will be referred to as test nodes (comp. ➌). such as (step ➀)—refer to steps of Fig. 1, whereas black The start_load function (step ➄) contains the code circled numbers—such as (comp. ➊)—refer to components to stress the SUT to obtain its performance data. The of Fig. 2. stop_load function (step ➅) stops any long-running op- eration initiated by its counterpart. The get_test_name 3.1 The load generator behavior and get_trace_patterns functions are not explicitly used in Fig. 1 because they provide conﬁguration parame- The only ﬁle that users have to write in order to imple- ters for a test, but do not affect the execution ﬂow directly. ment a test case is a load generator—i.e., a test ﬁle. The They return the custom test name and the MFA of the target functions respectively. The BEAM provides a powerful set of tools for the introspection of events related to functions, processes and message passing that go by 4 Erlang behaviors—as object oriented interfaces—deﬁne a set of call- the name of Erlang tracing infrastructure. back functions that should be exported by any module implementing The reduction is a counter per process that is normally incremented such behaviors. Failing to implement any of these callbacks generates by one for each function call. a compiler warning. 123 442 W. Cazzola et al. instrumented by injecting the modules needed for the perfor- mance monitoring (step ➃). The injected modules implement the tracer agent (TA), processes discoverer agent (PDA) and metrics collector agent (MCA) (comp. ➍): there will be an instance of each agent on every Erlang node. Once the agents are started, the function load_gen:start_load is called (step ➄). PerformERL will wait for the load gen- eration timeout to expire. The timeout is set by the function load_gen:get_load_duration, and its value must be large enough to enable the SUT to react to the generated load. Finally, data from the PDA and MCA will be gathered (step ➆) and the run will be effectively ended. The only re- maining step before cleaning up and stopping the test nodes is to execute the impact benchmark (step ➇), and to use its results to reﬁne the performance data. 3.3 The tracer agent TA is the ﬁrst agent started on the nodes running the SUT. The ﬁrst purpose of this agent is to use call time tracing to measure the number of calls and the execution time of the target MFA patterns. Call time tracing, enabled by the trace ﬂag call_time, is a feature of the Erlang tracing infrastructure that, for every traced MFA, records on a per process basis how many times the function was called and how much time was spent executing the function. Users can Fig. 2 PerformERL components interaction refer to this data with the function erlang:trace_info. Call time tracing does not require any message passing, as The remaining test components are predeﬁned in Per- it only updates some counters maintained by the BEAM. formERL and do not need to be customized by the user. In The other purpose of TA is to interact with the Erlang tracing infrastructure and to track any process—apart from the Per- Sect. 4.2 we will discuss how PerformERL functionality can be extended. formERL agents—that use the tracing primitives during the tests. By doing this, PerformERL is aware of the context in which the tests are executed and it can work, to a certain 3.2 The performERL module extent, even if the tracing infrastructure is already in use by the SUT. This is required to keep overheads under control, The performerl module (comp. ➋) provides the entry as the BEAM only allows one tracer agent per process. point for every test execution. It contains a main function that In PerformERL, since it is unknown who will call the loads the test ﬁle, sets up the global test environment common monitored functions, every process in the SUT has to be to all runs, and then starts a run for each user-speciﬁed size traced. This could be accomplished by the erlang:trace (step ➉). Once all the runs have been completed, it takes care function, but to tolerate the need of a SUT to use the trac- of tearing down the common environment and generates the ing infrastructure, PerformERL has to employ a more 11 12 output (steps and ). sophisticated approach: the Erlang meta-tracing facility. The execution of a single run can be summarized in the Meta-tracing is applied to an MFA pattern, and it traces following steps, also displayed in Fig. 1, where load_gen the calls made by any process to the functions selected by is the name of the test ﬁle provided by the user. First, the such MFA pattern, without explicitly tracing the caller. To load_gen:setup_run callback is executed (step ➂), be bound to the MFAs enables a ﬁner tracing mechanism which deploys the SUT on a set of Erlang nodes (comp. ➌) that allows more tracer agents per process, making Per- whose identiﬁers are returned. The Erlang nodes are then formERL tolerant to the presence of other tracers. Note that, a SUT using the tracing facility can be observed by Per- A Erlang node, node for short, is an instance of the BEAM. Several formERL thanks to the adoption of the meta-tracer. But, a processes run on each node. Each process can communicate both with SUT that uses the meta-tracing facility can not be observed processes running on the same node and with processes running on other nodes even over the Internet. because one meta-tracer can be associated to one process. 123 PerformERL: a performance testing framework… 443 Fortunately, this limitation has a negligible impact on the diate code interpreted by the BEAM more efﬁciently than applicability of PerformERL because the meta-tracing fa- the corresponding function call. cility is less frequently used than the standard tracing one. PerformERL uses match speciﬁcations to limit the set of The TA is started before the load_gen:start_load processes sending a message to the PDA to those who have function is called, and sets itself as the tracer for all not been discovered yet; this is equivalent to disabling the the other processes in the VM. The MFA patterns to tracing facility for the other processes, reducing overheads. trace are those speciﬁed by the user with the function The list of known active processes is encoded as a balanced load_gen:get_trace_patterns. Then TA sets itself binary search tree sorted on the PIDs, translated into a match as the meta-tracer for the tracing built-in functions in order speciﬁcation with short-circuited boolean operators. The list to detect if the SUT is making use of the tracing infrastructure of active processes is kept updated by removing those that and react accordingly. TA also sets itself as the meta-tracer terminate their execution. The match speciﬁcation is rebuilt for the erlang:load_module function, which is respon- whenever the list is updated. The cost of executing the match sible for loading a module into the BEAM. This permits to speciﬁcation against the PID of a target function caller is log- monitor the calls to the functions described by the MFA pat- arithmic in the number of processes, because of the balancing terns in dynamically loaded modules. These would otherwise of the binary tree structure. be missed because the call time tracing feature is applied to the MFA patterns when TA is started. If this happens be- 3.4.2 The Custom Meta-Tracer fore the workload triggering the dynamic module loading, the dynamically loaded module containing the speciﬁc MFA Meta-tracing is a powerful feature of the BEAM, but it would not be traced. In other words, TA can detect when a is less customizable compared to regular tracing. With the module containing some user-deﬁned MFA patterns is being regular tracing, the user can specify a number of ﬂags to al- dynamically loaded and promptly activate call time tracing ter the format of the generated trace messages. These ﬂags for those. are unavailable when using meta-tracing. In particular, the arity ﬂag—if available—would ease PDA implementa- 3.4 The processes discoverer agent tion because it forces the trace message to contain the arity of the called function rather than the full list of its arguments. PDA tracks those processes that—at any point in their Since sending a message implies the copying of its data, send- lifetime—use the monitored MFA patterns. PDA is started ing trace messages containing only the number of arguments after TA and depends on it for the detection of newly loaded instead of the arguments themselves would signiﬁcantly de- modules. PDA also uses the tracing infrastructure and it crease the overhead of the meta-tracing. is where the most sophisticated tracing techniques are em- Even though meta-tracing cannot be customized, it is pos- ployed to quickly discover the processes with a low overhead. sible to provide a tracer module when setting a meta-tracer. The approach is simple: PDA is notiﬁed about a process The tracing infrastructure allows the user to provide a custom 7 8 presence with a tracing message, stores its PID and starts module , composed of an Erlang stub and a NIF imple- monitoring it as soon as it calls a function matching a user- mentation, to replace part of the back-end of the tracing deﬁned MFA pattern. Then PDA immediately stops tracing infrastructure. It is therefore possible to code a custom tracer the process to reduce the overhead on the SUT—details in that implements the arity ﬂag and further reduces the over- Sect. 3.4.1. Notice that, because of the meta-tracing, the set ´ head. Slaski and Turek [30] demonstrated the efﬁciency and of traced MFA patterns is limited to user-deﬁned ones, but potential of custom tracer modules. the space of traced processes is the whole BEAM runtime. 3.5 The metrics collector agent 3.4.1 Match Speciﬁcations MCA is responsible for polling PDA for active processes The Erlang tracing primitive erlang:trace_pattern and gathering metrics—e.g., memory usage and reductions accepts as its second parameter an argument called match count—about them. The metrics are collected by default ev- speciﬁcation. Match speciﬁcations can be used to control ery 5 seconds, but this interval can be customized. and customize the tracing infrastructure. They are Erlang terms describing a low level program used to match pat- http://erlang.org/doc/man/erl_tracer.html—Erlang tracer behavior. terns, execute logical operations and call a limited set of A NIF (Native Implemented Function) is a function written in C in- commands. Match speciﬁcations are compiled into interme- stead of Erlang. They appear as Erlang functions to the caller, since they can be found in an host Erlang module, but their code is compiled into The built-in functions to access the Erlang tracing infrastructure are: a dynamically loadable shared object that has to be loaded at runtime erlang:trace and erlang:trace_pattern. by the host module. 123 444 W. Cazzola et al. The metrics collected by the MCA are sanitized to re- and thread dispatchers (corresponding to Erlang schedulers). move the tracing overhead from the call time data at the end Moreover, Akka threads are not mapped to JVM threads, of each run. The sanitation consists of removing the (aver- they are lightweight abstractions whose performances can age) overhead introduced by PerformERL tracing from the be monitored by a dedicated framework such as Perform- execution time of the monitored functions. PerformERL ERL. injects the impact benchmark module into the SUT when the The test orchestration functionality of PerformERL can run ends, when both the call time data and the number of dis- easily be reproduced in Akka since it provides all the nec- covered processes are available. This module measures the essary building blocks, such as nodes distribution, message average overhead of tracing—due to both the call time and the passing, remote code injection and remote procedure calls. processes discovery—on the monitored function calls. The The only fundamental component that Akka and the JVM do impact benchmark executes the monitored function 4,000 not offer out-of-the-box is an equivalent of the Erlang trac- times without any tracing enabled. Choosing 4,000 iterations ing infrastructure, which PerformERL’s approach heavily ensures that the process will not exceed its time slot and relies on. PerformERL’s approach requires an interface there will be no overhead due to context switching. The im- to gather metrics for the {actor, method} pair and pact benchmark then spawns a number of processes equal to also to perform dynamic discovery of actors calling spe- the highest number of active processes recorded during the ciﬁc methods, without manually altering the source code of run and activates both the call time tracing and the processes said methods. Ciołczyk et al. [12] describe Akka’s limita- discovery meta-tracing with a match speciﬁcation contain- tions wrt. message exchange tracing. Tracing support comes ing their PIDs. Each spawned process will execute the target from third-party libraries/frameworks. Kamon provides function 4,000 times. The benchmark module concludes by metric gathering functionality for Akka actors, but lacks taking the average execution time over all processes, sub- the possibility to send a message to an equivalent of the tracting the reference measurement to determine the impact PDA. As demonstrated by the Akka tracing tool [12] and of the tracing. Once the impact measurement is completed, AkkaProf [28], the “last missing mile” can be realized via the average latency multiplied by the number of calls is sub- code instrumentation—either aspect-based [23,26] or based tracted from the call time tracing data of each monitored on bytecode instrumentation [8,13]. These techniques permit function. to weave/inject the tracing code to the tested methods (equiv- alent to PerformERL target MFAs), and to gather and pass information about the monitored method calls to the equiva- 3.6 PerformERL in different ecosystems lent of the PDA to perform dynamic actor discovery. PerformERL’s approach is designed for performance test- ing of Erlang’s processes but it can be applied to any ecosys- 4 PerformERL in action tem supporting the actor model [2]. The actor model adopts a ﬁner concurrency unit (the actors) than processes/threads PerformERL is available as a rebar3 plugin and it is also whose implementation usually can not rely on the operating compliant to the escript interface. system mechanisms but requires an ad hoc virtual machine as the BEAM with its scheduling algorithms and commu- 4.1 A usage example: Wombat Plugins nications mechanisms. Erlang natively supports the actor model—Erlang’s processes are de facto actors—but there Wombat [33] is a performance monitoring framework for are several implementations for other ecosystems. The as- the BEAM. It works by injecting a number of agents into the sumption on the concurrency model is not so stringent. It monitored nodes to support its functionality. In this section, just simpliﬁes PerformERL’s transposition. we will show how PerformERL can be used to measure the Akka [17,18] is certainly the most relevant implementa- impact of the Wombat agents and their infrastructure on the tion of the actor model outside the Erlang ecosystem. Akka managed nodes. was born as part of the Scala standard library to then become Listing 2 shows a portion of the PerformERL load gen- the reference framework for all the languages supported by erator ﬁle to exercise the Wombat agents by spawning a the JVM but also Javascript [32]. Akka is heavily inspired by large number of processes in the monitored node. Per- Erlang and implements many of the fundamental actor model formERL monitors the processes and the function calls of primitives offered by the BEAM, such as supervision trees Wombat when it is monitoring another SUT whose num- ber of processes grows accordingly to the speciﬁcation in Erlang run-time system adopts a preemptive scheduler. Each process the load generator ﬁle. As explained in Sect. 3.1, this load receives a time slice measured by a reduction count before being pre- empted, where a reduction is a function call. Since OTP20 the number of allowed reductions is 4,000. https://kamon.io/docs/latest/instrumentation/akka/ 123 PerformERL: a performance testing framework… 445 -module(processes_load_gen). -behaviour(performerl_load_generator). test_setup() -> ok = wombat_lib:ensure_wombat_started(), {ok, []}. get_test_sizes() -> {number_of_processes, [65536,131072,262144,524288,1048576]}. setup_run(Size) -> Node = 'processes@127.0.0.1', StartCmd = "erl -detached -name "++ atom_to_list(Node)++ " -setcookie "++ atom_to_list(erlang:get_cookie())++ "+P"++integer_to_list(Size), [] = os:cmd(StartCmd), % omitted: waiting for the node to come online ok = performerl_lib:inject_mod(?MODULE, Node), {run_started,[Node]}. start_load([Node], Size) -> {node_added, WoNodeId} = wombat_lib:add_node_to_wombat(Node, atom_to_list(erlang:get_cookie())), Pids = rpc:call(Node,?MODULE,spawn_processes,[Size]), {load_started, [{node_info, {Node, WoNodeId}}, {pids, Pids}]}. %********** RPC target function **********% spawn_processes(Size) -> Num = (Size * 95) div 100, Pids = [spawn(fun() -> receive stop -> ok end end) || _ <- lists:seq(1, Num)]. Fig. 3 Results summary extract for the test in Listing 2 Listing 2: PerformERL test ﬁle for testing Wombat plugins shows an excerpt of the summary page for the test load in List- ing 2 . The table shows that PerformERL detects the same number of processes and functions for each run. This is cor- generator ﬁle implements the load_generator behavior rect because Wombat does not change. The graph, instead, to be used with PerformERL.The test_setup function shows that the number of processes that Wombat moni- checks that Wombat is up and running. The setup_run tors affect the call time of Wombat’s functions (information function spawns a node with a system limit for the number of monitored by PerformERL). Then, we can conclude that processes set to the run size parameter. It injects the test mod- the function draining more resources (execution time) when ule itself into the monitored node to enable the methods of the number of processes monitored by Wombat grows the test ﬁle to be called on the test node, as it happens for the is plugin_builtin_metrics:collect_metrics spawn_processes function. The start_load func- whereas all the other Wombat’s functions have an execution tion adds the test node to Wombat and remotely calls—via time consumption independent of the number of processes the Erlang rpm module—its spawn_processes func- or with a negligible growth. Therefore any attempt of opti- tion. It will start a number of idle processes equal to the 95% mizing the execution time of Wombat should affect such a of the processes system limit. The choice of 95% permits function (that is part of the Wombat framework). to have a meaningful load, close to the saturation threshold, and still to be sure not to reach it provoking the test crashing. 4.2 How to extend PerformERL The maximum number of processes can be retrieved by the call erlang:system_info(process_limit).The PerformERL has been designed to be extendable. In this stop_load and teardown_run functions (not shown section, we will show how to create custom agents and collect in Listing 2 ) send a stop message to the spawned processes custom metrics that will cooperate with the default Per- and take care of removing the test node from Wombat and formERL components. shutting it down, respectively. After all test runs have been completed, the user will ﬁnd the output of the framework in a folder named 4.2.1 Collecting Custom Metrics performerl_results with a sub-folder for the test execution containing a front_page.html ﬁle with a The ﬁrst proposed extension consists of adding a new metric summary result data for the test and more detailed ﬁles for to those collected by the MCA. The function each detected function and each discovered process. Figure 3 performerl_mca:collect_metrics combines the 123 446 W. Cazzola et al. -module(performerl_custom_agent). The custom agents modules are provided to Perform- % functions called locally in the PerformErl module ERL as a comma separated list of paths with the com- %****************************************************** mand line argument --custom_agents. PerformERL, -callback get_config(TestNode::node(), LoadModule::module(), through the function compile_custom_agents, parses TestSize::test_size()) -> map(). the comma separated list of module paths and compiles -callback process_results(TestNode::node(), the agents with the binary option. This will not pro- Results::term()) -> ProcessedResults::term(). duce the .beam ﬁle, but a binary which is loaded in % functions called remotely in the test nodes %****************************************************** the local node and injected in the test nodes using the -callback start(Config::map()) -> inject_custom_agents function. The start function {ok, Pid::pid()} | ignore | {error, Error::term()}. -callback get_results_and_stop() -> {ok, Results::term()}. of each custom agent loads the conﬁguration parameters— taken from a combination of the test node name, the test ﬁle and the size of the current run—and remotely starts the agent Listing 3: The Erlang behavior for custom agents in all test nodes with the appropriate conﬁguration. The cus- tom agents are started after the standard ones, allowing users metrics collected across runs: each metric is a tuple to rely on the functionality and services offered by the stan- {metric_name, MetricValue}. To support a new dard PerformERL agents in their custom ones. For the same metric we have to add how it is calculated and its result to reason, they are stopped before the standard agents running the list of the collected metrics. As an example, let us track on the same node. the message queue length of the discovered processes. This As a proof-of-concept, we implement a custom agent is done by adding that checks the health of the SUT during the performance tests by periodically monitoring some invariant properties. QLen = erlang:process_info(Pid, The custom invariant_checker_agent implementa- message_queue_len) tion is shown in Listing 4 . The agent implements both the to the performerl_mca:collect_metrics function performerl_custom_agent and the gen_server QLen contains the result as {message_queue_len, behaviors. The generic server functionality is used to im- Value} and it is returned together with the other collected plement the agents starting and stopping functions, as well metrics. Similarly, theperformerl_html_front_page as the periodic invariant checks. In the get_config func- module—that would present the collected metrics to the tion, the parameters for the agent—i.e., the list of invariants user—will be accommodated to present the new metric as to check and the interval between two consecutive checks— well. are taken from the load generator test ﬁle that implements an To add a new metric to MCA can be realized by a callback additional function (get_invariants)tobeusedwith whereas to accommodate the way it will be displayed has to this custom agent. The process_results function is be done by editing the functions in charge of the visualization where the data produced by the agent during the run will because of the well-known problems about the automatic be analyzed. In the example, the data are processed by the organization of data visualization. In spite of this, it should process_results0 function (not shown for brevity) and be evident that the effort required to add a custom metric is then the results of the invariants checks are printed to the limited. console. Processed data are ﬁnally returned to the caller— the performerl module, that will include them in the 4.2.2 Adding a Custom Agent run results. The test load ﬁle has to deﬁne and specify (via the get_invariants function) the invariants that Per- Adding a custom agent is another extension which can be ap- formERL has to check. Listing 5 shows how a different set plied to PerformERL. The framework provides a speciﬁc of invariants can be speciﬁed for each node involved in a behavior,performerl_custom_agents, allowing cus- test. The ﬁrst function clause deﬁnes an invariant about the tom agents to be started. The agents are Erlang modules maximum size for the tables in the database of the ﬁrst node implementing the four callbacks deﬁned by the behavior and provides a threshold value proportional to the size of the (Listing 3) . current run. The second clause deﬁnes the invariants for the The functions get_config and process_results second node, in which a web server is executed: the ﬁrst one are called locally by the PerformERL module to pass the is to check that the web server is online at all times, and the second one checks the length of the request queue against a parameters of the start function to the agent and process- ing the results after each run ends. The functions start and threshold value. The last invariant is common to all nodes and sets a threshold value of 1GB for the entire node memory. get_results_and_stop are called remotely in the test nodes via RPC from the performerl module and simply At the moment, there is no interface in PerformERL to start and stop the custom agent retrieving the results. automatically generate HTML ﬁles from the data produced 123 PerformERL: a performance testing framework… 447 -module(invariant_checker_agent). get_invariants('db_node@127.0.0.1', RunSize) -> -behaviour(gen_server). [{db_tables_check, -behaviour(performerl_custom_agent). {dm_module, get_tables_size, [max]}, '=<', 32*RunSize}, % performerl_custom_agent callback get_memory_invariant()]; start(Config) -> get_invariants('web_server_node@127.0.0.1', RunSize) -> InitState = #state{ [{web_server_online_check, check_interval = maps:get(check_interval,Config), {web_server, get_info, [status]}, invariants = maps:get(invariants, Config) '==', 'up_and_running'}, }, {web_server_queue_check, gen_server:start({local,?MODULE},?MODULE,InitState,[]). {web_server, get_info, [request_queue_length]}, get_results_and_stop() -> '=<', 100*RunSize}, {ok, ResHist} = get_memory_invariant()]. gen_server:call(?MODULE, get_results, infinity), get_memory_invariant() -> gen_server:stop(?MODULE), {memory_check, {ok, lists:reverse(ResHist)}. {erlang, memory, [total]}, get_config(TestNode, LoadModule, TestSize) -> '=<', 1024*1024*1024}. #{ check_interval => round(LoadModule:get_load_duration()/100), invariants => Listing 5: Examples of get_invariants functions LoadModule:get_invariants(TestNode,TestSize)}. process_results(TestNode, ResHist) -> {TotalChecks,InvResList} = process_results0(ResHist), io:format("Invariants were tested ˜p times on " 5 Evaluation "test node:˜p˜n",[TotalChecks, TestNode]), lists:foreach( fun({InvName,V}) -> In this section, PerformERL overhead and performance are io:format(" -invariant ˜p was violated " analyzed. All the presented tests were carried out on a 64-bit "˜p times˜n",[InvName, V]) end, InvResList). laptop equipped with an 8-core Intel i7@2.50GHz CPU and % gen_server callbacks 8GB of RAM running Erlang/OTP version 20.3 on Linux. init(InitState=#state{check_interval = Interval}) -> erlang:send_after(Interval,self(),check_invariants), Any considered SUT has to run on the BEAM and can be {ok, InitState}. composed of multiple types of nodes distributed across mul- handle_call(get_results,_From,State) -> tiple machines. {reply, {ok, State#state.history}, State}. handle_info(check_invariants,State#state{history=Hist) -> Res = check_invariants(State#state.invariants), 5.1 Memory footprint of the agents erlang:send_after(State#state.check_interval, self(), check_invariants), {noreply, State#state{history=[Res|Hist]}}; Running out of memory is one of the few ways in which %omitted: non relevant gen_server callbacks an instance of the BEAM can be crashed. It is fundamental % internal functions that the memory footprint of injected agents is minimal. In check_invariants(InvList) -> lists:map( our evaluation, we calculate the upper bound of the Per- fun({Name, MFA, Op, DefValue}) -> formERL memory consumption. All reported calculations {Name, check_invariant(MFA, Op, DefValue)} end, InvList). are based on the ofﬁcial Erlang efﬁciency guide , where check_invariant({M,F,Args}, Op, DefValue) -> the reference architecture is 64-bit and every memory word Value = erlang:apply(M,F,Args), requires 8 bytes. case test(Value, Op, DefValue) of true -> ok; The TA does not keep any relevant data structures other false -> {violated, Value} than a list of processes using the Erlang tracing infrastructure end. alongside PerformERL. Every entry of this list contains a test(A, '==', B) -> A == B; test(A, '=<', B) -> A =< B; tuple of the form test(A, '>=', B) -> A >= B. Listing 4: Custom invariant_checker_agent TA consumes 12 memory words for each traced MFA by the custom agents, so the users will need to manually pattern plus 1 word for the pointer to the list and 1 word modify the performerl_html_front_page in order for each entry in the list. In total, TA consumes 13n + 1 to present it—similarly to what has been done for displaying words of memory where n is the number of different traced the custom metrics in Sect. 4.2.1. Future developments of MFA patterns. We can conclude that TA will never consume PerformERL will include an overhaul of the code generat- excessive amounts of memory, as it would require the SUT ing the output HTML ﬁles to adopt a more modular approach and to offer a behavior with set of callbacks to support seam- less extensions to the data visualization. http://erlang.org/doc/efﬁciency_guide/advanced.html. 123 448 W. Cazzola et al. to trace about 10,000 different MFA patterns to consume a single MB of memory. The PDA keeps two data structures: a map mapping PIDs to the names of the discovered processes and a gb_sets containing the PIDs of the active processes. The number of discovered processes is the upper bound for the active pro- cesses during the execution of a PerformERL test. The map consumes 11 words per entry in the worst case. Its in- Fig. 4 Average overhead on a function call introduced by the trac- ternal structure is implemented as a hash array mapped trie ing mechanisms. Four conﬁgurations for the tracing mechanism are considered: (1) meta-tracing only enabled (2) call time tracing, (3) (HAMT) [6] when larger than 32 entries. According to the meta-tracing with tree match speciﬁcation and (4) meta-tracing with efﬁciency guide, a HAMT consumes n · f memory words tree match speciﬁcation and calling processes discovering. The x-axis plus the cost of all entries where n is the number of entries (in logarithmic scale) reports number of processes spawned and the and f is a sparsity factor that can vary between 1.6 and 1.8 y-axis the average overhead over 4,000 calls to a dummy function due to the probabilistic nature of the HAMT. Considering the worst case scenario, we have a total memory consumption of 1.8n + 9n = 10.8n words where n is the number of entries. The gb_sets entries are the PIDs of the active processes which only take 1 memory word each. The data structure is based on general balanced trees [3] and is represented in Erlang by a tuple with two elements: the number of en- tries and a tree structure with nodes of the form {Value, LeftChild, RightChild} where the empty child is Such a list roughly consumes: 15n+4 words where n is the represented by the nil. The entire structure consumes 2 number of active processes. To collect more metrics would words for the outer tuple, 1 word for the number of entries, only add the cost for their values representation. Assuming 3 words for the internal nodes with two children, 4 for those a standard 5 seconds interval between metrics collections, with only one child, and 5 words for the leaves. Since the which gives 12 collections per minute, and 10,000 active gb_sets is a complete binary tree, if n is the number of processes, this structure will grow by approximately 1.8MB entries there will be: every minute. The memory consumption of all the agents put together is – (n + 1)/2 leaves (5 words each) roughly: – n/2 internal nodes (3 words each) 13p + 14.8d + (4 + 15d)c ∼ O(p + dc) Note that at most one internal node has only one child (when memory words, where n is even) by deﬁnition of complete binary tree. Summing it all up, the cost for the gb_sets of n active processes is: – p is the number of MFA patterns traced by the SUT 3 + 5 ∗(n + 1)/2+ 3 ∗n/2+ n mod 2 – d is the number of processes discovered (also used as an upper bound for the active processes) which is roughly 4n memory words. – c is how many times the metrics are collected in a run The MCA holds in memory a list with the history of the (which depends on the duration and the metrics collection collected metrics, so it naturally grows with time. Assuming, frequency). without loss of generality, that only the default memory and reduction metrics are collected, every entry of the list will be It can be seen that the memory consumed by PerformERL of the form agents is linear (O(p + dc)) and depends on dimensions that the users can predict and control. 123 PerformERL: a performance testing framework… 449 5.2 Overhead on monitored functions In this section, we will explore the overhead Perform- ERL’s agents add to the monitored functions. As explained in Sect. 3, the tracing of the MFA is the main culprit, if not unique, of the overhead introduced by PerformERL. Therefore, our experiment will focus on the average over- head PerformERL adds on the execution time of a dummy function called a ﬁxed number of times (1,024 in our ex- periment) in an environment with an increasing number of processes (the size parameter) and with different tracing con- ﬁgurations activated. The idle processes—i.e., those that are not calling the dummy function—do not impact the call time directly but the resources used by the tracing infrastructure. Fig. 5 Comparing the average overhead due to the standard meta- It could seem counter-intuitive that we keep the number of tracing and with the arity extension. The x-axis (in logarithmic scale) reports the number of integers passed to the calls. The y-axis reports the calls ﬁxed when the total number of processes grows but the average overhead (in microseconds) over 100,000 calls to a dummy this would permit to monitor how the tracing facility impacts function the call time. The considered tracing conﬁgurations are: tracing mechanism enabled. Figure 4 shows the results of the 1. meta-tracing enabled without any match speciﬁcations experiment using a logarithmic scale on the x-axis. It demon- which always causes the caller to send a message. Per- strates that the increasing number of processes only affects formERL does not use this conﬁguration, but it provides the tracing techniques (3) and (4). It can also be seen that a reference to compare with the other techniques; the growth is logarithmic, which conﬁrms the theory behind 2. call time tracing, which does not require any message the tree match speciﬁcation presented in Sect. 3.4.1. The call exchanging but only updates some counters inside the time tracing conﬁguration (2) also shows a slight overhead BEAM. This is used by the TA. increase for larger numbers of processes. This is likely due 3. meta-tracing with the tree match speciﬁcation described to the performance of the data structures internally used by in Sect. 3.4.1 and the calling processes identiﬁers al- the BEAM to store the counters. The results show that the ready present in the tree. This case represents the already techniques employed for the process discovery cause an over- discovered processes calling a function and requires no head that is between 1.5 and 2 times higher than a plain usage message exchanging; of meta-tracing but they allow PerformERL to prevent al- 4. meta-tracing with the tree match speciﬁcation and calling ready discovered processes from sending trace messages and processes identiﬁers not present in the tree, so a trace mes- avoid ﬂooding the PDA. The higher overhead is due to the sage will be sent. This case represents the processes not execution time of the match speciﬁcation and in the last con- yet discovered calling a monitored function. Messages ﬁguration (4) also to the custom meta-tracer module being are sent via the custom meta-tracer module described in activated to send a custom message. Sect. 3.4.2. A second experiment has been done to show the im- portance of the custom meta-tracer module introduced in For each tracing conﬁguration and for each value of size, Sect. 3.4.2. This experiment compares the average overhead size processes are spawned enabling the tracing facility for imposed by meta-tracing using the standard back-end (that a dummy function: the processes are assigned an ID from 0 sends a trace message containing the full list of arguments) to size-1. The spawned processes whose ID is a multiple of with meta-tracing using PerformERL custom meta-tracer size/1,024 (i.e., ID = 0 mod (size/1, 024)) are selected to call 12 module implementing the arity ﬂag. It measures the av- the dummy function 4,000 times, measuring the execution erage execution time of a traced dummy function called time with the timer:tc function. The average execution 100,000 times for each conﬁguration. Conﬁgurations dif- time for a single call to the dummy function is computed fer for a parameter called argument size that determines the when all the selected processes have executed the bench- length of the list of integers passed to the dummy function. mark. The overhead is determined by subtracting a reference Since sending a message requires to copy the data to be sent, value obtained executing the same benchmark without any passing large parameters to a monitored function causes an increase in the tracing overhead. The choice of 4,000 grants that the number of reductions will not trig- Figure 5 presents the results of the tests. For small ar- ger the scheduling algorithm avoiding the overhead due to the context switching. guments, the custom meta-tracer causes a slightly higher 123 450 W. Cazzola et al. overhead compared to the standard back-end because it needs to access a dynamically loaded shared library in addition to the BEAM tracing infrastructure. It can be seen that the over- head starts to diverge for arguments larger than a list of 64 integers: up to 100 times for a list of 16,384 integers which is not an unlikely size of arguments for an Erlang function call. In fact, the custom meta-tracer module acts as failsafe for the standard back-end when a process calls a monitored function with a very large argument. In this scenario, two undesirable things can occur: the process slows down due to the copying of the arguments and the PDA runs out of memory if too many of these messages are sent to it. Fig. 6 Average overhead on cowboy response time 5.3 PerformERL in the real world To show that the overhead introduced by PerformERL monitoring and tracing facility to the running SUT is av- were averaged to minimize any spike due to external factors erage if not less than the one of the other frameworks, we beyond our control. measure it on a real case: cowboy , a well-known Erlang Figure 6 shows the results of the experiment, in terms of HTTP server, and compare it with the overhead of similar average response time for each workload. From the diagram, Erlang tools. In addition to PerformERL, the other chosen it can be noticed that all the considered tools caused a no- tools were Wombat [33], a proprietary performance mon- ticeable overhead when the number of requests is low. This 14 15 itoring framework, eprof and fprof, two proﬁling is likely due to the tools performing some initial operations, tools distributed with the Erlang standard library. Unfor- such as setting up their monitoring facility, that affects the tunately, to the best of our knowledge, there are no other ﬁrst few requests received by the server. In particular, Per- performance testing frameworks for the Erlang ecosystem formERL imposed a higher slowdown factor of 6 that settles and we have to compare PerformERL with performance on 2 with the growing of the workload. The initial peak can monitoring frameworks. To maintain the comparison fair, be attributed to PerformERL PDA which must discover all we are measuring a resource (the average response time) ob- the cowboy processes, populate its data structures, and up- servable without accessing to the SUT data structures: access date the match speciﬁcations before the ﬁrst request could that the performance monitoring framework do not have. The be served. The cusp corresponds to when the number of re- experiment will measure the server average response time to quests is such that their satisfaction allows to mitigate the a number of HTTP GET requests both when the monitoring initial overhead. At that point the slowdown can be attributed facility is active and when it is not. to the heavy usage of tracing done by PerformERL, as de- The conﬁguration of PerformERL used in this experi- scribed in Sect. 5.2. eprof shows a slowdown of around ment had the target MFA patterns matching all the functions 1.4 for every workload. This tool is only employing call inside the cowboy codebase. Wombat was used with a stan- time tracing which, as shown in the previous section, causes dard conﬁguration. eprof and fprof were set up to trace a smaller overhead on the monitored functions, hence the every process in the cowboy server node. For each tool, the slowdown factor is lower compared to PerformERL as ex- experiment was set up with ﬁve increasing amounts of HTTP pected.Wombat, similarly toPerformERL, causes a higher requests to measure the impact of the tools under different overhead in the monitored node in the ﬁrst few seconds after workloads. The requests are synchronous: a new request is its deployment due to the setting up of its plugin infrastruc- made when the results of the previous one are received. In ture. After that it can be seen that over time Wombat does not this way, each request is satisﬁed when cowboy receives it impose any overhead at all. fprof is the tool that showed and no time is spent in a waiting queue that would bias the the highest overhead in the experiment, with a constant av- ﬁnal measurements. For each of the described settings the erage slowdown factor of 5 across all workloads. This is due experiment was run 100 times and the results of each set up to the heavy use of the tracing infrastructure done by fprof which traces every function call made by the monitored pro- cesses and writes the trace message in a ﬁle that will later be Cowboy—Small, fast, modern HTTP server for Erlang/OTP: https:// github.com/ninenines/cowboy. analyzed to produce a detailed call graph reporting for each https://erlang.org/doc/man/eprof.html. function how much time was spent executing code local to https://erlang.org/doc/man/fprof.html. the function and in functions called by that one. 123 PerformERL: a performance testing framework… 451 The experiment shows that the overhead caused in the web the SUT (Sect. 3.3) as demonstrated by the experiment on server by the monitoring tools is proportional to the usage of Wombat reported in Sect. 4.1. Said that, PerformERL still the tracing infrastructure, after an initial startup time where has some limitations: for one, the SUT should not make use some tools, namely PerformERL and Wombat,haveto of meta-tracing. This is not an issue, as with existing Erlang setup their infrastructure which competes with the web server applications, it seems that the meta-tracing facility is under- for the scheduling, causing an increase in the slowdown. The rated and only used for troubleshooting live systems. usage of the tracing infrastructure depends on the features Another problem is the unloading (or reloading) of mod- that the tool offers regarding function calls. fprof provides ules containing target function patterns during the tests. If this more detailed information about function calls compared to happens, the call time proﬁling data will be lost. In future ver- the other tools and, for that reason, is the one with the highest sions, a periodic back-up of this data could be implemented overhead. PerformERL places in the experiment between at the cost of increased memory consumption, or a tracing- eprof and fprof and in fact, it provides the same infor- based mechanism monitoring the unloading and reloading of mation as the former regarding function calls, but it also uses modules could be used to detect the issue. tracing for the real-time discovery of processes, which is a feature that no other tool offers. Wombat is different from the other tools since it is meant for monitoring of live pro- 6 Related work duction systems and focuses more on system wide metrics rather than function calls, so it can afford to limit the usage of The idea and the need of promoting performance testing in the tracing infrastructure resulting in an overhead of almost the early stages of the development cycle, which is one of the zero, at least in a standard conﬁguration. guiding principles behind this work, has been pointed out by Woodside et al. [35]. Others, such as Johnson et al. [22], 5.4 Discussion suggested that performance testing should be incorporated in test driven development and that is indeed a goal that can PerformERL should be included in the testing pipeline of be achieved using PerformERL. a project and is not meant to be used in a production envi- In this section, we describe a few tools that are commonly ronment. This means that the primary goal of the framework used and share similar goals with PerformERL. The focus is to provide a thorough insight into the SUT whilst offering is on tools popular in the Erlang ecosystem but we will also compatibility with as many applications as possible, rather discuss the most akin approaches even if unrelated to the than achieving a low overhead. Nevertheless, the tests and BEAM. estimates presented in this section show that the users can predict the dimension of the overhead caused by Perform- 6.1 Performance monitoring tools ERL, both in terms of memory consumption and execution time overhead. Both these dimensions depend on the num- In this paragraph we present tools related to PerformERL ber of processes that the SUT spawns and how many of them that fall in the category of performance monitoring tools in PerformERL has to discover. accordance to Jiang and Hassan [21] terminology. In general, PDA and MCA provide useful information A standard Erlang release ships with tools like eprof and when there is a limited set of long-lived processes. On the fprof, that are built on top of the tracing infrastructure and other hand, trying to get information over a large number of provide information about function calls. A set of processes worker processes that execute for a very short time before and modules to trace can be speciﬁed to limit the overhead, terminating will not provide any useful insight other than however, the approach of these tools is basic and has been the number of such processes and the list of monitored func- improved in our framework to both reduce the impact on tions that they called, whilst degrading the performance of the the SUT and gather more meaningful results. Furthermore, agents. This is a limitation inherent to the design of the Per- their output is text based, which may result in a poor user formERL framework and the Erlang tracing infrastructure experience. More evolved tools, including PerformERL, itself, also discussed by Slaski and Turek [30]. To mitigate process the output to generate reports with plots and charts this issue, we are developing an extension to Perform- to better help the user understand the gathered data. ERL that will enable the possibility of disabling some of the Another tool already distributed with Erlang is the agents, at the cost of losing some of the standard features. Observer . Observer is an application that needs to The real challenge PerformERL had to face is to apply be plugged into one or more running Erlang nodes offering a performance testing on SUTs as Wombat [33] that actively graphical user interface to display information about the sys- need tracing to run without hindering their operation. In this respect, PerformERL uses and extends the meta-tracing Observer, a GUI tool for observing an Erlang system: facility to tolerate the use of the tracing infrastructure by http://erlang.org/doc/man/observer.html. 123 452 W. Cazzola et al. tem such as application supervision trees, processes memory Basho Bench is a benchmarking tool created to con- allocations and reductions, and ETS tables. While some of duct accurate and repeatable performance and stress tests the metrics gathered by this tool are similar to what Per- inside the Erlang environment. It was originally implemented formERL offers, the approach is different, as Observer is to test Riak [24] but can be extended by writing cus- meant for live monitoring of entire nodes activity, whereas tom probes in the form of Erlang modules. The approach PerformERL is used to write repeatable tests and can focus is indeed similar to the one used in PerformERL,but it on speciﬁc components of the SUT. focuses on two measures of performance—throughput and XProf [15] is a visual tracer and proﬁler focused on func- latency—related to network protocols and DB communica- tion calls and production safety. It achieves a low overhead tions. Basho Bench differs from PerformERL in the by only allowing the user to measure one function at a time sense that the former gives an overview of what the per- and gives detailed real-time information about the monitored formance of an entire system looks like from the outside, function execution time, arguments and return value. Its pur- while the latter provides insights into the performance of pose is mainly to be used to debug live production systems. the system’s components. Moreover, Basho Bench does Wombat [33] is a monitoring, operations and performance not support the concept of run that permits to execute the framework for the BEAM. It is supposed to be plugged into a same test with different loads. This is a crucial feature for a production system all the time and its features include gather- performance testing framework as PerformERL that must ing application- and VM-speciﬁc metrics and showing them monitor how the SUT behaves as the load increases. Similar in the GUI as well as sending threshold based alarms to the considerations can be done for BenchERL [4]aswell. system maintainer so that issues and potential crashes can Akka tracing tool [12] is a library to be used with Akka be prevented. The aim of Wombat is different from that of applications that permits to generate a trace graph of mes- PerformERL, as it is not a testing tool, even if both share the sages. It focuses on collecting metrics related to the messages idea of injecting agents into the monitored system to gather exchange. It is extendable and provides an interfaces to show metrics. the collected data. It shares a philosophy and an architecture Keiker [34]isa Wombat counterpart outside the BEAM similar to PerformERL without providing its insights on written in Java. It replaces the Erlang tracing infrastructure by the used resources/data structures. However, this is an ex- using aspect-oriented programming [23] to instrument code, tension whose support is envisionable since they already use but the users have to write the aspects, which requires to AspectJ to inject the code to trace the messages (as we sug- know AspectJ and an additional coding effort. gest in Sect. 3.6 for the PerformERL implementation on the JVM). 6.2 Load testing tools 6.3 Performance testing tools In this section we present the related tools that—because of In this section we will present the tools related to Per- their black-box approach to performance testing—we cate- formERL whose white-box approach we consider to be gorize under the name of load testing tools, in accordance to performance testing. Jiang and Hassan [21] terminology. erlperf is a collection of tools useful for Erlang Apache JMeter [16] and Tsung are widely used load proﬁling, tracing and memory analysis. It is mainly a per- testing tools. The former is written in Java and the latter formance monitoring tool but it offers a feature called is its Erlang counterpart. They share with our framework continuous benchmarking meant for scalability and perfor- the repeatability of the tests and the idea of running them mance inspection that allows the user to repeatedly run with increasing amounts of load but similarities stop there. tests and record benchmark code into test suites. This fea- Test conﬁgurations are speciﬁed via JSON-like ﬁles instead ture together with the collected proﬁling data suggest that of code and their goal is to measure the performance of erlperf could serve a purpose similar to PerformERL. web applications—or various network protocols in general— However, the characteristics that would make erlperf a under a large number of requests from an external point of performance testing tool are still in a rudimentary state and view by looking at response times. PerformERL,onthe no documentation is available to clearly understand their pur- other hand, provides information from the inside of the sys- pose and functionality. tem, showing how each component reacts to the load. detectEr tool suite [1,5] has some commonalities with PerformERL. They both target Erlang infrastructure, ETS tables are an efﬁcient in-memory database included with the Erlang virtual machine. Basho benchhttps://github.com/basho/basho_bench. 18 20 Tsung, a distributed load testing tool: http://tsung.erlang-projects. erlperf, a collection of tools useful for Erlang proﬁling, tracing org. and memory analysis: https://github.com/max-au/erlperf. 123 PerformERL: a performance testing framework… 453 they both rely on the SUT execution for their analysis and and sending them back to the AkkaProf logic agent (de both consider benchmarking and experiment reproducibil- facto implementing a sort of tracing facility). ity. Even if detectEr targets a post-deployment phase and runtime property validation. As PerformERL, detectEr relies on Erlang’s actor model and the authors [1] discussed 7 Conclusion and future developments how the approach can be realized in other languages with dif- ferent implementations of the actor model—with highlights This paper introduces PerformERL: a performance test- similar to those described in Sect. 3.6. Due to its nature, ing framework for the Erlang ecosystem. PerformERL can detectEr has a limited view on the runtime usage of the be used to monitor the performance of a SUT during its resources. To some extends, the two approaches complement execution or be included in its testing pipeline thanks toPer- each other. formERL interface for deﬁning load tests programmatically. Stefan et al. [31] conducted a survey on unit testing per- PerformERL can collect several kind of metrics both about formance in Java projects. From the survey, many tools SUT internals and its behavior and it can also be extended 21 22 emerged—such as JUnitPerf , JMH and JPL [9]— with new metrics. Throughout this paper we have investi- that through various techniques apply microbenchmarking gated PerformERL usability and visibility over the SUT, to portions of a Java application in the form of unit tests. highlighted its ﬂexibility demonstrating how it can be ex- This tools share with PerformERL the repeatability and tended to match the user needs and the overhead it imposes a systematic approach aimed at testing performance, so we over the SUT, showing both its strengths and weaknesses. consider them performance testing frameworks. However, One of PerformERL weak points is the module used to they are aimed at testing speciﬁc units of a Java system and visualize the results. Although it automatically shows the col- mostly focus on execution time only. lected data, it is quite rigid wrt. the possible customizations A different approach to performance testing in Java was of PerformERL forcing its manual extension to accom- proposed by Bhattacharyya and Amza [7]. They proposed a modate the visualization of new metrics. In future work, a tool, PReT, that tries to automatically detect any Java pro- more sophisticated approach could be adopted for the pre- cess in a system that is running a regression test and starts sentation of the test results that will ease the integration of to collect metrics on them. The tool employs machine learn- data produced by both custom agents and custom metrics. ing both to identify the processes running a speciﬁc test and Moreover, to increase the level of automation, future devel- to detect any anomalies in the collected measurements that opments could include an interface to provide performance could indicate a performance regression. The approach can requirements—in the form of threshold values for the col- deﬁnitely be considered performance testing but it differs lected metrics—in order to deﬁne a pass/fail criteria [19]. from PerformERL in the sense that they evaluate perfor- Alternative criteria could be the no-worse-than-before prin- mance measurements on tests already in place rather than ciple deﬁned by Huebner et al. [20] or the application of providing an interface to generate a workload. machine learning techniques as proposed by Malik et al. [25]. Moamen et al. [27] explored how to implement resource We are also considering to investigate how PerformERL control in Akka-based actor systems. Their proposals share could be integrated in the detectEr [5] tool chain. the general philosophy of PerformERL but are based on the manipulation of the basic mechanisms of the actor model: Acknowledgments This work was partly supported by the MUR project “T-LADIES” (PRIN 2020TL3X8X). The authors wish also to the spawning of the actors and the dispatch of the messages. thank the anonymous reviewers for their comments: they helped a lot The former permits to know the existence of an actor and in improving the quality of this work. then monitoring it since its spawning without the need of a PDA. The latter obviates to the need for a tracing facility. Funding Open access funding provided by Università degli Studi di Milano within the CRUI-CARE Agreement. These approaches are more invasive and cannot be used to do performance testing of systems that cannot be stopped. Open Access This article is licensed under a Creative Commons Attri- AkkaProf [28,29] provides an approach similar to the one bution 4.0 International License, which permits use, sharing, adaptation, proposed by Moamen et al. [27] but instead of instrumenting distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, pro- the way an actor is spawned AkkaProf dynamically instru- vide a link to the Creative Commons licence, and indicate if changes ments the actors when their classes are loaded in the JVM. were made. The images or other third party material in this arti- The injected code takes also care of collecting the metrics cle are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your in- tended use is not permitted by statutory regulation or exceeds the https://github.com/clarkware/junitperf. permitted use, you will need to obtain permission directly from the copy- Oracle Corporation, Java Microbenchmarking Harness: http:// right holder. To view a copy of this licence, visit http://creativecomm openjdk.java.net/projects/code-tools/jmh/. ons.org/licenses/by/4.0/. 123 454 W. Cazzola et al. References 21. Jiang, Z.M., Hassan, A.E.: A Survey on Load Testing of Large- Scale Software Systems. IEEE Trans. Softw. Eng. 41(11), 1091– 1118 (2015) 1. Aceto, L., Attard, D. P., Francalanza, A., Ingólfsdóttir, A.: On 22. Johnson, M.J., Ho, C.-W., Maximilien, E.M., Williams, L.: In- Benchmarking for Concurrent Runtime Veriﬁcation. In FASE’21, corporate Performance Testing in Test-Driven Development. IEEE LNCS 12649, pp. 3–23, Luxembourg City, Luxembourg, (2021). Software 24(3), 67–73 (2007) Springer 23. Kiczales, G., Hilsdale, E., Hugunin, J., Kersten, M., Palm, J., Gris- 2. Agha, G.: Actors: A Model of Concurrent Computation in Dis- wold, B.: An Overview of AspectJ. In ECOOP’01, LNCS 2072, tributed Systems. MIT Press, Cambridge (1986) pp. 327–353, Budapest, Hungary, (2001). Springer-Verlag 3. Andersson, A.: General Balanced Trees. J Algorithms 30(1), 1–18 24. Klophaus, R.: Riak Core: Building Distributed Applications with- (1999) out Shared State. In CUFP’10, pp. 14:1–14:1, Baltimore, Maryland, 4. Aronis, S., Papaspyrou, N., Roukounaki, K., Sagonas, K., Tsiouris, USA, (2010). ACM Y., Venetis, I.E.: A Scalability Benchmark Suite for Erlang/OTP. 25. Malik, H., Hemmati, H., Hassan, A.E.: Automatic Detection of Per- In Erlang’12, pp. 33–42, Copenhagen, Denmark, (2012). ACM formance Deviations in the Load Testing of Large Scale Systems. In 5. Attard, D.P., Aceto, L., Achilleos, A., Francalanza, A., Ingólfs- ICSE’13, pp. 1012–1021, San Francisco, CA, USA, (2013). IEEE dóttir, A., Lehtinen, K.: Better Late Than Never or: Verifying 26. Marek, L., Villazón, A., Zheng, Y., Ansaloni, D., Binder, W., Qi, Z.: Asynchronous Components at Runtime. In FORTE’21, LNCS DiSL: A Domain-speciﬁc Language for Bytecode Instrumentation. 12719, pp. 207–225, Valletta, Malta, (2021). Springer In AOSD’12, pages 239–250, Potsdam Germany, (2012). ACM 6. Bagwell, P.: Ideal Hash Trees. Technical report, École Polytech- 27. Moamen, A.A., Wang, D., Jamali, N.: Approaching Actor-Level nique Fédérale de Lausanne, Lausanne, Switzerland (2001) Resource Control for Akka. In JSSPP’18, LNCS 11332, pp. 127– 7. Bhattacharyya, A., Amza, C.: PReT: A Tool for Automatic Phase- 146, Vancouver, BC, Canada, (2018). Springer Based Regression Testing. In CloudCom’18, pp. 284–289, Nicosia, 28. Rosà, A., Chen, L.Y., Binder, W.: AkkaProf: A Proﬁler for Akka Cyprus, (2018). IEEE Actors in Parallel and Distributed Applications. In APLAS’16, 8. Bruneton, E., Lenglet, R., Coupaye, T.: ASM: A Code Manipu- LNCS 10017, pp. 139–147, Hanoi, Vietnam, (2016). Springer lation Tool to Implement Adaptable Systems. In: Adaptable and 29. Rosà, A., Chen, L.Y., Binder, W.: Proﬁling Actor Utilization and Extensible Component Systems, (2002) Communication in Akka. In Erlang’16, pp. 24–32, Nara, Japan, 9. Bulej, L., Bureš, T., Horký, V., Kotrc, ˇ J., Marek, L., Trojánek, T., (2016). ACM T˚uma, P.: Unit Testing Performance with Stochastic Performance 30. Slaski, M., Turek, W.: Towards Online Proﬁling of Erlang Systems. Logic. Automated Softw. Eng. 24, 139–187 (2017) In ERLANG’19, pages 13–17, Berlin, Germany, (2019). ACM 10. Cesarini, F., Thompson, S.J.: Erlang Programming: A Concurrent 31. Stefan, P., Horký, V., Bulej, L., Tuma, ˚ P.: Unit Testing Performance Approach to Software Development. O’Reilly, (2009) in Java Projects: Are We There Yet? In ICPE’17, pp. 401–412, 11. Cesarini, F., Vinoski, S.: Designing for Scalability with L’Aquila, Italy, (2017). ACM Erlang/OTP: Implementing Robust, Fault-Tolerant Systems. 32. Stivan, G., Peruffo, A., Haller, P.: Akka.js: Towards a Portable Ac- O’Really Media, (2016) tor Runtime Environment. In AGERE!’15, pp. 57–64, Pittsburgh, 12. Ciołczyk, M., Wojakowski, M., Malawski, M.: Tracing of Large- PA, USA, (2015). ACM Scale Actor Systems. Concurrency and Computation-Practice and 33. Trinder, P., Chechina, N., Papaspyrous, N., Sagonas, K., Thomp- Experience 30(22), e4637 (2018) son, S.J., Adams, S., Aronis, S., Baker, R., Bihari, E., Boudeville, 13. Dahm, M.: Byte Code Engineering. In Java-Informations-Tage, O., Cesarini, F., Di Stefano, M., Eriksson, S., Fördos, ˝ V., Ghaffari, 267–277, (1999) A., Giantsios, A., Green, R., Hoch, C., Klaftenegger, D., Li, H., 14. Gheorghiu, G.: Performance vs. Load vs. Stress Testing Lundin, K., MacKenzie, K., Roukounaki, K., Tsiouris, Y., Win- [Online]. http://agiletesting.blogspot.com/2005/02/performance- blad, K.: Scaling Reliably: Improving the Scalability of the Erlang vs-load-vs-stress-testing.html, (2005) Distributed Actor Platform. ACM Trans. Prog. Lang. Syst. 39(4), 15. Gömöri, P.: Proﬁling and Tracing for All with Xprof. In: Proceed- 17:1-17:46 (2017) ings of the Elixir Workshop London, London, United Kingdom, 34. van Hoorn, A., Waller, J., Hasselbring, W.: Kieker: A Framework (2017) for Application Performance Monitoring and Dynamic Software 16. Halili, E.H.: Apache JMeter: A Practical Beginner’s Guide to Au- Analysis. In ICPE’12, pp. 247–248, Boston, MA, USA, (2012). tomated Testing and Performance Measurement for Your Websites. ACM Packt Publishing, (2008) 35. Woodside, M., Franks, G., Petriu, D.C.: The Future of Software 17. Haller, P.: On the Integration of the Actor Model in Mainstream Performance Engineering. In FOSE’07, pp. 171–187, Minneapolis, Technologies: The Scala Perspective. In AGERE!’12’, pp. 1–6. MN, USA, (2007). IEEE ACM, (2012) 18. Haller, P., Odersky, M.: Scala Actors: Unifying Thread-Based and Event-Based Programming. Theoret. Comput. Sci. 410(2–3), 202– 220 (2009) Publisher’s Note Springer Nature remains neutral with regard to juris- 19. Ho, C.-W., Williams, L.A., Antón, A.I.: Improving Performance dictional claims in published maps and institutional afﬁliations. Requirements Speciﬁcations from Field Failure Reports. In RE’07, pp. 79–88, New Delhi, (2007). IEEE 20. Huebner, F., Meier-Hellstern, K., Reeser, P.: Performance Testing for IP Services and Systems. In GWPSED’00, LNCS 2047, pp. 283–299, Darmstadt, Germany, (2000). Springer http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Distributed Computing Springer Journals http://www.deepdyve.com/lp/springer-journals/performerl-a-performance-testing-framework-for-erlang-RXmC9ewaC3

Loading next page...

References (39)

Andrea Rosà, L. Chen, Walter Binder (2016)
AkkaProf: A Profiler for Akka Actors in Parallel and Distributed Applications
Z. Jiang, A. Hassan (2015)
A Survey on Load Testing of Large-Scale Software Systems
IEEE Transactions on Software Engineering, 41
Arnamoy Bhattacharyya, C. Amza (2018)
PReT: A Tool for Automatic Phase-Based Regression Testing
2018 IEEE International Conference on Cloud Computing Technology and Science (CloudCom)
(2007)
The Future of Software PerformanceEngineering
P. Trinder, Natalia Chechina, N. Papaspyrou, Konstantinos Sagonas, S. Thompson, Stephen Adams, Stavros Aronis, Robert Baker, Eva Bihari, Olivier Boudeville, Francesco Cesarini, M. Stefano, Sverker Eriksson, Viktória Fördős, A. Ghaffari, Aggelos Giantsios, Rickard Green, Csaba Hoch, David Klaftenegger, Huiqing Li, Kenneth Lundin, K. Mackenzie, Katerina Roukounaki, Yiannis Tsiouris, Kjell Winblad (2017)
Scaling Reliably
ACM Transactions on Programming Languages and Systems (TOPLAS), 39
(2013)
AutomaticDetection of Per - formanceDeviations in theLoadTesting ofLargeScale Systems
(2008)
Apache JMeter: A Practical Beginner’s Guide to Automated Testing and PerformanceMeasurement for YourWebsites
(2016)
Designing for Scalability with Erlang/OTP: Implementing Robust, Fault-Tolerant Systems
P. Bagwell (2001)
Ideal Hash Trees
P. Trinder, Natalia Chechina, N. Papaspyrou, Konstantinos Sagonas, S. Thompson, Stephen Adams, Stavros Aronis, Robert Baker, Eva Bihari, Olivier Boudeville, Francesco Cesarini, M. Stefano, Sverker Eriksson, Viktoria Fordós, A. Ghaffari, Aggelos Giantsios, Rickard Green, Csaba Hoch, David Klaftenegger, Huiqing Li, Kenneth Lundin, K. Mackenzie, Katerina Roukounaki, Yiannis Tsiouris, Kjell Winblad (2017)
Scaling Reliably: Improving the Scalability of the Erlang Distributed Actor Platform
F. Huebner, K. Meier-Hellstern, P. Reeser (2001)
Performance Testing for IP Services and Systems
C. Ho, L. Williams, A. Antón (2007)
Improving Performance Requirements Specifications from Field Failure Reports
15th IEEE International Requirements Engineering Conference (RE 2007)
Philipp Haller (2012)
On the integration of the actor model in mainstream technologies: the scala perspective
M. Dahm (1999)
Byte Code Engineering
G. Kiczales, Erik Hilsdale, Jim Hugunin, Mik Kersten, Jeffrey Palm, W. Griswold (2001)
An Overview of AspectJ
Andrea Rosà, L. Chen, Walter Binder (2016)
Profiling actor utilization and communication in Akka
Proceedings of the 15th International Workshop on Erlang
A. Andersson (1999)
General Balanced Trees
J. Algorithms, 30
E. Bruneton, R. Lenglet, T. Coupaye (2002)
ASM: a code manipulation tool to implement adaptable systems
L. Aceto, D. Attard, Adrian Francalanza, A. Ingólfsdóttir (2021)
On Benchmarking for Concurrent Runtime Verification
Fundamental Approaches to Software Engineering, 12649
Rusty Klophaus (2010)
Riak Core: building distributed applications without shared state
G. Agha (1985)
ACTORS - a model of concurrent computation in distributed systems
(2005)
Performance vs. Load vs. Stress Testing
A. Abdelmoamen, Dezhong Wang, Nadeem Jamali (2018)
Approaching Actor-Level Resource Control for Akka
Petr Stefan, Vojtech Horký, L. Bulej, P. Tůma (2017)
Unit Testing Performance in Java Projects: Are We There Yet?
Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering
Adams-Agnes Booth (2012)
Actors
The Classical Review, 62
D. Attard, L. Aceto, A. Achilleos, Adrian Francalanza, A. Ingólfsdóttir, K. Lehtinen (2021)
Better Late Than Never or: Verifying Asynchronous Components at Runtime
F. Cesarini, S. Thompson (2009)
Erlang Programming - A Concurrent Approach to Software Development
Philipp Haller, Martin Odersky (2009)
Scala Actors: Unifying thread-based and event-based programming
Theor. Comput. Sci., 410
Gianluca Stivan, Andrea Peruffo, Philipp Haller (2015)
Akka.js: towards a portable actor runtime environment
Proceedings of the 5th International Workshop on Programming Based on Actors, Agents, and Decentralized Control
G Agha (1986)
10.7551/mitpress/1086.001.0001
Actors: A Model of Concurrent Computation in Distributed Systems
(2017)
Profiling and Tracing for All with Xprof
L. Bulej, T. Bures, Vojtech Horký, Jaroslav Kotrc, L. Marek, Tomás Trojánek, P. Tůma (2017)
Unit testing performance with Stochastic Performance Logic
Automated Software Engineering, 24
A. Hoorn, J. Waller, W. Hasselbring (2012)
Kieker: a framework for application performance monitoring and dynamic software analysis
Stavros Aronis, N. Papaspyrou, Katerina Roukounaki, Konstantinos Sagonas, Yiannis Tsiouris, Ioannis Venetis (2012)
A scalability benchmark suite for Erlang/OTP
L. Marek, Y. Zheng, Danilo Ansaloni, Walter Binder, Zhengwei Qi, P. Tůma (2012)
DiSL: an extensible language for efficient and comprehensive dynamic program analysis
Michal Slaski, Wojciech Turek (2019)
Towards online profiling of Erlang systems
Proceedings of the 18th ACM SIGPLAN International Workshop on Erlang
(2007)
Incorporate Performance Testing in Test-Driven Development
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations
M. Ciolczyk, M. Wojakowski, M. Malawski (2018)
Tracing of large‐scale actor systems
Concurrency and Computation: Practice and Experience, 30

Publisher: Springer Journals
Copyright: Copyright © The Author(s) 2022
ISSN: 0178-2770
eISSN: 1432-0452
DOI: 10.1007/s00446-022-00429-7
Publisher site: See Article on Publisher Site

Abstract

The Erlang programming language is used to build concurrent, distributed, scalable and resilient systems. Every component of these systems has to be thoroughly tested not only for correctness, but also for performance. Performance analysis tools in the Erlang ecosystem, however, do not provide a sufﬁcient level of automation and insight needed to be integrated in modern tool chains. In this paper, we present PerformERL: an extendable performance testing framework that combines the repeatability of load testing tools with the details on how the resources are internally used typical of the performance monitoring tools. These features allow PerformERL to be integrated in the early stages of testing pipelines, providing users with a systematic approach to identifying performance issues. This paper introduces the PerformERL framework, focusing on its features, design and imposed monitoring overhead measured through both theoretical estimates and trial runs on systems in production. The uniqueness of the features offered by PerformERL, together with its usability and contained overhead prove that the framework can be a valuable resource in the development and maintenance of Erlang applications. Keywords Erlang · Distributed systems · Performance testing · Load testing · Performance monitoring 1 Introduction As discussed by Jiang and Hassan [21], ﬁxing these issues when in production becomes complicated and expensive. Erlang offers a set of features—such as share-nothing Several language agnostic tools are available to measure lightweight processes and asynchronous communication the throughput and latency of a system under test (SUT) by through message passing—making it the ideal program- simulating different loads and monitoring response times. ming language for building massively concurrent systems These tools—dubbed load testing tools—provide system us- [10]. Applications running inside the Erlang virtual machine ability metrics and enable the repeatability of the trial runs. (called the BEAM) use this concurrency model for distribu- But as they use an external observation point (black-box ap- tion, resilience and scalability [11]. But with the advent of proach), they are not informative on how resources are used new technologies—such as cloud computing, containeriza- inside the SUT. This testing approach can help detecting per- tion and orchestration—developers are not encouraged to be formance degradation, but provides little information over resource savvy in order to satisfy their scalability require- which component of the SUT is causing the degradation. ments. This approach implies that performance issues and Performance monitoring tools, on the other hand, can gather bottlenecks often go undetected during the development pro- detailed metrics about the resources used by the SUT, such as cess, only to be identiﬁed when the system is in production. memory consumption, CPU usage and I/O operations. Unfor- tunately, they do not provide an interface to generate load, as they are meant for the inspection of live production systems B Walter Cazzola and are manually added at a later stage of the development. cazzola@di.unimi.it This lack of support for writing automated and repeatable Francesco Cesarini performance tests means that performance monitoring tools francesco@erlang-solutions.com cannot easily be included as part of the testing pipeline in the Luca Tansini development stages. luca.tansini@studenti.unimi.it This paper proposes PerformERL,a performance test- Department of Computer Science, Università degli Studi di ing framework for Erlang that combines the two approaches. Milano, Milan, Italy The performance testing terminology and the distinction be- Erlang Solutions, London, United Kingdom 123 440 W. Cazzola et al. tween load testing and performance monitoring was ﬁrst outlined by Gheorghiu [14] and later reﬁned by Jiang and Hassan in their survey [21]. PerformERL enables program- mers to write a systematically repeatable suite of tests that stress test the SUT in the early stages of development and keep track of the performance of every component—in terms of resource utilization—as the codebase grows. PerformERL builds on top of the Erlang BEAM, copes with Erlang ecosystem and exploits the BEAM tracing in- frastructure. Its main contribution is to deﬁne an architecture and a methodology to enable the performance testing in the Erlang ecosystem. To the best of our knowledge, Per- formERL is the ﬁrst framework in the Erlang ecosystem that permits to programmatically exercise a SUT and gather detailed metrics about the performance of the SUT, how the resources are used by the SUT components and which component and/or resource usage is responsible of the per- formance degradation of the SUT. Such a contribution is achieved through the design of a speciﬁc architecture (de- tails in Sects. 2 and 3) and in some extensions to the tracing infrastructure in order to improve its applicability and perfor- mance (details in Sects. 3.4.1 and 3.4.2). Also the proposed architecture is general enough to be implemented in different ecosystems as explained in Sect. 3.6. The rest of this paper is organized as follows. Section 2 provides an overview of the main concepts and terminology of PerformERL. Section 3 describes the internal architec- ture of the framework and how it can be realized in the JVM ecosystem. Section 4 shows how PerformERL can be em- ployed and extended with some examples. In Sect. 5,some theoretical measurements and tests to study PerformERL overhead and performance are presented and their results are discussed. Sections 6 and 7 conclude the paper reviewing with related work and presenting our conclusions. 2 Overview PerformERL is a performance testing framework. Accord- ing to Jiang and Hassan [21], it is neither a load testing nor a performance monitoring tool, but a bit of both. It combines the repeatability of load testing with the visibility offered by a performance monitor. PerformERL should be used as any other testing tool: by writing a test suite dedicated, this time, to performance evaluation. The test ﬁles (sometimes Fig. 1 PerformERL test execution ﬂow also referred to as load generator ﬁles) written by the users implement callbacks (deﬁned in the load generator behavior, see Listing 1, details in Sect. 3.1), used by PerformERL 1 The target function patterns—a set of MFAs —identify to (i) exercise a speciﬁc execution of the SUT in which the the group of functions of the SUT that the user is inter- performance measurements will be gathered, (ii) generate the target function patterns, and (iii) set other conﬁguration An MFA is a tuple uniquely identifying an Erlang function through a parameters, such as size, name and duration of the test. module, a name and an arity. 123 PerformERL: a performance testing framework… 441 -module(performerl_load_generator). ested in monitoring for a speciﬁc test case. These will be -type run_info() :: term(). used as a starting point for the performance analysis made -type test_info() :: term(). by PerformERL. By exploiting Erlang tracing infrastruc- -type test_size() :: non_neg_integer(). -type trace_pattern() :: ture , PerformERL gathers data about the target functions { module()|'_', atom()|'_', non_neg_integer()|'_' }. themselves, most notably, the number of times they are called -callback get_test_name() -> string(). and their execution time. PerformERL also discovers any -callback test_setup() -> {ok, test_info()}. process in the SUT that makes use of the target functions and -callback setup_run(Size::test_size()) -> {run_started,[node()]}. gathers metrics on those processes, including memory usage -callback start_load( and reduction count. TestNodes::[node()], Size::test_size()) -> A PerformERL test starts when the user invokes the {load_started, run_info()} | {already_started, pid()}. framework providing a load generator ﬁle. Since the goal of -callback get_load_duration() -> pos_integer(). the performance test is to provide insights into the scalability -callback get_test_sizes() -> {atom(), [test_size()]}. of the SUT, every test is composed of multiple runs. Runs -callback stop_load(RunInfo::run_info()) -> are successive calls to the same load generation functions, {load_stopped, run_info()} | but with different values for the size parameter. The core task {error, not_started}. of each run is to exercise the SUT by generating a computa- -callback teardown_run(RunInfo::run_info()) -> run_ended. -callback test_teardown(TestInfo::test_info()) -> ok. tion load—called workload—for the monitored application -callback get_trace_patterns() -> [trace_pattern()]. proportional to the given size parameter. Finally, when all the test runs have been completed, PerformERL produces its output as a collection of HTML ﬁles with charts and ta- Listing 1: The load generator behavior bles presenting the gathered results. Note that PerformERL does not target any speciﬁc scalability dimension but it aims test ﬁles (comp. ➊)usedby PerformERL must implement to be ﬂexible enough to allow the monitoring of any of them. the performerl_load_generator behavior shown The meaning of the size parameter depends on what the users in Listing 1. would like to measure. For example, size can be the number To have different setup and tear down functions per test of requests if we are interested in how the response time of a and per run enables the user to have more control over the web server scales with the growth of the number of requests generation of the test environment. The test_setup func- or it can be the number of entries in a database when we are tion is only called once at the beginning of the test (step ➁). interested in how the database size scales with the growth of It can be used to start external services that are not directly the number of its entries. Figure 1 summarizes the details of involved in the performance test but are needed during the a test execution ﬂow in the PerformERL framework. load generation steps or to perform operations that only need to be executed once during the test. In the run setup (step ➂) and tear down (step ➈), on the other hand, the user should 3 PerformErl under the hood take care of the actions that have to be done before and after each run, typically starting and stopping the SUT, so that ev- In this section, the different components of the framework ery run will begin with the SUT in the same fresh state. The will be described. Fig. 2 shows the components of Per- return value of the setup_run function must include the formERL and how they interact with the test ﬁle provided by identiﬁers of the nodes in which the SUT is running; these the user. In the following sections, white circled numbers— nodes will be referred to as test nodes (comp. ➌). such as (step ➀)—refer to steps of Fig. 1, whereas black The start_load function (step ➄) contains the code circled numbers—such as (comp. ➊)—refer to components to stress the SUT to obtain its performance data. The of Fig. 2. stop_load function (step ➅) stops any long-running op- eration initiated by its counterpart. The get_test_name 3.1 The load generator behavior and get_trace_patterns functions are not explicitly used in Fig. 1 because they provide conﬁguration parame- The only ﬁle that users have to write in order to imple- ters for a test, but do not affect the execution ﬂow directly. ment a test case is a load generator—i.e., a test ﬁle. The They return the custom test name and the MFA of the target functions respectively. The BEAM provides a powerful set of tools for the introspection of events related to functions, processes and message passing that go by 4 Erlang behaviors—as object oriented interfaces—deﬁne a set of call- the name of Erlang tracing infrastructure. back functions that should be exported by any module implementing The reduction is a counter per process that is normally incremented such behaviors. Failing to implement any of these callbacks generates by one for each function call. a compiler warning. 123 442 W. Cazzola et al. instrumented by injecting the modules needed for the perfor- mance monitoring (step ➃). The injected modules implement the tracer agent (TA), processes discoverer agent (PDA) and metrics collector agent (MCA) (comp. ➍): there will be an instance of each agent on every Erlang node. Once the agents are started, the function load_gen:start_load is called (step ➄). PerformERL will wait for the load gen- eration timeout to expire. The timeout is set by the function load_gen:get_load_duration, and its value must be large enough to enable the SUT to react to the generated load. Finally, data from the PDA and MCA will be gathered (step ➆) and the run will be effectively ended. The only re- maining step before cleaning up and stopping the test nodes is to execute the impact benchmark (step ➇), and to use its results to reﬁne the performance data. 3.3 The tracer agent TA is the ﬁrst agent started on the nodes running the SUT. The ﬁrst purpose of this agent is to use call time tracing to measure the number of calls and the execution time of the target MFA patterns. Call time tracing, enabled by the trace ﬂag call_time, is a feature of the Erlang tracing infrastructure that, for every traced MFA, records on a per process basis how many times the function was called and how much time was spent executing the function. Users can Fig. 2 PerformERL components interaction refer to this data with the function erlang:trace_info. Call time tracing does not require any message passing, as The remaining test components are predeﬁned in Per- it only updates some counters maintained by the BEAM. formERL and do not need to be customized by the user. In The other purpose of TA is to interact with the Erlang tracing infrastructure and to track any process—apart from the Per- Sect. 4.2 we will discuss how PerformERL functionality can be extended. formERL agents—that use the tracing primitives during the tests. By doing this, PerformERL is aware of the context in which the tests are executed and it can work, to a certain 3.2 The performERL module extent, even if the tracing infrastructure is already in use by the SUT. This is required to keep overheads under control, The performerl module (comp. ➋) provides the entry as the BEAM only allows one tracer agent per process. point for every test execution. It contains a main function that In PerformERL, since it is unknown who will call the loads the test ﬁle, sets up the global test environment common monitored functions, every process in the SUT has to be to all runs, and then starts a run for each user-speciﬁed size traced. This could be accomplished by the erlang:trace (step ➉). Once all the runs have been completed, it takes care function, but to tolerate the need of a SUT to use the trac- of tearing down the common environment and generates the ing infrastructure, PerformERL has to employ a more 11 12 output (steps and ). sophisticated approach: the Erlang meta-tracing facility. The execution of a single run can be summarized in the Meta-tracing is applied to an MFA pattern, and it traces following steps, also displayed in Fig. 1, where load_gen the calls made by any process to the functions selected by is the name of the test ﬁle provided by the user. First, the such MFA pattern, without explicitly tracing the caller. To load_gen:setup_run callback is executed (step ➂), be bound to the MFAs enables a ﬁner tracing mechanism which deploys the SUT on a set of Erlang nodes (comp. ➌) that allows more tracer agents per process, making Per- whose identiﬁers are returned. The Erlang nodes are then formERL tolerant to the presence of other tracers. Note that, a SUT using the tracing facility can be observed by Per- A Erlang node, node for short, is an instance of the BEAM. Several formERL thanks to the adoption of the meta-tracer. But, a processes run on each node. Each process can communicate both with SUT that uses the meta-tracing facility can not be observed processes running on the same node and with processes running on other nodes even over the Internet. because one meta-tracer can be associated to one process. 123 PerformERL: a performance testing framework… 443 Fortunately, this limitation has a negligible impact on the diate code interpreted by the BEAM more efﬁciently than applicability of PerformERL because the meta-tracing fa- the corresponding function call. cility is less frequently used than the standard tracing one. PerformERL uses match speciﬁcations to limit the set of The TA is started before the load_gen:start_load processes sending a message to the PDA to those who have function is called, and sets itself as the tracer for all not been discovered yet; this is equivalent to disabling the the other processes in the VM. The MFA patterns to tracing facility for the other processes, reducing overheads. trace are those speciﬁed by the user with the function The list of known active processes is encoded as a balanced load_gen:get_trace_patterns. Then TA sets itself binary search tree sorted on the PIDs, translated into a match as the meta-tracer for the tracing built-in functions in order speciﬁcation with short-circuited boolean operators. The list to detect if the SUT is making use of the tracing infrastructure of active processes is kept updated by removing those that and react accordingly. TA also sets itself as the meta-tracer terminate their execution. The match speciﬁcation is rebuilt for the erlang:load_module function, which is respon- whenever the list is updated. The cost of executing the match sible for loading a module into the BEAM. This permits to speciﬁcation against the PID of a target function caller is log- monitor the calls to the functions described by the MFA pat- arithmic in the number of processes, because of the balancing terns in dynamically loaded modules. These would otherwise of the binary tree structure. be missed because the call time tracing feature is applied to the MFA patterns when TA is started. If this happens be- 3.4.2 The Custom Meta-Tracer fore the workload triggering the dynamic module loading, the dynamically loaded module containing the speciﬁc MFA Meta-tracing is a powerful feature of the BEAM, but it would not be traced. In other words, TA can detect when a is less customizable compared to regular tracing. With the module containing some user-deﬁned MFA patterns is being regular tracing, the user can specify a number of ﬂags to al- dynamically loaded and promptly activate call time tracing ter the format of the generated trace messages. These ﬂags for those. are unavailable when using meta-tracing. In particular, the arity ﬂag—if available—would ease PDA implementa- 3.4 The processes discoverer agent tion because it forces the trace message to contain the arity of the called function rather than the full list of its arguments. PDA tracks those processes that—at any point in their Since sending a message implies the copying of its data, send- lifetime—use the monitored MFA patterns. PDA is started ing trace messages containing only the number of arguments after TA and depends on it for the detection of newly loaded instead of the arguments themselves would signiﬁcantly de- modules. PDA also uses the tracing infrastructure and it crease the overhead of the meta-tracing. is where the most sophisticated tracing techniques are em- Even though meta-tracing cannot be customized, it is pos- ployed to quickly discover the processes with a low overhead. sible to provide a tracer module when setting a meta-tracer. The approach is simple: PDA is notiﬁed about a process The tracing infrastructure allows the user to provide a custom 7 8 presence with a tracing message, stores its PID and starts module , composed of an Erlang stub and a NIF imple- monitoring it as soon as it calls a function matching a user- mentation, to replace part of the back-end of the tracing deﬁned MFA pattern. Then PDA immediately stops tracing infrastructure. It is therefore possible to code a custom tracer the process to reduce the overhead on the SUT—details in that implements the arity ﬂag and further reduces the over- Sect. 3.4.1. Notice that, because of the meta-tracing, the set ´ head. Slaski and Turek [30] demonstrated the efﬁciency and of traced MFA patterns is limited to user-deﬁned ones, but potential of custom tracer modules. the space of traced processes is the whole BEAM runtime. 3.5 The metrics collector agent 3.4.1 Match Speciﬁcations MCA is responsible for polling PDA for active processes The Erlang tracing primitive erlang:trace_pattern and gathering metrics—e.g., memory usage and reductions accepts as its second parameter an argument called match count—about them. The metrics are collected by default ev- speciﬁcation. Match speciﬁcations can be used to control ery 5 seconds, but this interval can be customized. and customize the tracing infrastructure. They are Erlang terms describing a low level program used to match pat- http://erlang.org/doc/man/erl_tracer.html—Erlang tracer behavior. terns, execute logical operations and call a limited set of A NIF (Native Implemented Function) is a function written in C in- commands. Match speciﬁcations are compiled into interme- stead of Erlang. They appear as Erlang functions to the caller, since they can be found in an host Erlang module, but their code is compiled into The built-in functions to access the Erlang tracing infrastructure are: a dynamically loadable shared object that has to be loaded at runtime erlang:trace and erlang:trace_pattern. by the host module. 123 444 W. Cazzola et al. The metrics collected by the MCA are sanitized to re- and thread dispatchers (corresponding to Erlang schedulers). move the tracing overhead from the call time data at the end Moreover, Akka threads are not mapped to JVM threads, of each run. The sanitation consists of removing the (aver- they are lightweight abstractions whose performances can age) overhead introduced by PerformERL tracing from the be monitored by a dedicated framework such as Perform- execution time of the monitored functions. PerformERL ERL. injects the impact benchmark module into the SUT when the The test orchestration functionality of PerformERL can run ends, when both the call time data and the number of dis- easily be reproduced in Akka since it provides all the nec- covered processes are available. This module measures the essary building blocks, such as nodes distribution, message average overhead of tracing—due to both the call time and the passing, remote code injection and remote procedure calls. processes discovery—on the monitored function calls. The The only fundamental component that Akka and the JVM do impact benchmark executes the monitored function 4,000 not offer out-of-the-box is an equivalent of the Erlang trac- times without any tracing enabled. Choosing 4,000 iterations ing infrastructure, which PerformERL’s approach heavily ensures that the process will not exceed its time slot and relies on. PerformERL’s approach requires an interface there will be no overhead due to context switching. The im- to gather metrics for the {actor, method} pair and pact benchmark then spawns a number of processes equal to also to perform dynamic discovery of actors calling spe- the highest number of active processes recorded during the ciﬁc methods, without manually altering the source code of run and activates both the call time tracing and the processes said methods. Ciołczyk et al. [12] describe Akka’s limita- discovery meta-tracing with a match speciﬁcation contain- tions wrt. message exchange tracing. Tracing support comes ing their PIDs. Each spawned process will execute the target from third-party libraries/frameworks. Kamon provides function 4,000 times. The benchmark module concludes by metric gathering functionality for Akka actors, but lacks taking the average execution time over all processes, sub- the possibility to send a message to an equivalent of the tracting the reference measurement to determine the impact PDA. As demonstrated by the Akka tracing tool [12] and of the tracing. Once the impact measurement is completed, AkkaProf [28], the “last missing mile” can be realized via the average latency multiplied by the number of calls is sub- code instrumentation—either aspect-based [23,26] or based tracted from the call time tracing data of each monitored on bytecode instrumentation [8,13]. These techniques permit function. to weave/inject the tracing code to the tested methods (equiv- alent to PerformERL target MFAs), and to gather and pass information about the monitored method calls to the equiva- 3.6 PerformERL in different ecosystems lent of the PDA to perform dynamic actor discovery. PerformERL’s approach is designed for performance test- ing of Erlang’s processes but it can be applied to any ecosys- 4 PerformERL in action tem supporting the actor model [2]. The actor model adopts a ﬁner concurrency unit (the actors) than processes/threads PerformERL is available as a rebar3 plugin and it is also whose implementation usually can not rely on the operating compliant to the escript interface. system mechanisms but requires an ad hoc virtual machine as the BEAM with its scheduling algorithms and commu- 4.1 A usage example: Wombat Plugins nications mechanisms. Erlang natively supports the actor model—Erlang’s processes are de facto actors—but there Wombat [33] is a performance monitoring framework for are several implementations for other ecosystems. The as- the BEAM. It works by injecting a number of agents into the sumption on the concurrency model is not so stringent. It monitored nodes to support its functionality. In this section, just simpliﬁes PerformERL’s transposition. we will show how PerformERL can be used to measure the Akka [17,18] is certainly the most relevant implementa- impact of the Wombat agents and their infrastructure on the tion of the actor model outside the Erlang ecosystem. Akka managed nodes. was born as part of the Scala standard library to then become Listing 2 shows a portion of the PerformERL load gen- the reference framework for all the languages supported by erator ﬁle to exercise the Wombat agents by spawning a the JVM but also Javascript [32]. Akka is heavily inspired by large number of processes in the monitored node. Per- Erlang and implements many of the fundamental actor model formERL monitors the processes and the function calls of primitives offered by the BEAM, such as supervision trees Wombat when it is monitoring another SUT whose num- ber of processes grows accordingly to the speciﬁcation in Erlang run-time system adopts a preemptive scheduler. Each process the load generator ﬁle. As explained in Sect. 3.1, this load receives a time slice measured by a reduction count before being pre- empted, where a reduction is a function call. Since OTP20 the number of allowed reductions is 4,000. https://kamon.io/docs/latest/instrumentation/akka/ 123 PerformERL: a performance testing framework… 445 -module(processes_load_gen). -behaviour(performerl_load_generator). test_setup() -> ok = wombat_lib:ensure_wombat_started(), {ok, []}. get_test_sizes() -> {number_of_processes, [65536,131072,262144,524288,1048576]}. setup_run(Size) -> Node = 'processes@127.0.0.1', StartCmd = "erl -detached -name "++ atom_to_list(Node)++ " -setcookie "++ atom_to_list(erlang:get_cookie())++ "+P"++integer_to_list(Size), [] = os:cmd(StartCmd), % omitted: waiting for the node to come online ok = performerl_lib:inject_mod(?MODULE, Node), {run_started,[Node]}. start_load([Node], Size) -> {node_added, WoNodeId} = wombat_lib:add_node_to_wombat(Node, atom_to_list(erlang:get_cookie())), Pids = rpc:call(Node,?MODULE,spawn_processes,[Size]), {load_started, [{node_info, {Node, WoNodeId}}, {pids, Pids}]}. %********** RPC target function **********% spawn_processes(Size) -> Num = (Size * 95) div 100, Pids = [spawn(fun() -> receive stop -> ok end end) || _ <- lists:seq(1, Num)]. Fig. 3 Results summary extract for the test in Listing 2 Listing 2: PerformERL test ﬁle for testing Wombat plugins shows an excerpt of the summary page for the test load in List- ing 2 . The table shows that PerformERL detects the same number of processes and functions for each run. This is cor- generator ﬁle implements the load_generator behavior rect because Wombat does not change. The graph, instead, to be used with PerformERL.The test_setup function shows that the number of processes that Wombat moni- checks that Wombat is up and running. The setup_run tors affect the call time of Wombat’s functions (information function spawns a node with a system limit for the number of monitored by PerformERL). Then, we can conclude that processes set to the run size parameter. It injects the test mod- the function draining more resources (execution time) when ule itself into the monitored node to enable the methods of the number of processes monitored by Wombat grows the test ﬁle to be called on the test node, as it happens for the is plugin_builtin_metrics:collect_metrics spawn_processes function. The start_load func- whereas all the other Wombat’s functions have an execution tion adds the test node to Wombat and remotely calls—via time consumption independent of the number of processes the Erlang rpm module—its spawn_processes func- or with a negligible growth. Therefore any attempt of opti- tion. It will start a number of idle processes equal to the 95% mizing the execution time of Wombat should affect such a of the processes system limit. The choice of 95% permits function (that is part of the Wombat framework). to have a meaningful load, close to the saturation threshold, and still to be sure not to reach it provoking the test crashing. 4.2 How to extend PerformERL The maximum number of processes can be retrieved by the call erlang:system_info(process_limit).The PerformERL has been designed to be extendable. In this stop_load and teardown_run functions (not shown section, we will show how to create custom agents and collect in Listing 2 ) send a stop message to the spawned processes custom metrics that will cooperate with the default Per- and take care of removing the test node from Wombat and formERL components. shutting it down, respectively. After all test runs have been completed, the user will ﬁnd the output of the framework in a folder named 4.2.1 Collecting Custom Metrics performerl_results with a sub-folder for the test execution containing a front_page.html ﬁle with a The ﬁrst proposed extension consists of adding a new metric summary result data for the test and more detailed ﬁles for to those collected by the MCA. The function each detected function and each discovered process. Figure 3 performerl_mca:collect_metrics combines the 123 446 W. Cazzola et al. -module(performerl_custom_agent). The custom agents modules are provided to Perform- % functions called locally in the PerformErl module ERL as a comma separated list of paths with the com- %****************************************************** mand line argument --custom_agents. PerformERL, -callback get_config(TestNode::node(), LoadModule::module(), through the function compile_custom_agents, parses TestSize::test_size()) -> map(). the comma separated list of module paths and compiles -callback process_results(TestNode::node(), the agents with the binary option. This will not pro- Results::term()) -> ProcessedResults::term(). duce the .beam ﬁle, but a binary which is loaded in % functions called remotely in the test nodes %****************************************************** the local node and injected in the test nodes using the -callback start(Config::map()) -> inject_custom_agents function. The start function {ok, Pid::pid()} | ignore | {error, Error::term()}. -callback get_results_and_stop() -> {ok, Results::term()}. of each custom agent loads the conﬁguration parameters— taken from a combination of the test node name, the test ﬁle and the size of the current run—and remotely starts the agent Listing 3: The Erlang behavior for custom agents in all test nodes with the appropriate conﬁguration. The cus- tom agents are started after the standard ones, allowing users metrics collected across runs: each metric is a tuple to rely on the functionality and services offered by the stan- {metric_name, MetricValue}. To support a new dard PerformERL agents in their custom ones. For the same metric we have to add how it is calculated and its result to reason, they are stopped before the standard agents running the list of the collected metrics. As an example, let us track on the same node. the message queue length of the discovered processes. This As a proof-of-concept, we implement a custom agent is done by adding that checks the health of the SUT during the performance tests by periodically monitoring some invariant properties. QLen = erlang:process_info(Pid, The custom invariant_checker_agent implementa- message_queue_len) tion is shown in Listing 4 . The agent implements both the to the performerl_mca:collect_metrics function performerl_custom_agent and the gen_server QLen contains the result as {message_queue_len, behaviors. The generic server functionality is used to im- Value} and it is returned together with the other collected plement the agents starting and stopping functions, as well metrics. Similarly, theperformerl_html_front_page as the periodic invariant checks. In the get_config func- module—that would present the collected metrics to the tion, the parameters for the agent—i.e., the list of invariants user—will be accommodated to present the new metric as to check and the interval between two consecutive checks— well. are taken from the load generator test ﬁle that implements an To add a new metric to MCA can be realized by a callback additional function (get_invariants)tobeusedwith whereas to accommodate the way it will be displayed has to this custom agent. The process_results function is be done by editing the functions in charge of the visualization where the data produced by the agent during the run will because of the well-known problems about the automatic be analyzed. In the example, the data are processed by the organization of data visualization. In spite of this, it should process_results0 function (not shown for brevity) and be evident that the effort required to add a custom metric is then the results of the invariants checks are printed to the limited. console. Processed data are ﬁnally returned to the caller— the performerl module, that will include them in the 4.2.2 Adding a Custom Agent run results. The test load ﬁle has to deﬁne and specify (via the get_invariants function) the invariants that Per- Adding a custom agent is another extension which can be ap- formERL has to check. Listing 5 shows how a different set plied to PerformERL. The framework provides a speciﬁc of invariants can be speciﬁed for each node involved in a behavior,performerl_custom_agents, allowing cus- test. The ﬁrst function clause deﬁnes an invariant about the tom agents to be started. The agents are Erlang modules maximum size for the tables in the database of the ﬁrst node implementing the four callbacks deﬁned by the behavior and provides a threshold value proportional to the size of the (Listing 3) . current run. The second clause deﬁnes the invariants for the The functions get_config and process_results second node, in which a web server is executed: the ﬁrst one are called locally by the PerformERL module to pass the is to check that the web server is online at all times, and the second one checks the length of the request queue against a parameters of the start function to the agent and process- ing the results after each run ends. The functions start and threshold value. The last invariant is common to all nodes and sets a threshold value of 1GB for the entire node memory. get_results_and_stop are called remotely in the test nodes via RPC from the performerl module and simply At the moment, there is no interface in PerformERL to start and stop the custom agent retrieving the results. automatically generate HTML ﬁles from the data produced 123 PerformERL: a performance testing framework… 447 -module(invariant_checker_agent). get_invariants('db_node@127.0.0.1', RunSize) -> -behaviour(gen_server). [{db_tables_check, -behaviour(performerl_custom_agent). {dm_module, get_tables_size, [max]}, '=<', 32*RunSize}, % performerl_custom_agent callback get_memory_invariant()]; start(Config) -> get_invariants('web_server_node@127.0.0.1', RunSize) -> InitState = #state{ [{web_server_online_check, check_interval = maps:get(check_interval,Config), {web_server, get_info, [status]}, invariants = maps:get(invariants, Config) '==', 'up_and_running'}, }, {web_server_queue_check, gen_server:start({local,?MODULE},?MODULE,InitState,[]). {web_server, get_info, [request_queue_length]}, get_results_and_stop() -> '=<', 100*RunSize}, {ok, ResHist} = get_memory_invariant()]. gen_server:call(?MODULE, get_results, infinity), get_memory_invariant() -> gen_server:stop(?MODULE), {memory_check, {ok, lists:reverse(ResHist)}. {erlang, memory, [total]}, get_config(TestNode, LoadModule, TestSize) -> '=<', 1024*1024*1024}. #{ check_interval => round(LoadModule:get_load_duration()/100), invariants => Listing 5: Examples of get_invariants functions LoadModule:get_invariants(TestNode,TestSize)}. process_results(TestNode, ResHist) -> {TotalChecks,InvResList} = process_results0(ResHist), io:format("Invariants were tested ˜p times on " 5 Evaluation "test node:˜p˜n",[TotalChecks, TestNode]), lists:foreach( fun({InvName,V}) -> In this section, PerformERL overhead and performance are io:format(" -invariant ˜p was violated " analyzed. All the presented tests were carried out on a 64-bit "˜p times˜n",[InvName, V]) end, InvResList). laptop equipped with an 8-core Intel i7@2.50GHz CPU and % gen_server callbacks 8GB of RAM running Erlang/OTP version 20.3 on Linux. init(InitState=#state{check_interval = Interval}) -> erlang:send_after(Interval,self(),check_invariants), Any considered SUT has to run on the BEAM and can be {ok, InitState}. composed of multiple types of nodes distributed across mul- handle_call(get_results,_From,State) -> tiple machines. {reply, {ok, State#state.history}, State}. handle_info(check_invariants,State#state{history=Hist) -> Res = check_invariants(State#state.invariants), 5.1 Memory footprint of the agents erlang:send_after(State#state.check_interval, self(), check_invariants), {noreply, State#state{history=[Res|Hist]}}; Running out of memory is one of the few ways in which %omitted: non relevant gen_server callbacks an instance of the BEAM can be crashed. It is fundamental % internal functions that the memory footprint of injected agents is minimal. In check_invariants(InvList) -> lists:map( our evaluation, we calculate the upper bound of the Per- fun({Name, MFA, Op, DefValue}) -> formERL memory consumption. All reported calculations {Name, check_invariant(MFA, Op, DefValue)} end, InvList). are based on the ofﬁcial Erlang efﬁciency guide , where check_invariant({M,F,Args}, Op, DefValue) -> the reference architecture is 64-bit and every memory word Value = erlang:apply(M,F,Args), requires 8 bytes. case test(Value, Op, DefValue) of true -> ok; The TA does not keep any relevant data structures other false -> {violated, Value} than a list of processes using the Erlang tracing infrastructure end. alongside PerformERL. Every entry of this list contains a test(A, '==', B) -> A == B; test(A, '=<', B) -> A =< B; tuple of the form test(A, '>=', B) -> A >= B. Listing 4: Custom invariant_checker_agent TA consumes 12 memory words for each traced MFA by the custom agents, so the users will need to manually pattern plus 1 word for the pointer to the list and 1 word modify the performerl_html_front_page in order for each entry in the list. In total, TA consumes 13n + 1 to present it—similarly to what has been done for displaying words of memory where n is the number of different traced the custom metrics in Sect. 4.2.1. Future developments of MFA patterns. We can conclude that TA will never consume PerformERL will include an overhaul of the code generat- excessive amounts of memory, as it would require the SUT ing the output HTML ﬁles to adopt a more modular approach and to offer a behavior with set of callbacks to support seam- less extensions to the data visualization. http://erlang.org/doc/efﬁciency_guide/advanced.html. 123 448 W. Cazzola et al. to trace about 10,000 different MFA patterns to consume a single MB of memory. The PDA keeps two data structures: a map mapping PIDs to the names of the discovered processes and a gb_sets containing the PIDs of the active processes. The number of discovered processes is the upper bound for the active pro- cesses during the execution of a PerformERL test. The map consumes 11 words per entry in the worst case. Its in- Fig. 4 Average overhead on a function call introduced by the trac- ternal structure is implemented as a hash array mapped trie ing mechanisms. Four conﬁgurations for the tracing mechanism are considered: (1) meta-tracing only enabled (2) call time tracing, (3) (HAMT) [6] when larger than 32 entries. According to the meta-tracing with tree match speciﬁcation and (4) meta-tracing with efﬁciency guide, a HAMT consumes n · f memory words tree match speciﬁcation and calling processes discovering. The x-axis plus the cost of all entries where n is the number of entries (in logarithmic scale) reports number of processes spawned and the and f is a sparsity factor that can vary between 1.6 and 1.8 y-axis the average overhead over 4,000 calls to a dummy function due to the probabilistic nature of the HAMT. Considering the worst case scenario, we have a total memory consumption of 1.8n + 9n = 10.8n words where n is the number of entries. The gb_sets entries are the PIDs of the active processes which only take 1 memory word each. The data structure is based on general balanced trees [3] and is represented in Erlang by a tuple with two elements: the number of en- tries and a tree structure with nodes of the form {Value, LeftChild, RightChild} where the empty child is Such a list roughly consumes: 15n+4 words where n is the represented by the nil. The entire structure consumes 2 number of active processes. To collect more metrics would words for the outer tuple, 1 word for the number of entries, only add the cost for their values representation. Assuming 3 words for the internal nodes with two children, 4 for those a standard 5 seconds interval between metrics collections, with only one child, and 5 words for the leaves. Since the which gives 12 collections per minute, and 10,000 active gb_sets is a complete binary tree, if n is the number of processes, this structure will grow by approximately 1.8MB entries there will be: every minute. The memory consumption of all the agents put together is – (n + 1)/2 leaves (5 words each) roughly: – n/2 internal nodes (3 words each) 13p + 14.8d + (4 + 15d)c ∼ O(p + dc) Note that at most one internal node has only one child (when memory words, where n is even) by deﬁnition of complete binary tree. Summing it all up, the cost for the gb_sets of n active processes is: – p is the number of MFA patterns traced by the SUT 3 + 5 ∗(n + 1)/2+ 3 ∗n/2+ n mod 2 – d is the number of processes discovered (also used as an upper bound for the active processes) which is roughly 4n memory words. – c is how many times the metrics are collected in a run The MCA holds in memory a list with the history of the (which depends on the duration and the metrics collection collected metrics, so it naturally grows with time. Assuming, frequency). without loss of generality, that only the default memory and reduction metrics are collected, every entry of the list will be It can be seen that the memory consumed by PerformERL of the form agents is linear (O(p + dc)) and depends on dimensions that the users can predict and control. 123 PerformERL: a performance testing framework… 449 5.2 Overhead on monitored functions In this section, we will explore the overhead Perform- ERL’s agents add to the monitored functions. As explained in Sect. 3, the tracing of the MFA is the main culprit, if not unique, of the overhead introduced by PerformERL. Therefore, our experiment will focus on the average over- head PerformERL adds on the execution time of a dummy function called a ﬁxed number of times (1,024 in our ex- periment) in an environment with an increasing number of processes (the size parameter) and with different tracing con- ﬁgurations activated. The idle processes—i.e., those that are not calling the dummy function—do not impact the call time directly but the resources used by the tracing infrastructure. Fig. 5 Comparing the average overhead due to the standard meta- It could seem counter-intuitive that we keep the number of tracing and with the arity extension. The x-axis (in logarithmic scale) reports the number of integers passed to the calls. The y-axis reports the calls ﬁxed when the total number of processes grows but the average overhead (in microseconds) over 100,000 calls to a dummy this would permit to monitor how the tracing facility impacts function the call time. The considered tracing conﬁgurations are: tracing mechanism enabled. Figure 4 shows the results of the 1. meta-tracing enabled without any match speciﬁcations experiment using a logarithmic scale on the x-axis. It demon- which always causes the caller to send a message. Per- strates that the increasing number of processes only affects formERL does not use this conﬁguration, but it provides the tracing techniques (3) and (4). It can also be seen that a reference to compare with the other techniques; the growth is logarithmic, which conﬁrms the theory behind 2. call time tracing, which does not require any message the tree match speciﬁcation presented in Sect. 3.4.1. The call exchanging but only updates some counters inside the time tracing conﬁguration (2) also shows a slight overhead BEAM. This is used by the TA. increase for larger numbers of processes. This is likely due 3. meta-tracing with the tree match speciﬁcation described to the performance of the data structures internally used by in Sect. 3.4.1 and the calling processes identiﬁers al- the BEAM to store the counters. The results show that the ready present in the tree. This case represents the already techniques employed for the process discovery cause an over- discovered processes calling a function and requires no head that is between 1.5 and 2 times higher than a plain usage message exchanging; of meta-tracing but they allow PerformERL to prevent al- 4. meta-tracing with the tree match speciﬁcation and calling ready discovered processes from sending trace messages and processes identiﬁers not present in the tree, so a trace mes- avoid ﬂooding the PDA. The higher overhead is due to the sage will be sent. This case represents the processes not execution time of the match speciﬁcation and in the last con- yet discovered calling a monitored function. Messages ﬁguration (4) also to the custom meta-tracer module being are sent via the custom meta-tracer module described in activated to send a custom message. Sect. 3.4.2. A second experiment has been done to show the im- portance of the custom meta-tracer module introduced in For each tracing conﬁguration and for each value of size, Sect. 3.4.2. This experiment compares the average overhead size processes are spawned enabling the tracing facility for imposed by meta-tracing using the standard back-end (that a dummy function: the processes are assigned an ID from 0 sends a trace message containing the full list of arguments) to size-1. The spawned processes whose ID is a multiple of with meta-tracing using PerformERL custom meta-tracer size/1,024 (i.e., ID = 0 mod (size/1, 024)) are selected to call 12 module implementing the arity ﬂag. It measures the av- the dummy function 4,000 times, measuring the execution erage execution time of a traced dummy function called time with the timer:tc function. The average execution 100,000 times for each conﬁguration. Conﬁgurations dif- time for a single call to the dummy function is computed fer for a parameter called argument size that determines the when all the selected processes have executed the bench- length of the list of integers passed to the dummy function. mark. The overhead is determined by subtracting a reference Since sending a message requires to copy the data to be sent, value obtained executing the same benchmark without any passing large parameters to a monitored function causes an increase in the tracing overhead. The choice of 4,000 grants that the number of reductions will not trig- Figure 5 presents the results of the tests. For small ar- ger the scheduling algorithm avoiding the overhead due to the context switching. guments, the custom meta-tracer causes a slightly higher 123 450 W. Cazzola et al. overhead compared to the standard back-end because it needs to access a dynamically loaded shared library in addition to the BEAM tracing infrastructure. It can be seen that the over- head starts to diverge for arguments larger than a list of 64 integers: up to 100 times for a list of 16,384 integers which is not an unlikely size of arguments for an Erlang function call. In fact, the custom meta-tracer module acts as failsafe for the standard back-end when a process calls a monitored function with a very large argument. In this scenario, two undesirable things can occur: the process slows down due to the copying of the arguments and the PDA runs out of memory if too many of these messages are sent to it. Fig. 6 Average overhead on cowboy response time 5.3 PerformERL in the real world To show that the overhead introduced by PerformERL monitoring and tracing facility to the running SUT is av- were averaged to minimize any spike due to external factors erage if not less than the one of the other frameworks, we beyond our control. measure it on a real case: cowboy , a well-known Erlang Figure 6 shows the results of the experiment, in terms of HTTP server, and compare it with the overhead of similar average response time for each workload. From the diagram, Erlang tools. In addition to PerformERL, the other chosen it can be noticed that all the considered tools caused a no- tools were Wombat [33], a proprietary performance mon- ticeable overhead when the number of requests is low. This 14 15 itoring framework, eprof and fprof, two proﬁling is likely due to the tools performing some initial operations, tools distributed with the Erlang standard library. Unfor- such as setting up their monitoring facility, that affects the tunately, to the best of our knowledge, there are no other ﬁrst few requests received by the server. In particular, Per- performance testing frameworks for the Erlang ecosystem formERL imposed a higher slowdown factor of 6 that settles and we have to compare PerformERL with performance on 2 with the growing of the workload. The initial peak can monitoring frameworks. To maintain the comparison fair, be attributed to PerformERL PDA which must discover all we are measuring a resource (the average response time) ob- the cowboy processes, populate its data structures, and up- servable without accessing to the SUT data structures: access date the match speciﬁcations before the ﬁrst request could that the performance monitoring framework do not have. The be served. The cusp corresponds to when the number of re- experiment will measure the server average response time to quests is such that their satisfaction allows to mitigate the a number of HTTP GET requests both when the monitoring initial overhead. At that point the slowdown can be attributed facility is active and when it is not. to the heavy usage of tracing done by PerformERL, as de- The conﬁguration of PerformERL used in this experi- scribed in Sect. 5.2. eprof shows a slowdown of around ment had the target MFA patterns matching all the functions 1.4 for every workload. This tool is only employing call inside the cowboy codebase. Wombat was used with a stan- time tracing which, as shown in the previous section, causes dard conﬁguration. eprof and fprof were set up to trace a smaller overhead on the monitored functions, hence the every process in the cowboy server node. For each tool, the slowdown factor is lower compared to PerformERL as ex- experiment was set up with ﬁve increasing amounts of HTTP pected.Wombat, similarly toPerformERL, causes a higher requests to measure the impact of the tools under different overhead in the monitored node in the ﬁrst few seconds after workloads. The requests are synchronous: a new request is its deployment due to the setting up of its plugin infrastruc- made when the results of the previous one are received. In ture. After that it can be seen that over time Wombat does not this way, each request is satisﬁed when cowboy receives it impose any overhead at all. fprof is the tool that showed and no time is spent in a waiting queue that would bias the the highest overhead in the experiment, with a constant av- ﬁnal measurements. For each of the described settings the erage slowdown factor of 5 across all workloads. This is due experiment was run 100 times and the results of each set up to the heavy use of the tracing infrastructure done by fprof which traces every function call made by the monitored pro- cesses and writes the trace message in a ﬁle that will later be Cowboy—Small, fast, modern HTTP server for Erlang/OTP: https:// github.com/ninenines/cowboy. analyzed to produce a detailed call graph reporting for each https://erlang.org/doc/man/eprof.html. function how much time was spent executing code local to https://erlang.org/doc/man/fprof.html. the function and in functions called by that one. 123 PerformERL: a performance testing framework… 451 The experiment shows that the overhead caused in the web the SUT (Sect. 3.3) as demonstrated by the experiment on server by the monitoring tools is proportional to the usage of Wombat reported in Sect. 4.1. Said that, PerformERL still the tracing infrastructure, after an initial startup time where has some limitations: for one, the SUT should not make use some tools, namely PerformERL and Wombat,haveto of meta-tracing. This is not an issue, as with existing Erlang setup their infrastructure which competes with the web server applications, it seems that the meta-tracing facility is under- for the scheduling, causing an increase in the slowdown. The rated and only used for troubleshooting live systems. usage of the tracing infrastructure depends on the features Another problem is the unloading (or reloading) of mod- that the tool offers regarding function calls. fprof provides ules containing target function patterns during the tests. If this more detailed information about function calls compared to happens, the call time proﬁling data will be lost. In future ver- the other tools and, for that reason, is the one with the highest sions, a periodic back-up of this data could be implemented overhead. PerformERL places in the experiment between at the cost of increased memory consumption, or a tracing- eprof and fprof and in fact, it provides the same infor- based mechanism monitoring the unloading and reloading of mation as the former regarding function calls, but it also uses modules could be used to detect the issue. tracing for the real-time discovery of processes, which is a feature that no other tool offers. Wombat is different from the other tools since it is meant for monitoring of live pro- 6 Related work duction systems and focuses more on system wide metrics rather than function calls, so it can afford to limit the usage of The idea and the need of promoting performance testing in the tracing infrastructure resulting in an overhead of almost the early stages of the development cycle, which is one of the zero, at least in a standard conﬁguration. guiding principles behind this work, has been pointed out by Woodside et al. [35]. Others, such as Johnson et al. [22], 5.4 Discussion suggested that performance testing should be incorporated in test driven development and that is indeed a goal that can PerformERL should be included in the testing pipeline of be achieved using PerformERL. a project and is not meant to be used in a production envi- In this section, we describe a few tools that are commonly ronment. This means that the primary goal of the framework used and share similar goals with PerformERL. The focus is to provide a thorough insight into the SUT whilst offering is on tools popular in the Erlang ecosystem but we will also compatibility with as many applications as possible, rather discuss the most akin approaches even if unrelated to the than achieving a low overhead. Nevertheless, the tests and BEAM. estimates presented in this section show that the users can predict the dimension of the overhead caused by Perform- 6.1 Performance monitoring tools ERL, both in terms of memory consumption and execution time overhead. Both these dimensions depend on the num- In this paragraph we present tools related to PerformERL ber of processes that the SUT spawns and how many of them that fall in the category of performance monitoring tools in PerformERL has to discover. accordance to Jiang and Hassan [21] terminology. In general, PDA and MCA provide useful information A standard Erlang release ships with tools like eprof and when there is a limited set of long-lived processes. On the fprof, that are built on top of the tracing infrastructure and other hand, trying to get information over a large number of provide information about function calls. A set of processes worker processes that execute for a very short time before and modules to trace can be speciﬁed to limit the overhead, terminating will not provide any useful insight other than however, the approach of these tools is basic and has been the number of such processes and the list of monitored func- improved in our framework to both reduce the impact on tions that they called, whilst degrading the performance of the the SUT and gather more meaningful results. Furthermore, agents. This is a limitation inherent to the design of the Per- their output is text based, which may result in a poor user formERL framework and the Erlang tracing infrastructure experience. More evolved tools, including PerformERL, itself, also discussed by Slaski and Turek [30]. To mitigate process the output to generate reports with plots and charts this issue, we are developing an extension to Perform- to better help the user understand the gathered data. ERL that will enable the possibility of disabling some of the Another tool already distributed with Erlang is the agents, at the cost of losing some of the standard features. Observer . Observer is an application that needs to The real challenge PerformERL had to face is to apply be plugged into one or more running Erlang nodes offering a performance testing on SUTs as Wombat [33] that actively graphical user interface to display information about the sys- need tracing to run without hindering their operation. In this respect, PerformERL uses and extends the meta-tracing Observer, a GUI tool for observing an Erlang system: facility to tolerate the use of the tracing infrastructure by http://erlang.org/doc/man/observer.html. 123 452 W. Cazzola et al. tem such as application supervision trees, processes memory Basho Bench is a benchmarking tool created to con- allocations and reductions, and ETS tables. While some of duct accurate and repeatable performance and stress tests the metrics gathered by this tool are similar to what Per- inside the Erlang environment. It was originally implemented formERL offers, the approach is different, as Observer is to test Riak [24] but can be extended by writing cus- meant for live monitoring of entire nodes activity, whereas tom probes in the form of Erlang modules. The approach PerformERL is used to write repeatable tests and can focus is indeed similar to the one used in PerformERL,but it on speciﬁc components of the SUT. focuses on two measures of performance—throughput and XProf [15] is a visual tracer and proﬁler focused on func- latency—related to network protocols and DB communica- tion calls and production safety. It achieves a low overhead tions. Basho Bench differs from PerformERL in the by only allowing the user to measure one function at a time sense that the former gives an overview of what the per- and gives detailed real-time information about the monitored formance of an entire system looks like from the outside, function execution time, arguments and return value. Its pur- while the latter provides insights into the performance of pose is mainly to be used to debug live production systems. the system’s components. Moreover, Basho Bench does Wombat [33] is a monitoring, operations and performance not support the concept of run that permits to execute the framework for the BEAM. It is supposed to be plugged into a same test with different loads. This is a crucial feature for a production system all the time and its features include gather- performance testing framework as PerformERL that must ing application- and VM-speciﬁc metrics and showing them monitor how the SUT behaves as the load increases. Similar in the GUI as well as sending threshold based alarms to the considerations can be done for BenchERL [4]aswell. system maintainer so that issues and potential crashes can Akka tracing tool [12] is a library to be used with Akka be prevented. The aim of Wombat is different from that of applications that permits to generate a trace graph of mes- PerformERL, as it is not a testing tool, even if both share the sages. It focuses on collecting metrics related to the messages idea of injecting agents into the monitored system to gather exchange. It is extendable and provides an interfaces to show metrics. the collected data. It shares a philosophy and an architecture Keiker [34]isa Wombat counterpart outside the BEAM similar to PerformERL without providing its insights on written in Java. It replaces the Erlang tracing infrastructure by the used resources/data structures. However, this is an ex- using aspect-oriented programming [23] to instrument code, tension whose support is envisionable since they already use but the users have to write the aspects, which requires to AspectJ to inject the code to trace the messages (as we sug- know AspectJ and an additional coding effort. gest in Sect. 3.6 for the PerformERL implementation on the JVM). 6.2 Load testing tools 6.3 Performance testing tools In this section we present the related tools that—because of In this section we will present the tools related to Per- their black-box approach to performance testing—we cate- formERL whose white-box approach we consider to be gorize under the name of load testing tools, in accordance to performance testing. Jiang and Hassan [21] terminology. erlperf is a collection of tools useful for Erlang Apache JMeter [16] and Tsung are widely used load proﬁling, tracing and memory analysis. It is mainly a per- testing tools. The former is written in Java and the latter formance monitoring tool but it offers a feature called is its Erlang counterpart. They share with our framework continuous benchmarking meant for scalability and perfor- the repeatability of the tests and the idea of running them mance inspection that allows the user to repeatedly run with increasing amounts of load but similarities stop there. tests and record benchmark code into test suites. This fea- Test conﬁgurations are speciﬁed via JSON-like ﬁles instead ture together with the collected proﬁling data suggest that of code and their goal is to measure the performance of erlperf could serve a purpose similar to PerformERL. web applications—or various network protocols in general— However, the characteristics that would make erlperf a under a large number of requests from an external point of performance testing tool are still in a rudimentary state and view by looking at response times. PerformERL,onthe no documentation is available to clearly understand their pur- other hand, provides information from the inside of the sys- pose and functionality. tem, showing how each component reacts to the load. detectEr tool suite [1,5] has some commonalities with PerformERL. They both target Erlang infrastructure, ETS tables are an efﬁcient in-memory database included with the Erlang virtual machine. Basho benchhttps://github.com/basho/basho_bench. 18 20 Tsung, a distributed load testing tool: http://tsung.erlang-projects. erlperf, a collection of tools useful for Erlang proﬁling, tracing org. and memory analysis: https://github.com/max-au/erlperf. 123 PerformERL: a performance testing framework… 453 they both rely on the SUT execution for their analysis and and sending them back to the AkkaProf logic agent (de both consider benchmarking and experiment reproducibil- facto implementing a sort of tracing facility). ity. Even if detectEr targets a post-deployment phase and runtime property validation. As PerformERL, detectEr relies on Erlang’s actor model and the authors [1] discussed 7 Conclusion and future developments how the approach can be realized in other languages with dif- ferent implementations of the actor model—with highlights This paper introduces PerformERL: a performance test- similar to those described in Sect. 3.6. Due to its nature, ing framework for the Erlang ecosystem. PerformERL can detectEr has a limited view on the runtime usage of the be used to monitor the performance of a SUT during its resources. To some extends, the two approaches complement execution or be included in its testing pipeline thanks toPer- each other. formERL interface for deﬁning load tests programmatically. Stefan et al. [31] conducted a survey on unit testing per- PerformERL can collect several kind of metrics both about formance in Java projects. From the survey, many tools SUT internals and its behavior and it can also be extended 21 22 emerged—such as JUnitPerf , JMH and JPL [9]— with new metrics. Throughout this paper we have investi- that through various techniques apply microbenchmarking gated PerformERL usability and visibility over the SUT, to portions of a Java application in the form of unit tests. highlighted its ﬂexibility demonstrating how it can be ex- This tools share with PerformERL the repeatability and tended to match the user needs and the overhead it imposes a systematic approach aimed at testing performance, so we over the SUT, showing both its strengths and weaknesses. consider them performance testing frameworks. However, One of PerformERL weak points is the module used to they are aimed at testing speciﬁc units of a Java system and visualize the results. Although it automatically shows the col- mostly focus on execution time only. lected data, it is quite rigid wrt. the possible customizations A different approach to performance testing in Java was of PerformERL forcing its manual extension to accom- proposed by Bhattacharyya and Amza [7]. They proposed a modate the visualization of new metrics. In future work, a tool, PReT, that tries to automatically detect any Java pro- more sophisticated approach could be adopted for the pre- cess in a system that is running a regression test and starts sentation of the test results that will ease the integration of to collect metrics on them. The tool employs machine learn- data produced by both custom agents and custom metrics. ing both to identify the processes running a speciﬁc test and Moreover, to increase the level of automation, future devel- to detect any anomalies in the collected measurements that opments could include an interface to provide performance could indicate a performance regression. The approach can requirements—in the form of threshold values for the col- deﬁnitely be considered performance testing but it differs lected metrics—in order to deﬁne a pass/fail criteria [19]. from PerformERL in the sense that they evaluate perfor- Alternative criteria could be the no-worse-than-before prin- mance measurements on tests already in place rather than ciple deﬁned by Huebner et al. [20] or the application of providing an interface to generate a workload. machine learning techniques as proposed by Malik et al. [25]. Moamen et al. [27] explored how to implement resource We are also considering to investigate how PerformERL control in Akka-based actor systems. Their proposals share could be integrated in the detectEr [5] tool chain. the general philosophy of PerformERL but are based on the manipulation of the basic mechanisms of the actor model: Acknowledgments This work was partly supported by the MUR project “T-LADIES” (PRIN 2020TL3X8X). The authors wish also to the spawning of the actors and the dispatch of the messages. thank the anonymous reviewers for their comments: they helped a lot The former permits to know the existence of an actor and in improving the quality of this work. then monitoring it since its spawning without the need of a PDA. The latter obviates to the need for a tracing facility. Funding Open access funding provided by Università degli Studi di Milano within the CRUI-CARE Agreement. These approaches are more invasive and cannot be used to do performance testing of systems that cannot be stopped. Open Access This article is licensed under a Creative Commons Attri- AkkaProf [28,29] provides an approach similar to the one bution 4.0 International License, which permits use, sharing, adaptation, proposed by Moamen et al. [27] but instead of instrumenting distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, pro- the way an actor is spawned AkkaProf dynamically instru- vide a link to the Creative Commons licence, and indicate if changes ments the actors when their classes are loaded in the JVM. were made. The images or other third party material in this arti- The injected code takes also care of collecting the metrics cle are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your in- tended use is not permitted by statutory regulation or exceeds the https://github.com/clarkware/junitperf. permitted use, you will need to obtain permission directly from the copy- Oracle Corporation, Java Microbenchmarking Harness: http:// right holder. To view a copy of this licence, visit http://creativecomm openjdk.java.net/projects/code-tools/jmh/. ons.org/licenses/by/4.0/. 123 454 W. Cazzola et al. References 21. Jiang, Z.M., Hassan, A.E.: A Survey on Load Testing of Large- Scale Software Systems. IEEE Trans. Softw. Eng. 41(11), 1091– 1118 (2015) 1. Aceto, L., Attard, D. P., Francalanza, A., Ingólfsdóttir, A.: On 22. Johnson, M.J., Ho, C.-W., Maximilien, E.M., Williams, L.: In- Benchmarking for Concurrent Runtime Veriﬁcation. In FASE’21, corporate Performance Testing in Test-Driven Development. IEEE LNCS 12649, pp. 3–23, Luxembourg City, Luxembourg, (2021). Software 24(3), 67–73 (2007) Springer 23. Kiczales, G., Hilsdale, E., Hugunin, J., Kersten, M., Palm, J., Gris- 2. Agha, G.: Actors: A Model of Concurrent Computation in Dis- wold, B.: An Overview of AspectJ. In ECOOP’01, LNCS 2072, tributed Systems. MIT Press, Cambridge (1986) pp. 327–353, Budapest, Hungary, (2001). Springer-Verlag 3. Andersson, A.: General Balanced Trees. J Algorithms 30(1), 1–18 24. Klophaus, R.: Riak Core: Building Distributed Applications with- (1999) out Shared State. In CUFP’10, pp. 14:1–14:1, Baltimore, Maryland, 4. Aronis, S., Papaspyrou, N., Roukounaki, K., Sagonas, K., Tsiouris, USA, (2010). ACM Y., Venetis, I.E.: A Scalability Benchmark Suite for Erlang/OTP. 25. Malik, H., Hemmati, H., Hassan, A.E.: Automatic Detection of Per- In Erlang’12, pp. 33–42, Copenhagen, Denmark, (2012). ACM formance Deviations in the Load Testing of Large Scale Systems. In 5. Attard, D.P., Aceto, L., Achilleos, A., Francalanza, A., Ingólfs- ICSE’13, pp. 1012–1021, San Francisco, CA, USA, (2013). IEEE dóttir, A., Lehtinen, K.: Better Late Than Never or: Verifying 26. Marek, L., Villazón, A., Zheng, Y., Ansaloni, D., Binder, W., Qi, Z.: Asynchronous Components at Runtime. In FORTE’21, LNCS DiSL: A Domain-speciﬁc Language for Bytecode Instrumentation. 12719, pp. 207–225, Valletta, Malta, (2021). Springer In AOSD’12, pages 239–250, Potsdam Germany, (2012). ACM 6. Bagwell, P.: Ideal Hash Trees. Technical report, École Polytech- 27. Moamen, A.A., Wang, D., Jamali, N.: Approaching Actor-Level nique Fédérale de Lausanne, Lausanne, Switzerland (2001) Resource Control for Akka. In JSSPP’18, LNCS 11332, pp. 127– 7. Bhattacharyya, A., Amza, C.: PReT: A Tool for Automatic Phase- 146, Vancouver, BC, Canada, (2018). Springer Based Regression Testing. In CloudCom’18, pp. 284–289, Nicosia, 28. Rosà, A., Chen, L.Y., Binder, W.: AkkaProf: A Proﬁler for Akka Cyprus, (2018). IEEE Actors in Parallel and Distributed Applications. In APLAS’16, 8. Bruneton, E., Lenglet, R., Coupaye, T.: ASM: A Code Manipu- LNCS 10017, pp. 139–147, Hanoi, Vietnam, (2016). Springer lation Tool to Implement Adaptable Systems. In: Adaptable and 29. Rosà, A., Chen, L.Y., Binder, W.: Proﬁling Actor Utilization and Extensible Component Systems, (2002) Communication in Akka. In Erlang’16, pp. 24–32, Nara, Japan, 9. Bulej, L., Bureš, T., Horký, V., Kotrc, ˇ J., Marek, L., Trojánek, T., (2016). ACM T˚uma, P.: Unit Testing Performance with Stochastic Performance 30. Slaski, M., Turek, W.: Towards Online Proﬁling of Erlang Systems. Logic. Automated Softw. Eng. 24, 139–187 (2017) In ERLANG’19, pages 13–17, Berlin, Germany, (2019). ACM 10. Cesarini, F., Thompson, S.J.: Erlang Programming: A Concurrent 31. Stefan, P., Horký, V., Bulej, L., Tuma, ˚ P.: Unit Testing Performance Approach to Software Development. O’Reilly, (2009) in Java Projects: Are We There Yet? In ICPE’17, pp. 401–412, 11. Cesarini, F., Vinoski, S.: Designing for Scalability with L’Aquila, Italy, (2017). ACM Erlang/OTP: Implementing Robust, Fault-Tolerant Systems. 32. Stivan, G., Peruffo, A., Haller, P.: Akka.js: Towards a Portable Ac- O’Really Media, (2016) tor Runtime Environment. In AGERE!’15, pp. 57–64, Pittsburgh, 12. Ciołczyk, M., Wojakowski, M., Malawski, M.: Tracing of Large- PA, USA, (2015). ACM Scale Actor Systems. Concurrency and Computation-Practice and 33. Trinder, P., Chechina, N., Papaspyrous, N., Sagonas, K., Thomp- Experience 30(22), e4637 (2018) son, S.J., Adams, S., Aronis, S., Baker, R., Bihari, E., Boudeville, 13. Dahm, M.: Byte Code Engineering. In Java-Informations-Tage, O., Cesarini, F., Di Stefano, M., Eriksson, S., Fördos, ˝ V., Ghaffari, 267–277, (1999) A., Giantsios, A., Green, R., Hoch, C., Klaftenegger, D., Li, H., 14. Gheorghiu, G.: Performance vs. Load vs. Stress Testing Lundin, K., MacKenzie, K., Roukounaki, K., Tsiouris, Y., Win- [Online]. http://agiletesting.blogspot.com/2005/02/performance- blad, K.: Scaling Reliably: Improving the Scalability of the Erlang vs-load-vs-stress-testing.html, (2005) Distributed Actor Platform. ACM Trans. Prog. Lang. Syst. 39(4), 15. Gömöri, P.: Proﬁling and Tracing for All with Xprof. In: Proceed- 17:1-17:46 (2017) ings of the Elixir Workshop London, London, United Kingdom, 34. van Hoorn, A., Waller, J., Hasselbring, W.: Kieker: A Framework (2017) for Application Performance Monitoring and Dynamic Software 16. Halili, E.H.: Apache JMeter: A Practical Beginner’s Guide to Au- Analysis. In ICPE’12, pp. 247–248, Boston, MA, USA, (2012). tomated Testing and Performance Measurement for Your Websites. ACM Packt Publishing, (2008) 35. Woodside, M., Franks, G., Petriu, D.C.: The Future of Software 17. Haller, P.: On the Integration of the Actor Model in Mainstream Performance Engineering. In FOSE’07, pp. 171–187, Minneapolis, Technologies: The Scala Perspective. In AGERE!’12’, pp. 1–6. MN, USA, (2007). IEEE ACM, (2012) 18. Haller, P., Odersky, M.: Scala Actors: Unifying Thread-Based and Event-Based Programming. Theoret. Comput. Sci. 410(2–3), 202– 220 (2009) Publisher’s Note Springer Nature remains neutral with regard to juris- 19. Ho, C.-W., Williams, L.A., Antón, A.I.: Improving Performance dictional claims in published maps and institutional afﬁliations. Requirements Speciﬁcations from Field Failure Reports. In RE’07, pp. 79–88, New Delhi, (2007). IEEE 20. Huebner, F., Meier-Hellstern, K., Reeser, P.: Performance Testing for IP Services and Systems. In GWPSED’00, LNCS 2047, pp. 283–299, Darmstadt, Germany, (2000). Springer

Journal

Distributed Computing – Springer Journals

Published: Oct 1, 2022

Keywords: Erlang; Distributed systems; Performance testing; Load testing; Performance monitoring

Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

PerformERL: a performance testing framework for erlang

PerformERL: a performance testing framework for erlang

Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

PerformERL: a performance testing framework for erlang

PerformERL: a performance testing framework for erlang

References (39)

Abstract

Journal

Recommended Articles

There are no references for this article.

Our policy towards the use of cookies