AppDynamics Implementation Reference Guide
The following outlines good practices to follow when configuring AppDynamics beyond the default instrumentation. We'll cover the following aspects:
- Business Transactions
- Data collectors (DCs)
- Service Endpoints
- Tiers and nodes
- Servers
- Remote services and databases
- Errors
- Health rules
- Anomaly detection
- User experience
- Pages and Ajax requests
- Synthetics
- Analytics
- Dashboards
- Agents
- Authentication
This should be used as a guide and not seen as an absolute. All environments are different, so discussing where yours differs from these practices is a healthy way to challenge current assumptions.
Business Transactions
The Business Transaction (BT) is the fundamental configuration entity of AppDynamics. Business Transactions provide visibility into the critical paths and performance of application functionality. Defining Business Transactions effectively is critical to a good AppDynamics application architecture.
A BT is an aggregation of individual transactions (execution paths through the system) that makes sense from a performance monitoring standpoint. AppDynamics generates and collects key metrics for each BT: average response time, load, error rate, etc.
Each BT is based on a rule; the rules can be very precise (such as all web requests with a particular, fully specified URL, or specific Operation in WCF Service) or much broader (such as all web requests with URLs that begin with /foo/bar, or all operations in specific WCF service).
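The difference between a precise rule and a broad rule can be sketched in a few lines of Python. This is an illustration only, not AppDynamics code: the `BtRule` class, the `assign_bt` function, the rule names and the `/foo/bar/checkout` URL are all hypothetical, and the first-match-wins ordering is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BtRule:
    """Hypothetical stand-in for a BT detection rule (not an AppDynamics type)."""
    bt_name: str
    pattern: str
    match: str  # "equals" for a fully specified URL, "starts_with" for a prefix


def assign_bt(url: str, rules: List[BtRule]) -> Optional[str]:
    """Return the BT name of the first rule that matches, or None."""
    for rule in rules:
        if rule.match == "equals" and url == rule.pattern:
            return rule.bt_name
        if rule.match == "starts_with" and url.startswith(rule.pattern):
            return rule.bt_name
    return None


# The precise rule is listed first so it wins over the broad catch-all.
RULES = [
    BtRule("Checkout", "/foo/bar/checkout", "equals"),
    BtRule("FooBar - All Calls", "/foo/bar", "starts_with"),
]

for url in ("/foo/bar/checkout", "/foo/bar/items/42", "/health"):
    print(url, "->", assign_bt(url, RULES))
```

Every request under /foo/bar that is not the checkout URL lands in the single broad BT, which is the same idea as the "All Calls" approach described in the strategy section.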
Get further information from AppDynamics documentation.
Business Transaction strategy
The goal of a good BT strategy is to focus on the business-critical paths of the application. This helps to provide pertinent information while reducing noise. To get the most value from AppDynamics, BT detection must be configured explicitly. A recommended Business Transaction strategy for WCF calls is an "All Calls" BT for each WCF service in each tier, and then to pull out specific operations as standalone Business Transactions as needed.
Additionally, enable automatic Business Transaction detection for web sites and queues to discover and name pages, but then review and collapse some of the pages to an "All Calls" BT for those that are not important.
Identify proper Business Transactions
- For each tier, review the application with architects and developers and agree on the top five or ten Business Transactions.
- Define BTs for business-critical transactions. The focus should really be on these business-critical transactions. When your list of Business Transactions for a given Tier grows beyond 10-15, you should be considering whether or not each transaction defined is truly business critical or merely application important.
- Combine any similar Business Transactions into one if they exercise very similar overlapping code. For example, if you implement various types of searches on your site, making one Business Transaction named "Search" may be enough to troubleshoot effectively and yet reduce the number of Business Transactions. Another example is if your application has an admin UI. This is typically not business critical, but you still don't want to lose visibility. It might be sufficient to create one "Admin BT" which groups all transactions to an admin UI in one BT.
- Name your Business Transactions in a way that complies with the rest of the company's naming scheme. Names should use terminology that all other parts of the organization are familiar with, to enable them to troubleshoot when they receive alerts with this name.
There are both logical and technical limits on the number of Business Transactions. The load each additional Business Transaction puts on the controller determines the technical limit. More Business Transactions equate to more metrics that must be handled by the controller; each additional metric requires CPU, RAM and IO. More Business Transactions potentially results in a slower, less responsive controller.
The logical limit comes from the necessity to actively manage all Business Transactions. It is a considerable effort to manage thousands of Business Transactions. Too many constitute noise, add overhead (technical and organizational) and make it harder to find relevant information.
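A periodic review of BT counts per tier can be automated. The sketch below assumes you already have an inventory of (tier, BT name) pairs from whatever export you use; the function name, the sample tiers and the limit of 15 (the upper end of the 10-15 range above) are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

BT_REVIEW_LIMIT = 15  # assumed cut-off, taken from the 10-15 guideline above


def tiers_needing_review(
    bts: Iterable[Tuple[str, str]], limit: int = BT_REVIEW_LIMIT
) -> Dict[str, int]:
    """Return {tier: BT count} for tiers with more BTs than the limit."""
    counts = Counter(tier for tier, _bt_name in bts)
    return {tier: count for tier, count in counts.items() if count > limit}


# Hypothetical inventory: 18 BTs on the "web" tier, 4 on the "orders" tier.
inventory = [("web", f"Page {i}") for i in range(18)]
inventory += [("orders", f"Operation {i}") for i in range(4)]

flagged = tiers_needing_review(inventory)
print(flagged)
```

A flagged tier is a prompt for a conversation with the application team, not an automatic instruction to delete BTs.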
Data collectors (DCs)
Data collectors provide context for specific transactions and help determine whether the payload of a transaction is causing problems. With data collectors configured, AppDynamics accesses information in the HTTP session or in the arguments of HTTP calls, WCF service inputs and outputs, return values or variables of the application code and displays the information in snapshots.
It is important to note that data collectors can be configured to collect sensitive information. This technology allows extracting information directly from objects. Developers usually need to enable a data collector when they are troubleshooting problems or trying to better understand problems in their code.
There are two types of data collectors: HTTP and Method Invocation (MIDC).
- HTTP data collectors can capture data from HTTP parameters, request attributes, cookies, session keys and headers.
- Method Invocation data collectors can capture method parameters and return values, the object itself and any property or results of some methods associated with the variable object.
Dos:
- Collect variables for the identification of customer ID, request ID, entity ID and other special data. This helps identify which customer a snapshot belongs to and prepares you for potential future deployment of analytics for per-customer visibility.
- Ask developers which variables are helpful to troubleshoot major components of functionality.
- Carefully review how what you are collecting may be impacting code paths.
Don'ts:
- You must be careful to collect the variables in a way that would not change – or not change too much – your application behavior.
- Typically, collecting the value of the variable does not change the behavior of the application. However, if you collect a value from a method or a property getter that is backed by some code that doesn't just return a value, but executes some functions, the data collector changes the execution path.
- Collecting variables from methods that are executed frequently in a given Business Transaction is not a good idea. For example, a call to look up something from the cache that is executed 100 times will result in 100 values being collected. While neat, it isn't usually helpful for troubleshooting, and it reduces the performance of the cache retrieval.
- Always work with application architects or developers who understand the application when deciding what values to capture.
- Test new data collectors in a pre-production environment to see what they do before promoting to production. Due to sensitive data in production, developers should not be allowed to enable new DCs without good justification and approval.
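The warning about getters that execute code can be shown with a small Python sketch. Nothing here is an AppDynamics API: the `Order` class, its `total` property and the `collect` function are hypothetical stand-ins for application code and for a method invocation data collector reading a value.

```python
class Order:
    """Hypothetical application class used only for illustration."""

    def __init__(self, order_id: str) -> None:
        self.order_id = order_id  # plain field: reading it has no side effect
        self.price_lookups = 0    # counts how often the costly path ran

    @property
    def total(self) -> float:
        # A getter that does work instead of just returning a stored value.
        self.price_lookups += 1
        return 99.0


def collect(obj: object, attribute: str) -> object:
    """Stand-in for a data collector reading one value from an object."""
    return getattr(obj, attribute)


order = Order("A-1001")

collect(order, "order_id")
lookups_after_field = order.price_lookups   # the application state is unchanged

collect(order, "total")
lookups_after_getter = order.price_lookups  # the collector ran application code

print(lookups_after_field, lookups_after_getter)
```

Collecting the plain field leaves the object untouched, while collecting the computed property executes application code that would not otherwise have run at that point.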
Get further information from AppDynamics documentation.
Service Endpoints
In addition to Business Transactions, Service Endpoints offer another way of viewing the performance of a particular service, independently of business transactions. Service Endpoints provide key performance indicators such as calls per minute, average response time and errors in the context of a service. They omit business transaction context and downstream performance data.
Service Endpoints do not capture transaction snapshots; however, they are linked to the business transaction snapshots of the BTs that flow through them. Service Endpoints are configured similarly to Business Transactions but do not require as many resources. To maintain optimal performance, Service Endpoint limits are still applied at the Controller and Agent level.
Java application agents automatically detect Service Endpoints. .NET application agents do not detect Service Endpoints automatically. Service Endpoints must be manually configured using custom Service Endpoint detection rules.
Find further information on Service Endpoints with AppDynamics documentation.
Tiers and nodes
In AppDynamics, application components are mapped to applications, tiers and nodes. An AppDynamics application is a grouping of one or more business application modules that are closely related by business function. A tier is a cluster of nodes that all perform the same or similar work, usually a set of identical nodes.
A node in the AppDynamics model corresponds to an individual runtime in the application environment such as a CLR or a JVM. The best practice is to combine all essential systems, components and services that execute a defined set of business use cases into one AppDynamics application. However, the best way to organize business applications depends on the environment.
- Configure the pre-production and production AppDynamics applications identically in terms of tiers, if at all possible. This allows for configuration items to be tested in pre-production and for smooth transitions to production.
Get further information on application monitoring from AppDynamics documentation.
Servers
With AppDynamics, the machine agent embedded in the .NET application agent is used to collect infrastructure metrics.
Another possibility for collecting infrastructure metrics is to use the standalone machine agent, which is a Java application running on the Windows host. An additional capability of the standalone machine agent is collecting additional data through the use of monitoring extensions (see the AppDynamics Community Exchange for a list of extensions and metric listeners), sending metrics collected via external programs through the machine agent into AppDynamics and running remediation scripts.
Further information can be found at the documentation links below.
- Standalone Machine Agent
- Server Visibility
- Extensions and Custom Metrics
- AppDynamics Extension Community Exchange
Remote services and databases
On the flowmap we sometimes see a large number of databases, even though the application may only have one database. We can use the backend detection rules to group similar database calls into a single monitored backend. This allows us to baseline total performance across all calls.
A remote service is an entity that resides external to the application server and provides a service to the application. In AppDynamics, un-instrumented databases and remote services are collectively known as backends.
AppDynamics discovers backends from exit point calls in the application code. An exit point is a call from an instrumented node to another node or to a backend. When the destination of an exit point call is not instrumented, the exit call results in a backend discovery event.
Backends (remote services) are important to monitor correctly. Most backend calls are identified out of the box based on known protocols, but where they are not automatically identified they can be configured to give optimal visibility into your applications. Types of backends that are often present in applications are ADO.NET calls to SQL Server, JDBC calls to Oracle or MySQL databases, WCF or Web Service calls to unmonitored/uncorrelated services, HTTP calls to REST endpoints, calls to cache, calls to queues and file IO to various file locations.
AppDynamics automatically detects most of them but will need additional configuration for file IO to and from file shares.
A typical problem occurs if backends use dynamic names (as with temporary message queues). These dynamically named backends must be collapsed into a finite number of backends, typically one backend. Recognized backend calls include HTTP, Web Services, WCF, JDBC and Message Queue.
A best practice to reduce the number of remote services being monitored is to group like remote services. Critical remote services can be monitored as a single entity for greater visibility.
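Collapsing dynamically named backends comes down to mapping many raw names onto one stable name. The Python sketch below shows the idea only; the `tmp.reply.<hex id>` naming convention, the regular expression and the `backend_name` function are assumptions for illustration, and in AppDynamics the equivalent grouping is done through backend detection rules rather than code.

```python
import re

# Assumed naming convention: temporary queues look like "tmp.reply.<hex id>".
_TEMP_QUEUE = re.compile(r"^(tmp\.reply)\.[0-9a-f]{8,}$")


def backend_name(raw_name: str) -> str:
    """Map a raw backend name to the name it should be monitored under."""
    match = _TEMP_QUEUE.match(raw_name)
    if match:
        return match.group(1) + ".*"  # every temporary queue shares one backend
    return raw_name                   # stable names are kept as they are


raw_names = ["tmp.reply.3fa9c2d47b10", "tmp.reply.9e07ab55c1f2", "orders.inbound"]
collapsed = sorted({backend_name(name) for name in raw_names})
print(collapsed)
```

Three raw names become two monitored backends, so the temporary queues can be baselined together instead of each appearing once and then disappearing.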
Further information can be found at documentation links below.
Errors
Error detection in AppDynamics comes in two flavors: HTTP-based error codes and Exception Monitoring.
An important aspect to consider is that AppDynamics detects errors when they cross monitored process boundaries. For example, if a call to a backend results in an error, AppDynamics sees it and marks the BT as an error. In another example, if a call to a function throws an exception that leaves the monitored entry point, AppDynamics sees it and marks the BT as an error.
For web applications (web sites, WCF services, microservices, Node.js sites) that utilize HTTP protocol, the error codes from HTTP (404, 403, 500 and so on) indicate failures in your application and will result in BTs marked as an error and (if possible) error snapshots. They will be tracked as errors.
For applications that make external calls to a SQL Server database, RAISERROR statements, SQL timeouts or deadlocks will be raised as various types of SqlExceptions and will also result in the BT being marked as an error, in error snapshots and in error tracking.
In some cases, the errors and exceptions thrown are benign and can be ignored. In the Error Detection configuration section, AppDynamics provides tools to ignore errors and exceptions by various criteria.
Additionally, if the application throws and catches an exception between the monitored entry and exit point, AppDynamics does not capture or report anything. In some instances, it can be beneficial to interpret those in-process errors as BT errors. Again, the Error Detection section of AppDynamics can mark invocation of any method into an "error" and capture information about it.
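The boundary behavior described above can be illustrated with a small Python sketch. The `monitored` decorator is a hypothetical stand-in for an instrumented entry point, and `error_log` stands in for what a monitoring agent would report; neither is an AppDynamics API.

```python
from typing import Callable, List

error_log: List[str] = []  # stands in for the errors an agent would report


def monitored(bt_name: str) -> Callable:
    """Hypothetical wrapper: records only exceptions that leave the entry point."""
    def decorate(func: Callable) -> Callable:
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                error_log.append(f"{bt_name}: {type(exc).__name__}")
                raise
        return wrapper
    return decorate


@monitored("Checkout")
def checkout_swallows() -> str:
    try:
        raise ValueError("bad coupon")
    except ValueError:
        return "fallback price"  # handled in-process, never reaches the boundary


@monitored("Checkout")
def checkout_propagates() -> str:
    raise TimeoutError("payment gateway timed out")


checkout_swallows()
try:
    checkout_propagates()
except TimeoutError:
    pass

print(error_log)
```

Only the exception that escapes the entry point is recorded. The swallowed one is invisible at the boundary, which is the case where marking a specific method invocation as an error becomes useful.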
Error Detection in AppDynamics is typically used later in the process, once an application model (application, tier, BT, backend, data collector) is well established.
- Review errors captured for each application and configure AppDynamics to ignore those that do not indicate trouble.
- Add configuration to filter sensitive data from log messages and exception message text.
- Code level changes may need to be made in order to remediate some "errors."
Health rules
Health rules in AppDynamics can be used to monitor any metric for a violation of a threshold. They are used for status lights in dashboards and should be created in conjunction with their dashboard use. Health rules are also used to send notifications to support teams to alert them of potential performance problems.
A typical use case for health alerts is when monitoring business transaction performance. AppDynamics defines three standards for a BT that is performing poorly but has not terminated with an error:
- Slow
- Very slow
- Stalled
As soon as a transaction starts, the thread representing that transaction is monitored for a stall. Every five seconds, the running transactions are evaluated to determine if they have met or exceeded the stall threshold.
If a transaction execution finishes before the stall threshold value, the execution time is compared first against the very slow threshold and then against the slow threshold, and the transaction is marked accordingly.
If a transaction hits the stall threshold (it takes more than 300 deviations above the average for the last 2 hours or the set stall threshold), a stall transaction event is generated. Whether the transaction eventually finishes successfully or times out, it is considered a stall for performance monitoring purposes.
The agent uses these standards to decide when to capture a snapshot for a transaction and when to start a diagnostic session. The transaction counts in each category also appear on the transaction scorecard portion of the application dashboard. AppDynamics provides three different ways to define thresholds for an agent to determine if a particular execution of a BT is slow or very slow:
- X% over average
- X standard deviations over
- A static threshold
For stalled transactions, only the standard deviation threshold is available.
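The standard deviation option can be sketched as a simple classification function. The stall value of 300 deviations comes from the text above; the slow and very slow multipliers of 3 and 4 are assumptions chosen for illustration, and the function is not AppDynamics code.

```python
def classify(
    response_ms: float,
    mean_ms: float,
    stddev_ms: float,
    slow: float = 3.0,       # assumed multiplier, for illustration only
    very_slow: float = 4.0,  # assumed multiplier, for illustration only
    stall: float = 300.0,    # deviation count quoted in the text above
) -> str:
    """Classify one execution by how many deviations it sits above the average."""
    if stddev_ms <= 0:
        return "normal"  # no spread in the baseline window, nothing to compare
    deviations = (response_ms - mean_ms) / stddev_ms
    # Check the most severe category first, mirroring the order in the text.
    if deviations > stall:
        return "stalled"
    if deviations > very_slow:
        return "very slow"
    if deviations > slow:
        return "slow"
    return "normal"


# Assumed two-hour baseline: 200 ms average with a 50 ms standard deviation.
for response_ms in (250, 380, 420, 16000):
    print(response_ms, "->", classify(response_ms, mean_ms=200, stddev_ms=50))
```

Because the cut-offs are expressed relative to the moving average, they follow the application as its performance changes, which is the advantage over a static threshold.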
Most AppDynamics customers prefer one of the first two options because they're dynamic: as performance improves, you don't have to keep adjusting a static threshold. In addition to thresholds, AppDynamics automatically calculates the baseline performance for your applications, that is, the prevailing performance characteristics of those applications. Once it establishes a baseline, it can detect anomalous conditions for your application.
Baselines are not available immediately upon startup. It takes time and application load for the AppDynamics platform to collect data and create its initial baselines. The time depends on the type of baseline being used, whether daily, weekly, monthly or none.
- Once business transactions are tuned, create health alerts monitoring BT response time and error rate.
- Create health rules monitoring performance.
- Use health rules to drive dashboard widgets to radiate application health information.
Anomaly detection
The AppDynamics Cognition Engine uses machine learning to estimate expected ranges for business transaction metrics and detect anomalies. When average response time or errors per minute deviate from the expected range, AppDynamics alerts on the anomaly. Once an anomaly event is triggered, AI capabilities in the Cognition Engine show suspected causes of the anomaly, leading to faster root cause analysis.
Anomaly detection has to be enabled for each application.
Further information can be found at AppDynamics Documentation links below.
User experience
Browser Real-User Monitoring (Browser RUM) allows customers to monitor web application performance from the point of view of a real or synthetic end user. A JavaScript file is injected into each instrumented page as close to the top as possible. The JavaScript collects data as the page loads, then bundles the data into a beacon and sends it to the EUM cloud.
If the backend is instrumented with AppDynamics, we will see correlation in snapshots that helps tie together the complete end-to-end picture for driving dashboards and troubleshooting.
Further information can be found at AppDynamics documentation links below.
Pages and Ajax requests
Pages & Ajax requests provide detailed performance information on pages, Ajax requests, iFrames and virtual pages over time. In most default configurations, the limit for the number of Ajax requests and the limit for the number of pages/iFrames/virtual pages monitored is reached. This is normally due to not grouping similar pages or to an explosion in Ajax calls.
- Exclude Ajax requests or pages that do not need to be monitored.
- Group similar pages/Ajax together using Include Rules.
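Grouping similar pages amounts to normalizing the variable part of a URL so that many concrete pages count as one. The Python sketch below assumes that numeric path segments are record identifiers; the `page_group` function and the sample paths are hypothetical, and in AppDynamics the grouping itself is configured through include rules.

```python
import re
from collections import Counter

# Assumption: a purely numeric path segment is a record ID, not a distinct page.
_ID_SEGMENT = re.compile(r"/\d+(?=/|$)")


def page_group(path: str) -> str:
    """Replace numeric path segments so similar pages share one name."""
    return _ID_SEGMENT.sub("/{id}", path)


paths = ["/product/101", "/product/102/reviews", "/product/7", "/cart"]
groups = Counter(page_group(path) for path in paths)
print(dict(groups))
```

Four concrete URLs collapse into three page names. On a real site with thousands of product IDs, the same grouping keeps the page count well below the monitoring limit.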
Further information can be found at AppDynamics documentation links below.
Synthetics
Browser synthetic monitoring uses geographically distributed synthetic agents to continuously test the performance of key user workflows in the application. This allows for monitoring availability and performance independently of user-generated load.
- Look for meaningful metrics that identify problems and use health rules to drive dashboards and send notifications to support personnel.
Further information can be found at AppDynamics documentation links below.
Analytics
Analytics requires several milestones to be met before the data can be used. Applications must have well-defined business transactions and a good baseline of the data as well. Saved searches form the foundation for querying the data, and AppDynamics Query Language (ADQL) is the syntax to speak to the search engine.
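As a rough idea of what an ADQL search looks like, the Python sketch below builds a query string for average response time per business transaction. Treat the field names (`transactionName`, `responseTime`, `application`) and the `transactions` event type as assumptions to verify against the ADQL reference for your Controller version; the helper function and the application name "ECommerce" are hypothetical.

```python
def bt_response_time_query(application: str) -> str:
    """Build an ADQL string for average response time per BT (a sketch)."""
    if '"' in application:
        # Quoting rules are not covered here, so such names are rejected.
        raise ValueError("application name must not contain double quotes")
    return (
        "SELECT transactionName, avg(responseTime) "
        "FROM transactions "
        f'WHERE application = "{application}"'
    )


query = bt_response_time_query("ECommerce")
print(query)
```

A query like this only returns useful results once business transactions are well defined, which is why BT tuning comes before Analytics.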
Custom Metrics are another way of saving a query to be treated like a metric that is baselined and called once a minute. Business Journeys are a way to visually display the health and performance of long running business processes such as a loan approval or delivery process. Experience Levels are useful to show the difference between two sets of metrics.
Further information can be found at AppDynamics Documentation links below.
Dashboards
Dashboards provide a customized view of application performance data. Dashboards can aggregate and compare data from different applications, show live and historical data and aggregate data from core APM, analytics, database monitoring, real user monitoring and synthetics. They can be shared with support teams and business or product teams, or even users that do not have login permission to AppDynamics. Dashboards are the best way to radiate monitoring data throughout the organization.
Dashboard practices
Dashboards are used to radiate information to support and business teams displaying data relevant to their area of responsibility. Dashboards require tuned data that is trusted. Examples of possible dashboard types are:
- Executive Dashboards – Executive dashboards should include a number of status lights for each key user or system journey step health and performance, as well as any KPI or SLA metrics. These dashboards also show any business metrics or impact information.
- Operations Dashboards – Operations dashboards should include a number of status lights for each service (tier and/or top BT) to provide a global overview of the health of the application infrastructure. These dashboards can also contain an Events List.
- Application Dashboards – Application dashboards target an application and show the health of key business transactions, highlighting call volumes, response times and error rates (for a given application or tier). They also use status lights to indicate the health of each top business transaction.
- Service Dashboards – Service dashboards target an individual service and show key health metrics for the service (tier), including the call volume, response time and error rate for the service overall as well as for important API calls. They can also show CPU and memory consumption as well as garbage collection statistics for each node in the tier. A corresponding status light that provides a visual health indicator can accompany each metric.
Additional dashboard practices
- Minimize dashboard scrolling and use modular panel-based designs to allow for flexibility and expansion of the dashboard.
- Identify the dashboard with a title header including organization, application name, environment (Prod, PP), etc.
- An absolute layout is recommended for control over design.
- The auto-refresh interval of a dashboard should be no more than 120 seconds to keep it as near to real time as possible. Specific dashboard or widget time ranges can be used for data comparison.
- Use dashboard templates to help with user adoption and make it easier to produce.
- Utilize the report feature to extract data from dashboards to be sent to interested parties at scheduled intervals.
Agents
It is important to keep agents close to the latest versions. While the very latest version might not be the best choice for production, staying within a few versions of it provides the benefits of newer features.
Automating agent installation is an absolute must to ensure consistency between environments and instrumentation. In large environments, License Rules help ensure that licenses are available to everyone who needs them and identify applications that are requesting more.
Authentication
AppDynamics offers three primary ways to authenticate users:
- Local Authentication – Direct authentication of local users created and managed using AppDynamics.
- SAML – Authentication via a SAML token passed to AppDynamics from an external SAML provider.
- LDAP – Authentication using a corporate user directory.
SAML is the primary authentication mechanism to the AppDynamics UI. This is the recommended approach providing for simple and secure administration. AppDynamics Administrators manage mapping groups to roles, allowing users to be managed by the external authentication provider.
AppDynamics training
Take advantage of the content from AppDynamics University. When engaged with WWT, an active approach is used to encourage participation from the team. This provides an opportunity for hands-on work under the supervision of an AppDynamics consultant.
Recommended AppDynamics University courses for power users, administrators and analysts:
- Core APM I: Essentials – APM210
- Core APM II: Advanced – APM220
- Browser Real User Monitoring – EUM201
- Business Insights with Business iQ – ALY402-403
- Essential Administrator Functions I & II
Helpful resources: