
All Things WebSphere

Concerns and issues relating to all versions of WebSphere Application Server

Monday, May 30, 2011

 

Reducing WebSphere Application Server I/O overhead

Best practices for reducing I/O overhead of WebSphere Application Server:

If your applications on WAS are I/O intensive, i.e., do a lot of logging, then following the best practices below will help reduce the stress on the OS I/O subsystem. If you run into issues with incomplete logging or with truncated javacores, reducing the amount of information logged to the system may help reduce the severity of the problem.

1. Disable the WebSphere Application Server service log/activity log
The service log is more commonly known as the activity.log and is found in the /profiles//logs directory. There is only one activity.log per node. WebSphere Application Server runtime events are logged to the activity.log. It is written in binary format, so it cannot be viewed in a text editor.

The main purpose of the activity.log is that it can be viewed with the Log Analyzer tool, a graphical user interface that displays the events from the activity.log and uses a symptom database to analyze the events and diagnose problems. This service is not essential to WAS, and very few system administrators make use of the activity/service log.

You can configure properties of the activity.log in the administrative console: 
1. Select Troubleshooting → Logs and Trace. 
2. Select the WebSphere Application Server process. 
3. Select IBM® Service Logs. Disable the activity.log.

Tuesday, May 24, 2011

 

Dynacache vs Memcached

This is a series of posts comparing caching technologies. The first in this series is Dynacache vs Memcached.



Thursday, May 19, 2011

 

WAS GenCon Garbage Collection Policy


In WebSphere Application Server v8, which will be available from June 17, the default garbage collection policy has been changed to gencon.
You may wonder why this policy is now the default.

■ GenCon is good for transactional workloads: when the transaction is done, most of the objects are thrown away
■ GenCon is good for interactive workloads: short GC pauses mean high responsiveness
■ Many applications fit one of these models
■ Many common Java idioms create short-lived helper objects:
– e.g. StringBuffer / StringWriter,
– e.g. Enumerator / Iterator
■ A smaller collection area means a smaller working set, improving cache utilization
■ The young generation collector compacts as it collects, reducing fragmentation
■ It also tends to relocate objects so that related objects move closer together
– For example: a String and its char[] array, or a Hashtable$Entry and its key.
– This can bring significant cache benefits

Things to be aware of when using gencon
■ Tenured objects are rarely collected (by design!) -- this can lead to objects living longer than
expected. A common case of this is class unloading
 ■ Similar problems exist for finalization and reference objects that survive long enough to be
tenured
■ Newspace overhead: since the young generation is divided into semi-spaces, there is
always a small amount of unusable heap

Please look at this presentation on Generational Garbage Collection: Theory and Best Practices from the UK WebSphere User Group: http://www.websphereusergroup.org.uk/wug/files/presentations/31/Chris_Bailey_-_Generational_GC.pdf




Wednesday, May 18, 2011

 

WebSphere Application Server Top 10 Tuning Recommendations


This blog post is based on a talk given at IMPACT 2011 by WebSphere Chief Performance Architect Surya Duggirala.
 
#10 - Properly Tune the Operating System
  • The operating system is consistently overlooked for both functional and performance tuning.
  • Understand the hardware infrastructure backing your OS: processor counts, speed, shared/unshared, etc.
  • ulimit values need to be set correctly. The main player here is the number of open file handles (ulimit -n). Other limits, such as process size and memory, may need to be set based on the application
  • Make sure NICs are set to full duplex and the correct speeds
  • Large pages need to be enabled to take advantage of the -Xlp JDK parameter
  • If enabled by default, check RAS settings on the OS and tune them down
  • Configure TCP/IP timeouts correctly for your application's needs
  • Depending on the load being placed on the system look into advanced tuning techniques such as pinning WAS processes via RSET or TASKSET as well as pinning IRQ interrupts
[Figure: WAS throughput with processor pinning]
#9 – Keep Application Logging to a Minimum 
  • Nothing other than error-case information should ever be written to SystemOut.log
  • If using logging build your log messages only when needed
  • Good
    • if (loggingEnabled) { errorMsg = "This is a bad error" + " " + failingObject.printError(); }
  • Bad
    • errorMsg = "This is a bad error" + " " + failingObject.printError();
      if (loggingEnabled) { System.out.println(errorMsg); }
  • Keep error and log messages to the point and easy to debug
  • If using Apache Commons, Log4J, or other frameworks ensure performance on your system is as expected
  • If you must log information for audit purposes or other reasons, ensure you are writing to a fast disk
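The "build the message only when needed" advice above can be sketched with java.util.logging, where Logger.isLoggable plays the role of the loggingEnabled flag. The class and method names here are illustrative, not from the talk:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of guarded logging: the expensive message string is only
// constructed when the target log level is actually enabled.
public class GuardedLogging {
    static final Logger LOG = Logger.getLogger(GuardedLogging.class.getName());

    static int buildCount = 0; // counts how often the expensive message is built

    // Stand-in for something costly like failingObject.printError()
    static String expensiveDetails() {
        buildCount++;
        return "state=" + System.nanoTime();
    }

    static void handleRequest() {
        // Good: the concatenation cost is paid only if FINE logging is on
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Request failed: " + expensiveDetails());
        }
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO); // FINE is disabled
        handleRequest();
        System.out.println("messages built with FINE disabled: " + buildCount);
    }
}
```

With FINE disabled, the guard skips the string building entirely, which is exactly the difference between the "Good" and "Bad" snippets above.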
#8 – Understand and Tune Infrastructure (databases & other interactive server systems)
  • WebSphere Application Server and the system it runs on are typically only one part of the datacenter infrastructure, with a good deal of reliance on other areas performing properly. Think of your infrastructure as a plumbing system: optimal drain performance only occurs when no pipes are clogged.
  • On the WAS system itself you need to be very aware of
    • What other WAS instances (JVMs) are doing and their CPU / IO profiles
    • How much memory other WAS instances (or other OSes in a virtualized case) are using
    • Network utilization of other applications coexisting on the same hardware
  • In the supporting infrastructure
    • Varying network latency can drastically affect split cell topologies, cross-site data replication, and database query latency
      • Ensure network infrastructure is repeatable and robust
      • Don't take bandwidth or latency for granted before going into production; always test, as labs vary
    • Firewalls can cause issues with data transfer latencies between systems
  • On the database system
    • Ensure that proper indexes and tuning are in place for the application's request patterns
    • Ensure that the database supports the number of connected clients your WAS runtime will have
    • Understand the CPU load and impacts of other applications (batch, OLTP, etc all competing with your applications)
  • On other application server systems or interactive server systems
    • Ensure performance of connected applications is up for the load being requested of it by the WAS system
    • Verify that developers have coded specific handling mechanisms for when connected applications go down (You need to avoid storm drain scenarios)

#7 – Minimize HTTP Session Content
  • High performance data replication for application availability depends on correctly sized session data
    • Keep it under 1MB in all cases if possible
  • You should only be storing information critical to that user's specific interaction with the server
  • If composite data is required build it progressively as the interaction occurs
    • Configure Session Replication in WAS to meet your needs
    • Use different configuration options (async vs. synch) to give you the availability your application needs without compromising response time.
    • Select the replication topology that works best for you (DB, M2M, M2M Server) 
    • Keep replication domains small and/or partition where possible
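As a rough way to check the 1MB guideline above, a session attribute's replicated size can be estimated by serializing it, since session replication ships serialized data. This is an illustrative helper, not a WAS API:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

// Hypothetical helper: measures how many bytes an object occupies once
// serialized, approximating its cost in session replication traffic.
public class SessionSizer {
    static int serializedSize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        HashMap<String, String> sessionData = new HashMap<>();
        sessionData.put("userId", "u12345");
        sessionData.put("lastPage", "/checkout");
        int size = serializedSize(sessionData);
        System.out.println("serialized session size: " + size + " bytes");
        // Flag anything approaching the 1MB guideline from the post
        if (size > 1024 * 1024) {
            System.out.println("WARNING: exceeds 1MB replication guideline");
        }
    }
}
```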
 #6 – Correctly Tune Thread Pools
  • Thread pools and their corresponding threads control all execution on the hardware threads.
  • Understand which thread pools your application uses and size all of them appropriately based on utilization you see in tuning exercises
    • Thread dumps, PMI metrics, etc will give you this data 
    • Thread Dump Memory Analyzer and Tivoli Performance viewer (TPV) will help in viewing this data.
  • Think of the thread pool as a queuing mechanism to throttle how many active requests you will have running at any one time in your application.
    • Apply the funnel based approach to sizing these pools
      • Example IHS (1000) -> WAS ( 50) -> WAS DB connection pool (30) -> 
      • Thread numbers above vary based on application characteristics
    • Since you can throttle active threads you can control concurrency through your codebase
  • Thread pools need to be sized with the total number of hardware processor cores in mind
    • If sharing a hardware system with other WAS instances, thread pools have to be tuned with that in mind.
    • You will more than likely need to cut back on the number of active threads in the system to ensure good performance for all applications, due to OS-level context switching for each thread in the system
    • Sizing or restricting the max number of threads an application can have can sometimes be used to prevent rogue applications from impacting others.
  • Default sizes for WAS thread pools on v6.1 and above are actually a little too high for best performance
    • A two-to-one ratio (threads to cores) typically yields the best performance, but this varies drastically between applications and access patterns
[Figure: TPV & TDMA tool snapshots]
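The "thread pool as a throttle" idea above can be demonstrated with a plain java.util.concurrent fixed pool: no matter how many requests arrive, at most pool-size of them run concurrently. The sizes here are illustrative, not recommendations:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Demonstrates that a bounded pool caps concurrency: 100 submitted "requests"
// never run more than poolSize at a time, just like a WAS thread pool.
public class FunnelDemo {
    static int maxObservedConcurrency(int poolSize, int tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                int now = active.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // record the high-water mark
                try { Thread.sleep(20); } catch (InterruptedException ignored) { }
                active.decrementAndGet();
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // 100 requests funneled through a pool of 5 threads
        System.out.println("peak concurrency: " + maxObservedConcurrency(5, 100));
    }
}
```

In the funnel analogy, IHS, the WAS thread pool, and the JDBC connection pool are successively smaller versions of this same bound.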
 #5 –Tune JDBC Data Sources
  • Correct database connection pool tuning can yield significant gains in performance
  • This pool is highly contended in heavily multithreaded applications so ensuring significant available connections are in the pool leads to superior performance.
  • Monitor PMI metrics via TPV or other tools to watch for threads waiting on connections to the database, as well as their wait time.
    • If threads are waiting increase the number of pooled connections in conjunction with your DBA OR decrease the number of active threads in the system
    • In some cases, a one-to-one mapping between DB connections and threads may be ideal
  • Frequently database deadlocks or bottlenecks first manifest themselves as a large number of threads from your thread pool waiting for connections
  • Always use the latest database driver for the database you are running, as performance optimizations between driver versions are significant
  • Tune the Prepared Statement Cache Size for each JDBC data source
    • Can also be monitored via PMI/TPV to determine ideal value

#4 –Create Cells To Group Like Applications
  • Create Cells and Clusters of application servers with an express purpose that groups them in some manner
  • Large cells (400, 500, even 1000 members), while supported, for the most part don't make sense
  • Group applications that need to replicate data to each other or talk to each other via RMI, etc and create cells and clusters around those commonalities. 
  • Keeping cell size smaller leads to more efficient resource utilization due to less network traffic for configuration changes, DRS, HAManager, etc.
    • For example, core groups should be limited to no more than 40 to 50 instances
  • Smaller cells and logical grouping make migration forward to newer versions of products easier and more compartmentalized.
#3 – Ensure Uniform Configuration Across Like Servers
  • Uniform configuration of software parameters and even operating systems is a common stumbling block
  • Most often this manifests itself as a single machine or process that is burning more CPU or memory, or garbage collecting more frequently
  • Easiest way to manage this is to have a “dump configuration” script that runs periodically
  • Store the script's results and track differences after each configuration change or application upgrade
  • Leverage the Visual Configuration Explorer (VCE) tool available within ISA
[Figure: Visual Configuration Explorer (VCE)]

#2 – Correctly Tune The JVM
  • Correctly tuning the JVM in most cases will get you nearly 80% of the possible max performance of your application  
  • The big area to focus on for JVM tuning is heap size
    • Monitor verbose:gc and target GCing no more than once every 10 seconds with a max GC pause of a second or less.
    • Incremental testing is required to get this area right running with expected customer load on the system
    • Only after you have the above boundary layers met for GC do you want to start to experiment with differing garbage collection policies
  • Beyond the Heap Size settings most other parameters are to extract out max possible performance OR ensure that the JVM cooperates nicely on the system it is running on with other JVMs 
  • The Garbage Collector Memory Visualizer is an excellent tool for diagnosing GC issues or refining JVM performance tuning.
    • Provided as a downloadable plug-in within the IBM Support Assistant
[Figure: Garbage Collection Memory Visualizer (GCMV)]
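The GC targets above (collections no more than once every 10 seconds, pauses under a second) can be expressed as a simple check. The numbers below are synthetic; real values would come from verbose:gc output, typically via a tool like GCMV:

```java
// Illustrative check of the verbose:gc tuning targets described above.
public class GcTargets {
    static boolean meetsTargets(double[] gcStartSeconds, double[] pauseMillis) {
        // Collections should be at least 10 seconds apart
        for (int i = 1; i < gcStartSeconds.length; i++) {
            if (gcStartSeconds[i] - gcStartSeconds[i - 1] < 10.0) return false;
        }
        // Every pause should be under one second
        for (double p : pauseMillis) {
            if (p >= 1000.0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        double[] starts = {0.0, 12.5, 26.0, 40.2}; // GC start times (seconds)
        double[] pauses = {180.0, 220.0, 150.0, 300.0}; // pause times (ms)
        System.out.println("meets GC tuning targets: " + meetsTargets(starts, pauses));
    }
}
```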

 #1 – Perform Proper Load Testing
  • Properly load testing your application is the most critical thing you can do to ensure a rock solid runtime in production.
  • Replicating your production environment isn’t always 100% necessary as most times you can get the same bang for your buck with a single representative machine in the environment
    • Calculate expected load across the cluster and divide down to single machine load
    • Drive load and perform the usual tuning loop to resolve the parameter set you need to tweak and tune.
    • Look at load on the database system, network, etc., and extrapolate whether they will support the full system's load; if not, or if there are questions, test
  • Performance testing needs to be representative of patterns that your application will actually be executing
  • Proper performance testing keeps track of and records key system level metrics as well as throughput metrics for reference later when changes to hardware or application are needed.
  • Always over stress your system.  Push the hardware and software to the max and find the breaking points. 
  • Only once you have done real world performance testing can you accurately size the complete set of hardware required to execute your application to meet your demand.

Additional Links
  1. WebSphere Application Server Performance site
  2. DeveloperWorks Article: Performance Tuning Case Study based on DayTrader - Step-by-step approach to tuning the application server based on a sample application
  3. WebSphere Application Server Sample Performance Tuning Script - Can be used to adjust common tuning parameters based on predefined templates or customized to support additional fine tuning. Now available within v7.0.0

Thursday, May 12, 2011

 

Dynacache cache entry priority questions

  
 The smaller the cache and the more tuned it is to your working set, the better performance you will get. (Smaller footprint, better concurrency, etc ...)

Q Should we still avoid timeouts as our invalidation policy if at all possible?
Absolutely. Explicitly invalidating the cache-entries brings down the cache size immediately and prevents LRU activity.

Q Our application's usage is currently assigning a fixed default "Priority" for each-and-every item placed in the Cache using the DynaCache API (using timeouts as our invalidation policy).
This is fairly normal. Priorities are typically used when there are different classes of cached items, some more important than others. Lower priority cache entries are considered first when it comes to making room in the cache.

Q Default priority for each Cache item.  Admin versus API... what wins?
Dynacache API priority overrides the priority set on the admin console. Priority determines how many cycles of the clock algorithm must pass before an unused entry is chosen as a victim. Each entry's clock starts at the default or the value set programmatically or in cachespec.xml, and is decremented each clock cycle. A clock value of <= 0 implies a victim candidate. The default is 1.
I would keep the default priority of entries as 1 and only set higher priority for those entries that are more important than the rest.
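The clock behavior described above can be sketched in a few lines. This is an illustration of the idea, not Dynacache's implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of clock-based victim selection: each entry's clock starts
// at its priority and is decremented each cycle; an entry whose clock reaches
// zero or below becomes a victim candidate.
public class ClockSketch {
    static class Entry {
        final String key;
        int clock; // starts at the entry's priority (default 1)
        Entry(String key, int priority) { this.key = key; this.clock = priority; }
    }

    // One clock cycle: decrement unused entries in order and return the
    // first one whose clock drops to zero or below, or null if none.
    static String sweep(List<Entry> cache) {
        for (Entry e : cache) {
            e.clock--;
            if (e.clock <= 0) return e.key;
        }
        return null;
    }

    public static void main(String[] args) {
        List<Entry> cache = new ArrayList<>();
        cache.add(new Entry("important", 3)); // higher priority survives more cycles
        cache.add(new Entry("ordinary", 1));  // default priority of 1
        System.out.println("first victim: " + sweep(cache));
    }
}
```

The higher-priority entry needs three sweeps to become a candidate, while the default-priority entry is a candidate after one, which is why reserving higher priorities for genuinely important entries works.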

Tuesday, May 10, 2011

 

Dynacache Questions and Answers on Data Replication Service (DRS) replication best practices

Questions on Dynacache best practices vis-à-vis replication ...

Q We plan to use one or more replication domains that span a large number of servers with a number of object cache instances, and to use sharing policy PUSH.
Unfortunately PUSH does not scale. Go with NOT_SHARED.

Q We have observed that it takes a little more than one second for replication to begin after a cache is looked up for the first time. We want to look up all cache instances early in the server startup process, in order to fill the caches with existing data from the cluster, making it available to the application as soon as the data is requested.
In WAS v7 Dynacache has a generic JVM custom property called "com.ibm.ws.cache.CacheConfig.createCacheAtServerStartup" which, when set to "true", will create cache instances automatically at server startup instead of the default on-demand behavior. This definitely should be set if deploying to WAS 7. If on an earlier version of WAS, look at technote http://www-01.ibm.com/support/docview.wss?rs=180&uid=swg21313480 .
See http://wasdynacache.blogspot.com/2010/04/dynacache-and-drs-replication-faq.html .

Q We are concerned about the push mechanism and whether the initial push of, say, 100-150 MB of existing data distributed over perhaps 30-40 cache instances into a newly started JVM could have (temporary) performance implications on one or multiple JVMs that are up and running.
Your concern is warranted, and that is why I said PUSH does not scale. When a new JVM starts up, *ALL* the other JVMs send it cached data. So the same cached data is sent by ALL the running JVMs to the new one. If this is a lot of cache data, I have often seen the older JVMs that are pushing the data fail with OutOfMemoryErrors. If you so desire, you can completely disable bootstrap by calling DistributedMap.setDRSBootstrap(false). This will prevent any JVM from sending data to the upcoming JVM.

Q When data is pushed into a newborn cache, will all cached elements come from the same JVM, or could some elements be pushed from JVM 1 and others from JVM 2?
Unfortunately there is no workload distribution when the existing JVMs send data to the new one. We don't even optimize by having one JVM send the data; instead, ALL the JVMs send all the cache data in that object cache instance to the new JVM.
Some customers write cache population scripts to populate the cache for each JVM after it comes up, and configure the cache in NOT_SHARED mode, which only replicates invalidations to keep the cache consistent. Another option is to use PUSH with DRS bootstrap disabled.

 

How to detect a WebSphere Java classloader memory leak via the Eclipse Memory Analyzer tool

A presentation on Class Loading and debugging Class Loader memory leaks in WebSphere Application Server by Ian is an excellent treatise on this topic. A short summary follows:
1.  Run "classloader explorer" query
2.  Sort classloaders by name - identify the WAS CompoundClassLoaders
3.  Examine each CompoundClassLoader object - according to WAS, those whose localClassPath field is set to the empty string should be eligible for collection.
4.  For each CompoundClassLoader whose localClassPath is the empty string, run "Classloader->Path to GC roots->Exclude all weak/soft/phantom etc. references."
5.  Review the reference chains for each classloader and analyse for possible culprits/owners.

Debugging WAS from Dumps: Diagnose more than leaks with Memory Analyzer and its IBM Extensions is another great presentation for understanding how to do JVM debugging


As a bonus, you can also look at On the Move? WebSphere JDK Migrations, Past, Present and Future... and Beyond


These links and presentations are from the WebSphere UK User Group
http://www.websphereusergroup.org.uk/wug/downloads/31/

 

WebSphere Application Server - Performance considerations for deploying very large Java Enterprise Edition (JEE) applications

Please consider the following when developing and deploying very large applications (> 30,000 classes) bundled in one ear file.

The cost of having a large number of classes depends strongly on how the classes are packaged in the application, whether Java EE 5 features are enabled, and whether annotation scanning is left enabled on all of the classes.

1) If the classes are exposed in a web module archive "WEB-INF/classes" directory, that will cause very large slowdowns.  The converse is for the classes to be in a JAR, either in a simple utility JAR directly contained by the application archive (EAR file), or for the classes to be contained within an EJBJar, or for the classes to be contained in a JAR beneath a web module archive "WEB-INF/lib" directory.

2) If javaEE5 processing is enabled for the classes, either because the classes are present in a javaEE5 enabled EJBJar or WAR, that will cause large slowdowns. The converse is for the classes to be present in a location which will not be scanned, for example, if there is a "META-INF/application.xml" for the EAR, and if all of the modules use a pre-javaEE5 version, or if all of the modules are marked as metadata-complete.

3) Even if javaEE5 processing is enabled, if a large subset of the classes do not require annotation scanning, that can be disabled using special properties in the application and module "META-INF/MANIFEST.MF" manifest files.

4) If using WAS version 7, please upgrade to fixpack 7.0.0.13 or later.  We put in some optimizations in 7.0.0.13 to make the reading and writing to disk of applications much more efficient.  Large apps (like Commerce) that are over 100 MB in size deployed 30-40% faster.

5) Check if the application is EE5 (in the application.xml, look for the version= tag).  If so, check if there are any WAR files inside the EAR that contain numerous JARs under WEB-INF/lib.  If those are present they can increase deployment time significantly.  There is a tuning option available to help, and it's documented here PK87053: JAVA ENTERPRISE EDITION VERSION 5 APPLICATIONS TAKE A LONG TIME TO DEPLOY http://www-01.ibm.com/support/docview.wss?uid=swg1PK87053

When deploying the application, there will be an unavoidable cost of copying the application files during deployment, which will be proportional to the total size of the classes (but not proportional to the number of classes, unless these are exposed in a WEB-INF/classes directory). If the average class size is, say, 3K, and there are 30K classes, then that puts the application size at at least 90MB, which will take time to expand and to copy into the deployment location. However, that cost is only incurred during deployment.
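Points 2 and 3 above concern disabling annotation scanning. For a web module whose metadata is fully declared in its descriptor, scanning can be switched off with the standard metadata-complete attribute; the servlet names below are hypothetical:

```xml
<!-- WEB-INF/web.xml: metadata-complete="true" tells the container that all
     metadata is declared here, so the module's classes need not be scanned
     for annotations. Module contents are illustrative. -->
<web-app xmlns="http://java.sun.com/xml/ns/javaee"
         version="2.5"
         metadata-complete="true">
    <servlet>
        <servlet-name>ExampleServlet</servlet-name>
        <servlet-class>com.example.ExampleServlet</servlet-class>
    </servlet>
    <servlet-mapping>
        <servlet-name>ExampleServlet</servlet-name>
        <url-pattern>/example</url-pattern>
    </servlet-mapping>
</web-app>
```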

A note of thanks to Thomas Bitonti and David W. Hare for providing this information
