Sunday, August 18, 2013

Data Multi-tenancy

Multi-tenancy
The term multi-tenancy in general is applied to software development to indicate an architecture in which a single running instance of an application simultaneously serves multiple clients (tenants). This is highly common in SaaS solutions.

Data Multi-tenancy
Isolating information (data, customizations, etc.) pertaining to the various tenants is a particular challenge in these systems. This includes the data owned by each tenant stored in the database. It is this last piece, sometimes called multi-tenant data, on which we will focus.

Multi-tenant data approaches
There are three main approaches to isolating information in these multi-tenant systems, each of which goes hand-in-hand with different database schema definitions and JDBC setups. Each approach has pros and cons as well as specific techniques and considerations. The article Multi-Tenant Data Architecture does a great job of covering these topics.

Separate database
Each tenant's data is kept in a physically separate database instance. JDBC Connections would point specifically to each database, so any pooling would be per-tenant. A general application approach here would be to define a JDBC Connection pool per-tenant and to select the pool to use based on the “tenant identifier” associated with the currently logged in user.
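The per-tenant pool selection can be sketched with a small registry keyed by tenant identifier. This is a minimal illustration, not Hibernate API; the class and tenant names are hypothetical, and in a real application each entry would hold a configured connection pool rather than a bare JDBC URL:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal per-tenant routing sketch: each tenant identifier maps to its own
// JDBC URL (in practice, its own connection pool).
public class TenantDatabaseRegistry {
    private final Map<String, String> jdbcUrlByTenant = new HashMap<>();

    public void register(String tenantId, String jdbcUrl) {
        jdbcUrlByTenant.put(tenantId, jdbcUrl);
    }

    // Select the JDBC URL (i.e. the pool) for the currently logged-in user's tenant.
    public String urlFor(String tenantId) {
        String url = jdbcUrlByTenant.get(tenantId);
        if (url == null) {
            throw new IllegalArgumentException("Unknown tenant: " + tenantId);
        }
        return url;
    }
}
```

The lookup key would typically come from the security context of the current request.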

Separate schema
Each tenant's data is kept in a distinct database schema on a single database instance. There are 2 different ways to define JDBC Connections here:

  • Connections could point specifically to each schema, as we saw with the Separate database approach. This is an option provided that the driver supports naming the default schema in the connection URL or if the pooling mechanism supports naming a schema to use for its Connections. Using this approach, we would have a distinct JDBC Connection pool per-tenant where the pool to use would be selected based on the “tenant identifier” associated with the currently logged in user.
  • Connections could point to the database itself (using some default schema) but the Connections would be altered using the SQL SET SCHEMA (or similar) command. Using this approach, we would have a single JDBC Connection pool for use to service all tenants, but before using the Connection it would be altered to reference the schema named by the “tenant identifier” associated with the currently logged in user.
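The second variant can be sketched as follows. This is an illustration only (the class name is hypothetical), and the exact `SET SCHEMA` syntax varies by database; the identifier must be validated because it cannot be bound as a statement parameter:

```java
// Single-pool, SET SCHEMA variant: before handing out a pooled Connection,
// build the statement that re-points it at the tenant's schema.
public class SchemaSelector {
    public static String setSchemaSql(String tenantId) {
        // The schema name is spliced into the SQL text, so validate it strictly.
        if (!tenantId.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("Invalid tenant identifier: " + tenantId);
        }
        return "SET SCHEMA " + tenantId;
    }
}
```

In practice you would execute this statement on the Connection (e.g. via `createStatement().execute(...)`) each time it is checked out of the shared pool.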

Partitioned (discriminator) data
All data is kept in a single database schema. The data for each tenant is partitioned by the use of partition value or discriminator. The complexity of this discriminator might range from a simple column value to a complex SQL formula. Again, this approach would use a single Connection pool to service all tenants. However, in this approach the application needs to alter each and every SQL statement sent to the database to reference the “tenant identifier” discriminator.
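For the simple-column case, the statement alteration can be sketched like this. The class is a naive illustration of the idea (Hibernate's planned discriminator support would do this transparently); the `tenant_id` column name is an assumption:

```java
// Discriminator sketch: every statement sent to the database is altered to
// filter on the tenant column, using a bind parameter for the tenant value.
public class TenantFilter {
    public static String withTenantPredicate(String sql) {
        // Naive illustration: AND the predicate onto an existing WHERE clause,
        // or append a new WHERE clause if the query has none.
        if (sql.toUpperCase().contains(" WHERE ")) {
            return sql + " AND tenant_id = ?";
        }
        return sql + " WHERE tenant_id = ?";
    }
}
```

A real implementation would need proper SQL parsing; string matching like this breaks on subqueries, unions, and the like.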

Data Multi-tenancy in Hibernate
Using Hibernate with multi-tenant data comes down to an API piece and one or more integration pieces. As usual, Hibernate strives to keep the API simple and isolated from any underlying integration complexities. The API is really just defined by passing the tenant identifier as part of opening any session.

Example 16.1. Specifying tenant identifier from SessionFactory
Session session = sessionFactory.withOptions()
        .tenantIdentifier( yourTenantIdentifier )
        ...
        .openSession();

Additionally, when specifying configuration, a org.hibernate.MultiTenancyStrategy should be named using the hibernate.multiTenancy setting. Hibernate will perform validations based on the type of strategy you specify. The strategy here correlates to the isolation approach discussed above.
NONE
(the default) No multi-tenancy is expected. In fact, it is considered an error if a tenant identifier is specified when opening a session using this strategy.

SCHEMA
Correlates to the separate schema approach. It is an error to attempt to open a session without a tenant identifier using this strategy. Additionally, a org.hibernate.service.jdbc.connections.spi.MultiTenantConnectionProvider must be specified.

DATABASE
Correlates to the separate database approach. It is an error to attempt to open a session without a tenant identifier using this strategy. Additionally, a org.hibernate.service.jdbc.connections.spi.MultiTenantConnectionProvider must be specified.

DISCRIMINATOR
Correlates to the partitioned (discriminator) approach. It is an error to attempt to open a session without a tenant identifier using this strategy. This strategy is not yet implemented in Hibernate as of 4.0 and 4.1. Its support is planned for 5.0.
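Wired together, the configuration might look like the following hibernate.properties fragment. This is a sketch: the connection-provider class name is an assumption standing in for your own implementation of the MultiTenantConnectionProvider contract.

```properties
# Select the separate-schema isolation approach
hibernate.multiTenancy=SCHEMA
# Your implementation of MultiTenantConnectionProvider (hypothetical class name)
hibernate.multi_tenant_connection_provider=com.example.MyMultiTenantConnectionProvider
```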


Sunday, August 11, 2013

Spring JDBCTemplate Vs Hibernate Performance + JPA & Caching

Spring JDBCTemplate Vs Hibernate Performance

If you do all you can to make the Spring JdbcTemplate and Hibernate implementations as fast as possible, the JdbcTemplate will probably be a bit faster, because it doesn't have the overhead that Hibernate has, but it will probably take much more time and many more lines of code to implement.

Hibernate has its learning curve, and you have to understand what happens behind the scenes, when to use projections instead of returning entities, etc. But if you master it, you'll gain much time and have cleaner and simpler code than with a JDBC-based solution.

In 95% of cases, Hibernate is fast enough, or even faster than non-optimized JDBC code. For the remaining 5%, nothing forbids you from using something else, like Spring JDBC for example. The two solutions are not mutually exclusive.

Hibernate

Hibernate is not intended for batch jobs; using an O/R mapper for them is an anti-pattern, and even the Hibernate documentation warns against it. Batch processing can often be done much more easily with a single clever SQL statement than via an O/R mapper, which needs a bunch of statements to do the same work. Not because O/R mappers are bad, but because they were not built for this use case. By default, data in Hibernate is always loaded lazily.

One of the common problems for people who start using Hibernate is performance: if you don't have much experience with Hibernate, you will find how quickly your application becomes slow. If you enable SQL traces, you will see how many queries are sent to the database that could be avoided with a little Hibernate knowledge. Good use of the Hibernate cache avoids a large amount of traffic between your application and the database.

JPA + Spring

If you are using spring and JPA, it is very likely that you utilize ehcache (or another cache provider). And you do that in two separate scenarios: JPA 2nd level cache and spring method caching.

When you configure your application, you normally set the 2nd level cache provider of your JPA provider (Hibernate, in my case) and you also configure Spring with the “cache” namespace. Everything looks OK and you continue with the project. But there's a caveat. If you follow the most straightforward way, you get two separate cache managers which load the same cache configuration file. This is not bad per se, but it is something to think about: do you really need two cache managers, and can you accept the problems that may arise from this?

JPA & Caching

Caching is the most important performance optimization technique. There are many things that can be cached in persistence: objects, data, database connections, database statements, query results, meta-data, relationships, to name a few. Caching in object persistence normally refers to the caching of objects or their data.

Caching also influences object identity: if you read an object, then read the same object again, you should get the identical object back (the same reference).

JPA 1.0 does not define a shared object cache; JPA providers can support a shared object cache or not, but most do.

Caching in JPA is required within a transaction or within an extended persistence context to preserve object identity, but JPA does not require that caching be supported across transactions or persistence contexts.

JPA 2.0 defines the concept of a shared cache. The @Cacheable annotation or cacheable XML attribute can be used to enable or disable caching on a class.

Hibernate cache

Cache types: Hibernate uses different types of caches, and each type is used for a different purpose. Let us first have a look at these cache types.

  • The session cache, which caches objects within the current session.
  • The query cache, which caches queries and their results.
  • The second level cache, which caches objects across sessions.

The Session Cache
The session cache caches values within the current session. This cache is enabled by default.

The Query Cache
The query cache can be really useful for optimizing the performance of your data access layer. However, there are a number of pitfalls as well. This blog post describes a serious problem with the memory consumption of the Hibernate query cache when using objects as parameters: it can consume a huge amount of memory and trigger an OutOfMemoryError (how to fix it).

The Second Level Cache
The key characteristic of the second-level cache is that it is used across sessions, which also differentiates it from the session cache, which, as the name says, only has session scope. Hibernate provides a flexible concept for exchanging cache providers for the second-level cache. By default, Ehcache is used as the caching provider.

However, more sophisticated caching implementations can be used, like the distributed JBoss Cache or Oracle Coherence.

Additional readings:

Understanding caching in Hibernate (1 | 2 | 3), tutorial and tuning

Top 10 Performance Problems taken from Zappos, Monster, Thomson and Co

52 weeks of Application Performance – The dynaTrace Almanac

Sunday, June 30, 2013

High Availability (HA), Disaster Recovery (DR) & HADR Solution

High availability (HA) answers the question 'What do I do in case a single machine fails?' It means having a machine that can immediately take over in case of a problem with the main machine, with little downtime and no loss of data.

HA is the measurement of a system’s ability to remain accessible in the event of a system component failure. Generally, HA is implemented by building multiple levels of fault tolerance and/or load balancing capabilities into a system.

Disaster recovery (DR), on the other hand, answers 'What do I do in case a disaster (fire, flood, war, the ISP going bankrupt, whatever) happens to the whole data center?' It is something intended to take over in the event of a disaster at the main site.

DR is the process by which a system is restored to a previous acceptable state, after a natural or man-made disaster.

While both increase overall availability, a notable difference is that with HA there is generally no loss of service, whereas with DR there is usually a slight loss of service while the DR plan is executed and the system is restored. HA refers to the retaining of the service, and DR to the retaining of the data. HA and DR strategies should strive to address any non-functional requirements, such as performance, system availability, fault tolerance, data retention, business continuity, and user experience. It is imperative that selection of the appropriate HA and DR strategy be driven by business requirements. For HA, determine any service level agreements expected of your system. For DR, use measurable characteristics, such as Recovery Time Objective (RTO) and Recovery Point Objective (RPO), to drive your DR plan.

The following requirements are the most common IT considerations for establishing an HADR solution:

Recovery time objective (RTO)
The time as measured from the time of application unavailability to the time of recovery (resuming business operations).

Recovery point objective (RPO)
The last data point to which production is recovered upon a failure. Ideally, customers want the RPO to be zero lost data. Practically speaking, we tend to accept a recovery point associated with a particular application state.

A comprehensive end to end HADR solution has the following basic components:

- Application data resiliency
Data resiliency is the base or foundational element for a high availability and disaster recovery solution deployment. Methods and characteristics : 

Storage-based resiliency: Storage replication is the most commonly used technique for deploying cluster-wide data resiliency. There are two general categories for storage-based resiliency: shared-disk topology and shared-everything topology.

Log-based replication: Log-based replication is a form of resiliency primarily associated with databases. Typically, database logs are used to monitor changes that are then replicated to a second system where those changes are applied. 

- Application infrastructure resiliency
Infrastructure resiliency provides the overall environment that is required to resume full production at a standby node. This environment includes the entire list of resources that the application requires upon failover for the operations to resume automatically. Methods and characteristics:

Application infrastructure resiliency has two aspects. First, it provides the application with all the resources that it requires to resume operations at an alternate node in the cluster. Second, it provides for cluster integrity by using monitoring and verification. These resources include items such as dependent hardware, middleware, IP connectivity, configuration files, attached devices (printers), security profiles, application specific custom resources (crypto card) and the application data itself.

- Application state resiliency
Application state resiliency is characterized by the application recovery point as described when the production environment resumes on a secondary node in the cluster. Characteristic of the application to resume varies by application design and customer requirements.

Where will the application recovery point be with respect to the last application transaction? If your application is designed with commit boundaries and the outage is an unplanned failover, then the recovery point in the application will be that last commit boundary. If you are conducting a planned outage role swap, then the application is quiesced so that memory can be flushed to the shared-disk resource, and the data and application are subsequently varied on to the secondary node.

A complete end-to-end solution incorporates all three elements into one integrated environment that addresses one or all of these outage types.

A solution for a customer depends upon the inclusion and incorporation of these basic elements into the clustering configuration. For example, you can have a solution based purely upon data resiliency and leave the application resiliency aspects of the final recovery process to IT operational procedures. Alternatively, you can incorporate the data resiliency into the overall clustering topology, enabling automated recovery processing.


Thursday, June 20, 2013

Virtualization Computing

There are several kinds of virtualization techniques which provide similar features but differ in the degree of abstraction and the methods used for virtualization.

Full virtualization (VMs)

In computer science, full virtualization is a virtualization technique used to provide a certain kind of virtual machine environment, namely, one that is a complete simulation of the underlying hardware. Full virtualization requires that every salient feature of the hardware be reflected into one of several virtual machines – including the full instruction set, input/output operations, interrupts, memory access, and whatever other elements are used by the software that runs on the bare machine, and that is intended to run in a virtual machine.

Virtual machines emulate some real or fictional hardware, which in turn requires real resources from the host (the machine running the VMs). This approach, used by most system emulators, allows the emulator to run an arbitrary guest operating system without modifications, because the guest OS is not aware that it is not running on real hardware. The main issue with this approach is that some CPU instructions require additional privileges and may not be executed in user space, thus requiring a virtual machine monitor (VMM), also called a hypervisor, to analyze the executed code and make it safe on the fly. The hardware emulation approach is used by VMware products, VirtualBox, QEMU, Parallels and Microsoft Virtual Server.

Hypervisor
In computing, a hypervisor, also called virtual machine manager (VMM), is one of many hardware virtualization techniques allowing multiple operating systems, termed guests, to run concurrently on a host computer. It is so named because it is conceptually one level higher than a supervisory program. The hypervisor presents to the guest operating systems a virtual operating platform and manages the execution of the guest operating systems. Multiple instances of a variety of operating systems may share the virtualized hardware resources. Hypervisors are very commonly installed on server hardware, with the function of running guest operating systems, that themselves act as servers.

- Type 1 (or native, bare metal) hypervisors run directly on the host's hardware to control the hardware and to manage guest operating systems. A guest operating system thus runs on another level above the hypervisor.
This model represents the classic implementation of virtual machine architectures; the original hypervisors were the test tool, SIMMON, and CP/CMS, both developed at IBM in the 1960s. CP/CMS was the ancestor of IBM's z/VM. Modern equivalents of this are Oracle VM Server for SPARC, the Citrix XenServer, KVM, VMware ESX/ESXi, and Microsoft Hyper-V hypervisor.
- Type 2 (or hosted) hypervisors run within a conventional operating system environment. With the hypervisor layer as a distinct second software level, guest operating systems run at the third level above the hardware. VMware Workstation and VirtualBox are examples of Type 2 hypervisors.

In other words, a Type 1 hypervisor runs directly on the hardware; a Type 2 hypervisor runs on another operating system, such as FreeBSD, Linux or Windows (http://en.wikipedia.org/wiki/Hypervisor).

Paravirtualization
This technique also requires a VMM, but most of its work is performed in the guest OS code, which in turn is modified to support this VMM and avoid unnecessary use of privileged instructions. The paravirtualization technique also enables running different OSs on a single server, but requires them to be ported, i.e. they should "know" they are running under the hypervisor. The paravirtualization approach is used by products such as Xen and UML.

Virtualization on the OS level, a.k.a. containers virtualization
Most applications running on a server can easily share a machine with others, if they could be isolated and secured. Further, in most situations, different operating systems are not required on the same server, merely multiple instances of a single operating system. OS-level virtualization systems have been designed to provide the required isolation and security to run multiple applications or copies of the same OS (but different distributions of the OS) on the same server. OpenVZ, Virtuozzo, Linux-VServer, Solaris Zones and FreeBSD Jails are examples of OS-level virtualization.

The three techniques differ in complexity of implementation, breadth of OS support, performance in comparison with a standalone server, and level of access to common resources. For example, VMs have the widest scope of usage but the poorest performance. Para-VMs have better performance but can support fewer OSs, because the original OS has to be modified.

Virtualization on the OS level provides the best performance and scalability compared to the other approaches. The performance difference of such systems can be as low as 1-3% compared with that of a standalone server. Virtual environments are usually also much simpler to administer, as all of them can be accessed and administered from the host system. Generally, such systems are the best choice for server consolidation of same-OS workloads.

HP CloudSystem

HP CloudSystem is a cloud infrastructure from Hewlett-Packard (HP) that combines storage, servers, networking and software for organizations to build complete private, public and hybrid Cloud computing environments.

HP CloudSystem is an integrated infrastructure used to create, manage and consume cloud-based services. The cloud types supported are private, public and hybrid. HP has claimed that user organizations can build and deploy new cloud services using HP CloudSystem in minutes.

Comparison of platform virtual machines

VMware ESX

VMware ESX is an enterprise-level computer virtualization product offered by VMware, Inc. ESX is a component of VMware's larger offering, VMware Infrastructure, and adds management and reliability services to the core server product. VMware is replacing the original ESX with ESXi.

VMware ESX and VMware ESXi are bare metal embedded hypervisors that are VMware's enterprise software hypervisors for guest virtual servers that run directly on host server hardware without requiring an additional underlying operating system.

Monday, June 17, 2013

How to Analyze Java Thread / Heap Dumps

The content of this article was originally written by Tae Jin Gu on the Cubrid blog.

When there is an obstacle, or when a Java-based web application is running much slower than expected, we need to use thread dumps. If thread dumps seem very complicated to you, this article may help you a lot. Here I will explain what threads are in Java, their types, how they are created, how to manage them, how you can dump threads from a running application, and finally how you can analyze them and determine the bottleneck or blocking threads. This article is the result of long experience in Java application debugging.

Java and Thread

A web server uses tens to hundreds of threads to process a large number of concurrent users. If two or more threads utilize the same resources, a contention between the threads is inevitable, and sometimes deadlock occurs.

Thread contention is a status in which one thread is waiting for a lock, held by another thread, to be lifted. Different threads frequently access shared resources on a web application. For example, to record a log, the thread trying to record the log must obtain a lock and access the shared resources.

Deadlock is a special type of thread contention, in which two or more threads are waiting for the other threads to complete their tasks in order to complete their own tasks.

Different issues can arise from thread contention. To analyze such issues, you need to use the thread dump. A thread dump will give you the information on the exact status of each thread.

Background Information for Java Threads

Thread Synchronization

A thread can be processed concurrently with other threads. In order to ensure consistency when multiple threads try to use shared resources, only one thread at a time should be allowed to access the shared resources, by using thread synchronization.
Thread synchronization in Java can be done using a monitor. Every Java object has a single monitor. The monitor can be owned by only one thread. For a thread to own a monitor that is owned by a different thread, it needs to wait in the wait queue until the other thread releases its monitor.
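A minimal sketch of monitor-based synchronization: the `synchronized` methods below compete for the counter object's single monitor, so concurrent increments cannot interleave and no updates are lost. The class and method names are illustrative:

```java
// Each Java object has one monitor; `synchronized` makes competing threads
// queue for it, so the increments below cannot interleave.
public class SynchronizedCounter {
    private int count = 0;

    public synchronized void increment() {
        count++;
    }

    public synchronized int get() {
        return count;
    }

    // Run several threads hammering the same counter and return the final value.
    public static int runThreads(int threads, int perThread) throws InterruptedException {
        SynchronizedCounter counter = new SynchronizedCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) counter.increment();
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        return counter.get();
    }
}
```

Without the `synchronized` keyword, the final count would usually be lower than expected, because `count++` is not atomic.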

Thread Status

In order to analyze a thread dump, you need to know the status of threads. The statuses of threads are listed in java.lang.Thread.State.
Figure 1: Thread Status.
  • NEW: The thread is created but has not been processed yet.
  • RUNNABLE: The thread is occupying the CPU and processing a task. (It may be in WAITING status due to the OS's resource distribution.)
  • BLOCKED: The thread is waiting for a different thread to release its lock in order to get the monitor lock.
  • WAITING: The thread is waiting by using a wait, join or park method.
  • TIMED_WAITING: The thread is waiting by using a sleep, wait, join or park method. (The difference from WAITING is that the maximum waiting time is specified by the method parameter, and WAITING can be relieved by time as well as external changes.) 
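Some of these states can be observed directly from code. The sketch below (class name illustrative) watches a thread move from NEW, through WAITING inside `Object.wait()`, to TERMINATED:

```java
// Observing java.lang.Thread.State transitions directly.
public class ThreadStateDemo {
    public static Thread.State[] observe() throws InterruptedException {
        Object lock = new Object();
        Thread t = new Thread(() -> {
            synchronized (lock) {
                try {
                    lock.wait(); // parks the thread in WAITING until notified
                } catch (InterruptedException ignored) {
                }
            }
        });
        Thread.State beforeStart = t.getState();   // NEW: created but not started
        t.start();
        // Poll until the thread has parked inside wait()
        while (t.getState() != Thread.State.WAITING) {
            Thread.sleep(10);
        }
        Thread.State whileWaiting = t.getState();  // WAITING
        synchronized (lock) {
            lock.notify();                         // release it
        }
        t.join();
        Thread.State afterJoin = t.getState();     // TERMINATED
        return new Thread.State[] { beforeStart, whileWaiting, afterJoin };
    }
}
```

A thread blocked on entering a `synchronized` block would instead show up as BLOCKED, which is exactly the distinction you look for when reading a thread dump.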

Thread Types

Java threads can be divided into two types:
  1. daemon threads;
  2. non-daemon threads.
Daemon threads stop working when there are no other non-daemon threads. Even if you do not create any threads, the Java application will create several threads by default. Most of them are daemon threads, mainly for processing tasks such as garbage collection or JMX.
A thread running the 'static void main(String[] args)’ method is created as a non-daemon thread, and when this thread stops working, all other daemon threads will stop as well. (The thread running this main method is called the VM thread in HotSpot VM.)
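The daemon flag can be checked and set directly; a short sketch (class name illustrative):

```java
// Daemon vs non-daemon: daemon threads do not keep the JVM alive on their own.
public class DaemonDemo {
    public static boolean[] flags() {
        Thread daemon = new Thread(() -> {});
        daemon.setDaemon(true);            // must be called before start()
        Thread worker = new Thread(() -> {}); // non-daemon by default when created from main
        return new boolean[] { daemon.isDaemon(), worker.isDaemon() };
    }
}
```

Note that a thread inherits the daemon status of the thread that created it, which is why threads created from main are non-daemon by default.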

Read original post from here

Commands to create a thread dump on Solaris:

[user-account@localhost]$ ps -ef | grep java
(list all running Java processes and their PIDs)
[user-account@localhost]$ pargs <pid>
(inspect a process's arguments to determine which PID to dump)
[user-account@localhost]$ cd $JAVA_HOME/bin
(go to the Java binary folder, which contains the jstack command)
[user-account@localhost]$ ./jstack -l <pid> > /var/tmp/thread-dump-<pid>.out
(write the thread dump of the running process to a file)



Know more about Java Heap Dump Analysis

How to analyze heap dumps

Heap Dump Analysis with Memory Analyzer (1 | 2)

Eclipse Memory Analyzer
