Sunday, August 18, 2013

Data Multi-tenancy

Multi-tenancy
The term multi-tenancy in general is applied to software development to indicate an architecture in which a single running instance of an application simultaneously serves multiple clients (tenants). This is highly common in SaaS solutions.

Data Multi-tenancy
Isolating information (data, customizations, etc) pertaining to the various tenants is a particular challenge in these systems. This includes the data owned by each tenant stored in the database. It is this last piece, sometimes called multi-tenant data, on which we will focus.

Multi-tenant data approaches
There are 3 main approaches to isolating information in these multi-tenant systems which goes hand-in-hand with different database schema definitions and JDBC setups. Each approach has pros and cons as well as specific techniques and considerations. Multi-TenantData Architecture does a great job of covering these topics.

Separate database
Each tenant's data is kept in a physically separate database instance. JDBC Connections would point specifically to each database, so any pooling would be per-tenant. A general application approach here would be to define a JDBC Connection pool per-tenant and to select the pool to use based on the“tenant identifier” associated with the currently logged in user.

Separate schema
Each tenant's data is kept in a distinct database schema on a single database instance. There are 2 different ways to define JDBC Connections here:

  • Connections could point specifically to each schema, as we saw with the Separate databaseapproach. This is an option provided that the driver supports naming the default schema in the connection URL or if the pooling mechanism supports naming a schema to use for its Connections. Using this approach, we would have a distinct JDBC Connection pool per-tenant where the pool to use would be selected based on the “tenant identifier” associated with the currently logged in user.
  • Connections could point to the database itself (using some default schema) but the Connections would be altered using the SQL SET SCHEMA (or similar) command. Using this approach, we would have a single JDBC Connection pool for use to service all tenants, but before using the Connection it would be altered to reference the schema named by the “tenant identifier” associated with the currently logged in user.

Partitioned (discriminator) data
All data is kept in a single database schema. The data for each tenant is partitioned by the use of partition value or discriminator. The complexity of this discriminator might range from a simple column value to a complex SQL formula. Again, this approach would use a single Connection pool to service all tenants. However, in this approach the application needs to alter each and every SQL statement sent to the database to reference the “tenant identifier” discriminator.

Data Multi-tenancy in Hibernate
Using Hibernate with multi-tenant data comes down to both an API and then integration piece(s). As usual Hibernate strives to keep the API simple and isolated from any underlying integration complexities. The API is really just defined by passing the tenant identifier as part of opening any session.

Example 16.1. Specifying tenant identifier from SessionFactory
Session session = sessionFactory.withOptions()
        .tenantIdentifier( yourTenantIdentifier )
        ...
        .openSession();

Additionally, when specifying configuration, a org.hibernate.MultiTenancyStrategy should be named using the hibernate.multiTenancy setting. Hibernate will perform validations based on the type of strategy you specify. The strategy here correlates to the isolation approach discussed above.
NONE

(the default) No multi-tenancy is expected. In fact, it is considered an error if a tenant identifier is specified when opening a session using this strategy
            
 SCHEMA
Correlates to the separate schema approach. It is an error to attempt to open a session without a tenant identifier using this strategy. Additionally, aorg.hibernate.service.jdbc.connections.spi.MultiTenantConnectionProvider must be specified.
DATABASE
Correlates to the separate database approach. It is an error to attempt to open a session without a tenant identifier using this strategy. Additionally, aorg.hibernate.service.jdbc.connections.spi.MultiTenantConnectionProvider must be specified.
DISCRIMINATOR
Correlates to the partitioned (discriminator) approach. It is an error to attempt to open a session without a tenant identifier using this strategy. This strategy is not yet implemented in Hibernate as of 4.0 and 4.1. Its support is planned for 5.0.

 Google doc link

Sunday, August 11, 2013

Spring JDBCTemplate Vs Hibernate Performance + JPA & Caching

Spring JDBCTemplate Vs Hibernate Performance

If you do all you can to make Spring JDBCTemplate / Hibernate implementations very fast, the JDBC template will probably be a bit faster, because it doesn't have the overhead that Hibernate has, but it will probably take much more time and lines of code to implement.

Hibernate has its learning curve, and you have to understand what happens behind the scenes, when to use projections instead of returning entities, etc. But if you master it, you'll gain much time and have cleaner and simpler code than with a JDBC-based solution.

In 95% of the cases, Hibernate is fast enough, or even faster than non-optimized JDBC code. For the 5% left, nothing forbids you to use something else, like Spring-JDBC for example. Both solutions are not mutually exclusive.

Hibernate

Hibernate is not intended for batch-jobs, it is an anti patterns of O/R Mapper usage.  Even the Hibernate documentation warns to not do that. Batch processing often can be much more easily done by a single clever SQL statement instead via O/R mappers which need a bunch of statements to do the same. Not because they are bad, but because they have not been built for this use case. Data is always by default loaded lazyly in Hibernate.

One of the common problems of people that start using Hibernate is performance, if you don’t have much experience in Hibernate you will find how quickly your application becomes slow. If you enable sql traces, you would see how many queries are sent to database that can be avoided with little Hibernate knowledge. Good use of Hibernate Cache to avoid amount of traffic between your application and database.

JPA + Spring

If you are using spring and JPA, it is very likely that you utilize ehcache (or another cache provider). And you do that in two separate scenarios: JPA 2nd level cache and spring method caching.

When you configure your application, you normally set the 2nd level cache provider of your JPA provider (hibernate, in my case) and you also configure spring with the “cache” namespace. Everything looks OK and you continue with the project. But there’s a caveat. If you follow the most straightforward way, you get two separate cache managers which load the same cache configuration file. This is not bad per-se, but it is something to think about – do you really need two cache manager and the problems that may arise from this

JPA & Caching

Caching is the most important performance optimization technique. There are many things that can be cached in persistence, objects, data, database connections, database statements, query results, meta-data, relationships, to name a few. Caching in object persistence normally refers to the caching of objects or their data.

Caching also influences object identity, that is that if you read an object, then read the same object again you should get the identical object back (same reference).

JPA 1.0 does not define a shared object cache, JPA providers can support a shared object cache or not, however most do.

Caching in JPA is required with-in a transaction or within an extended persistence context to preserve object identity, but JPA does not require that caching be supported across transactions or persistence contexts.

JPA 2.0 defines the concept of a shared cache. The @Cacheable annotation or cacheable XML attribute can be used to enable or disable caching on a class.

Hibernate cache

Cache Types : Hibernate uses different types of caches. Each type of cache is used for different purposes. Let us first have a look at this cache types.

The first cache type is the session cache. The session cache caches object within the current session.
The second cache type is the query Cache. The query cache is responsible for caching queries and their results.
The third cache type is the second level cache. The second level cache is responsible for caching objects across sessions.

The Session Cache
session cache caches values within the current session.  This cache is enabled by default.

The Query Cache
The query cache can be really usefull to optimize the performance of your data access layer. However there are a number of pitfalls as well.  This blog post describes a serious problem regarding memory consumption of the Hibernate query cache when using objects as parameters. It could consumpt ton of memories and hit
OutOfMemoryError (How to fix it)

The Second Level Cache
The key characteristic of the second-level cache is that is is used across sessions, which also differentiates it from the session cache, which only – as the name says – has session scope. Hibernate provides a flexible concept to exchange cache providers for the second-level cache. By default Ehcache is used as caching provider.

However more sophisticated caching implementation can be used like the distributed JBoss Cache or Oracle Coherence.

Additional readings:

Understanding caching in hibernate (1 | 2 | 3) , tutorial and tuning

Top 10 Performance Problems taken from Zappos, Monster, Thomson and Co

52 weeks of Application Performance – The dynaTrace Almanac







´