JPA – the uniqueness dilemma & solution

The dilemma

JPA does not support non-primary (often referred to as business) keys. In simple words: if you want your entity to have an Id (as you normally do) and an additional composite key, you're on your own. There are numerous articles and discussions on the subject worth checking out (yep, each word is a different link).

Let’s see if we can find a solution…

Solution #1 – Drop the Id and use a composite key

Having an Id is standard practice for many reasons. Even if you think you don't need one (you do), you'd be forced to deviate from the standard just because there's no better solution – no way.

Solution #2 – Keep the Id as your primary key and define unique constraints

It will be harder to read (as you'll have to annotate the entity class instead of the fields) but it doesn't sound that bad. Until… welcome to EDD, exception driven development – not as good as TDD. JPA has no intelligent way of letting you know whether the object already exists (something like a mergeOrCreate), instead it just occasionally throws a ConstraintViolationException in your face.
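For illustration, this is roughly what the class-level annotation looks like – the entity and column names here are made up, only the @Table/@UniqueConstraint shape is the point:

```java
import java.time.LocalDate;
import javax.persistence.*;

// Illustrative entity: the unique constraint must be declared on the
// class via @Table, not on the fields themselves.
@Entity
@Table(uniqueConstraints =
        @UniqueConstraint(columnNames = { "name", "birth_date" }))
public class Person {

    @Id
    @GeneratedValue
    private Long id;             // the surrogate primary key stays

    private String name;         // part of the composite business key

    @Column(name = "birth_date")
    private LocalDate birthDate; // part of the composite business key
}
```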

Meaning your implementation will look something like:

  • Try persisting
  • Got an exception? It’s already in the DB, go and fetch it..
  • No exception? Good.
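The steps above can be sketched as follows – none of this is real JPA API, just the shape of the pattern: a HashMap stands in for the DB and IllegalStateException plays the role of ConstraintViolationException.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the real DB + EntityManager: persist() throws on a
// unique-constraint hit, the way Hibernate throws ConstraintViolationException.
class PersonStore {
    private final Map<String, String> byEmail = new HashMap<>(); // email -> name

    void persist(String email, String name) {
        if (byEmail.containsKey(email)) {
            throw new IllegalStateException("unique constraint violated: " + email);
        }
        byEmail.put(email, name);
    }

    String findByEmail(String email) {
        return byEmail.get(email);
    }

    // The "EDD" flow: try to persist, and on a constraint violation
    // fall back to fetching the row that beat us to it.
    String persistOrFetch(String email, String name) {
        try {
            persist(email, name);
            return name;               // no exception? good, freshly persisted
        } catch (IllegalStateException e) {
            return findByEmail(email); // already in the DB, go and fetch it
        }
    }
}
```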

Don’t you dare call this a solution, user453265436 on StackOverflow!

Solution #3 – Keep the Id, keep the constraints and make your own mergeOrCreate

The idea is to check whether an object with the given unique fields already exists before trying to insert it into the DB. Luckily for us, the good people of the internet offer some pseudo and actual implementations. Now all you have to do is write hashCode and equals for all your entities to be able to compare with transient objects. And write equality criteria as well to be able to compare with persisted ones.
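A sketch of what that means for a single entity – the fields are made up; the point is that equals and hashCode use only the business key and never the generated Id, so a transient instance compares equal to its already-persisted counterpart:

```java
import java.util.Objects;

// Hypothetical entity: equality is defined by the business key
// (name + email here), never by the generated Id.
class Person {
    Long id;            // assigned by the persistence provider, excluded from equality
    final String name;  // part of the business key
    final String email; // part of the business key

    Person(String name, String email) {
        this.name = name;
        this.email = email;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Person)) return false;
        Person other = (Person) o;
        return name.equals(other.name) && email.equals(other.email);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, email); // must stay in sync with equals
    }
}
```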

Do I seriously have to code this?! I feel sick already. Best part of my day.

Oh, and make sure you can test it somehow. Don’t forget to keep hashCode and equals in sync. Also, never misuse the hashCode as a key. Keep in mind what happens to your hash code if you update a collection.
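That collection warning deserves a demo. If a mutable collection feeds hashCode, a HashSet loses the very object it contains the moment the collection changes (the class below is a minimal made-up example):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Demonstrates why a mutable field (here a collection) must not feed
// hashCode: once the entity sits in a HashSet, mutating the collection
// changes the hash bucket and the set can no longer find the object.
class Order {
    final List<String> items = new ArrayList<>();

    @Override
    public boolean equals(Object o) {
        return o instanceof Order && items.equals(((Order) o).items);
    }

    @Override
    public int hashCode() {
        return items.hashCode(); // pitfall: changes whenever items changes
    }
}
```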

Best part of my week. And we’re nowhere near multithreading yet – oh man, we’re gonna have such a great time!

Do yourself a favour and keep it as generic as possible. You might want to think about an interface for your entities to provide their uniqueness criteria, and a common EntityFactory or superclass. For some more detailed insight I recommend reading this post in particular from the ones above.
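One possible generic shape (the names are mine, not from the linked post): an interface exposing the uniqueness criteria plus a common factory doing the exists-check, with a Map standing in for the DAO lookup against the persistence context/DB.

```java
import java.util.HashMap;
import java.util.Map;

// Every entity exposes its business key through a common interface.
interface Unique {
    Object businessKey();
}

class EntityFactory<T extends Unique> {
    // Map stands in for the DAO lookup against the persistence context / DB.
    private final Map<Object, T> store = new HashMap<>();

    // mergeOrCreate: return the already-known instance for this business
    // key, or register and return the candidate if it is genuinely new.
    T mergeOrCreate(T candidate) {
        T existing = store.get(candidate.businessKey());
        if (existing != null) {
            return existing;
        }
        store.put(candidate.businessKey(), candidate);
        return candidate;
    }
}

class City implements Unique {
    final String name;
    City(String name) { this.name = name; }
    @Override public Object businessKey() { return name; }
}
```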

The solution I came up with is available on GitHub.

Below are some additional issues worth addressing. In some cases they’re part of the implementation, in others I’ll explain why they’re not.

Entity relationships

Pssst.. hey! Listen, don’t even bother – just put everything in one entity.

If somehow, by the grace of the ORM god, you haven’t faced the uniqueness issue yet, chances are you’re about to when normalizing your schema. Think about building up an entity in memory that has a ManyToOne connection (Person and City in my example). When persisting it (besides checking the uniqueness of the entity itself) you need to check whether any connected entities already exist.

Solution

You’ll have to implement DAOs for every entity on the One side of a ManyToOne connection, and query that One for each of your Many.
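Sticking with the Person–City example, the shape is roughly this (Map-backed DAOs stand in for real ones querying the DB; the field names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

class City {
    final String name;
    City(String name) { this.name = name; }
}

class Person {
    final String email;
    City city; // ManyToOne: many Persons share one City
    Person(String email, City city) { this.email = email; this.city = city; }
}

// Map-backed stand-in for a real CityDao querying the DB by unique name.
class CityDao {
    private final Map<String, City> byName = new HashMap<>();

    City mergeOrCreate(City candidate) {
        return byName.computeIfAbsent(candidate.name, n -> candidate);
    }
}

class PersonDao {
    private final CityDao cityDao;
    private final Map<String, Person> byEmail = new HashMap<>();

    PersonDao(CityDao cityDao) { this.cityDao = cityDao; }

    // Before persisting the Many side, resolve each One through its DAO,
    // so two in-memory "Budapest" instances collapse into a single row.
    Person persist(Person person) {
        person.city = cityDao.mergeOrCreate(person.city);
        return byEmail.computeIfAbsent(person.email, e -> person);
    }
}
```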

Thought I was exaggerating? Best week ever!

On a side note, here’s one reason why I prefer Hibernate’s Session API over JPA’s EntityManager API: it updates your object to the persisted state (i.e. assigns the Id when persisted), which makes synchronizing objects easier.

Those who say both APIs are the same are dirty liars. They also tend to prefer EM. Don’t listen to them!

PARALLELISM

A.k.a. just when you thought it couldn’t get any worse. I don’t even say “it can’t get worse” anymore.

We have to stop here and clarify how the Persistence Context cache works before we do more harm than good. Despite the many articles and examples that indicate otherwise, JPA’s EntityManager and Hibernate’s Session offer an extended Persistence Context, which means the cache is not bound to a transaction but lives across transactions made from the same EntityManager/Session.

This is where SessionFactory’s getCurrentSession() comes into play. Another huge advantage of the Session API over the EntityManager is that this method allows thread-safe use of the API.

All you need to do is set ‘hibernate.current_session_context_class’ to ‘thread’ in your persistence config. If you forget about it though, you’ll get a nice exception at runtime which is, well, not so very nice…
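In persistence.xml that’s a one-liner (the surrounding persistence-unit boilerplate is omitted):

```xml
<!-- inside the <properties> element of your persistence-unit -->
<property name="hibernate.current_session_context_class" value="thread"/>
```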

Aaaaand that’s all for the good news. As for the bad news…

As mentioned before, when using constraints you will get a ConstraintViolationException when trying to insert an entry that already exists in the DB. This may very well happen when multithreading, even if you check whether the object exists before inserting via mergeOrCreate, because another thread may have created it in the meantime.

This might sound unlikely, but when working with large amounts of data it will happen – and that is exactly when you’re relying on your cache the most.

Why is this a problem? The Session / EntityManager object will be invalidated upon an exception, and with it the cache is lost. Meaning parallel execution will likely make your application slower.

Solution

You can avoid the Session invalidation by using nested transactions; however, neither of the mentioned APIs supports them. Some others do though (e.g. Spring). If single threading is not an option (e.g. you’re developing a distributed app) you can still use locks.
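One way to sketch the lock idea (my sketch, not the article’s code): give each business key its own critical section so the check-then-insert is atomic per key. Here a ConcurrentHashMap stands in for the DAO, and computeIfAbsent provides the per-key atomicity; against a real DB you’d take an explicit application lock (or a pessimistic DB lock) around the mergeOrCreate instead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Per-business-key serialization: threads racing to create the same key
// are serialized on that key, so only one insert is ever attempted.
// ConcurrentHashMap.computeIfAbsent runs the mapping function atomically
// per key, which is exactly the lock granularity we want here.
class LockingFactory<T> {
    private final Map<Object, T> store = new ConcurrentHashMap<>();

    T mergeOrCreate(Object businessKey, T candidate) {
        return store.computeIfAbsent(businessKey, k -> candidate);
    }
}
```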

Batch processing

In some cases this is something that could give you the performance boost you need. It’s not that simple here.

Persisting bigger chunks of data in one Transaction will probably leave you with even more ConstraintViolationExceptions – invalidating the whole Transaction. So instead of persisting multiple entities at once, you’ll lose even those that were already persisted within the Transaction and have to start over.

In a single-threaded environment that is only the case if the same entries are present within the same chunk. We can actually eliminate that possibility, so batching might still be a viable improvement.
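Eliminating in-chunk duplicates is cheap. A minimal sketch, with plain strings standing in for entities and the email acting as the business key:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Before persisting a chunk in one transaction, collapse duplicates by
// business key so the chunk itself can no longer trip the unique
// constraint. LinkedHashMap preserves the original insertion order.
class ChunkDeduper {
    static List<String> dedupe(List<String> emails) {
        Map<String, String> unique = new LinkedHashMap<>();
        for (String email : emails) {
            unique.putIfAbsent(email, email);
        }
        return new ArrayList<>(unique.values());
    }
}
```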

THE final solution?

It’s simple. We kill the Batman.

We need support for business keys across application and DB layers. We need a persistence provider that can handle if a business key already exists in the persistence context or the DB.

It’s that simple.