General information

What is MetaModel?

MetaModel is a Java based library for accessing, exploring and querying datastores. Datastores may be relational databases, flat files, spreadsheet documents or single file based databases such as MS Access and dBase.

MetaModel provides a uniform API for all these datastores. With the MetaModel API you can explore the structure of the datastore (ie. Schemas, Tables, Columns, Relationships) and you can query the datastore using a fluent Java API for querying in a manner that will feel natural to anyone who's ever tried SQL, the industry standard for database querying.


Why should I care about MetaModel?

Because MetaModel is the most versatile Java library for accessing different datastores. No other library provides the same ability to access just about any datastore and MetaModel even does this in a way that is intuitive, type-safe and supports advanced querying options.

How it works

Which data formats does MetaModel support?

Please refer to the database compliancy page.


Which types of datastores does MetaModel not support?

For now MetaModel is limited to a tabular view on data, so any datastore that cannot be accessed in such a way will not be supported.

The upside is however that by far most datastore types are in fact tabular or can be interpreted as tabular. To make a point of this you can even use MetaModel with XML-files as a datastore, even though XML-files are inherently hierarical and not tabular, but MetaModel provides an abstraction layer that allows you to access an XML files as if it was made up of schemas, tables and columns.


How do you support querying of non-queryable datastores like CSV files or spreadsheets?

MetaModel includes it's own query interpretation engine, called Query Postprocessor. If a datastore has it's own native query engine MetaModel will rewrite the query to fit to that engine. If the datastore does not have a query engine then MetaModel will find out what data it needs to execute the query and perform the data manipulation required by the query in memory.


How are NULL values interpreted by MetaModel?

MetaModel's end users are typically Java developers, so we interpret NULL in a similar way as Java does.

Why is this interesting? Because in ANSI SQL, the semantics around NULL values are different. In SQL the following statement is false, whereas it is true in Java:

NULL = NULL

The distinction becomes very important when dealing with queries that specify a filter where some field is not equal to some operand. For instance:

SELECT * FROM persons WHERE gender <> 'M'

Traditional SQL engines will not include records where gender is NULL in the result set of the above query. But MetaModel will, since we believe this is the most natural way of working with queries for developers!


What are the performance characteristics of MetaModel?

MetaModel was designed to keep memory down whenever possible, but not at the expense of providing a rich API for exploring and querying your datastores. Of course there are going to be a lot of added objects to express your queries as objects in stead of a single String but in the big picture this is never a problem. The same argument applies to the schema model - you could potentially build an application without storing metadata about your tables, columns etc. as Java objects but we believe the added expressiveness of the API far overshadows the added memory footprint. If you do however have datastores with tens of thousands of columns then it may begin to present a disadvantage of using MetaModel but we propose simply configuring your JVM with a larger heap space then, because given your datastore structure you're going to need it anyway :)

In terms of query execution speed we also do a lot to keep the memory consumption down. When possible results are streamed from their original datastore so buffer sizes will not outgrow the available memory. There are however some situations where it is not possible to apply a streaming approach. For example if you issue an ORDER BY statement or a GROUP BY statement on a datastore that does not support native querying (such as a CSV file), then MetaModel will have to load the full contents of the datastore into memory in order to process the query. So make sure you only do these kinds of queries when the contents is not expected to outgrow your total amount of memory (typical JVM configurations can hold several (tens of) thousands rows of data though, depending on the amount of columns).

Which third-party libraries are used for...

... CSV files

The MetaModel-csv module depends on the opencsv library for parsing of CSV files. The query and update layer is implemented using MetaModel's Query Postprocessor engine.


... Excel spreadsheets

The MetaModel-excel module depends on the Apache POI library for reading Excel spreadsheets. The query and update layer is implemented using MetaModel's Query Postprocessor engine.


... OpenOffice.org database files

The MetaModel-openoffice module depends on the Hsqldb 1.8 database which is already the underlying database used by OpenOffice.org. The query layer is implemented in the MetaModel-jdbc module as the database connection is established using JDBC.


... Microsoft Access database files

The MetaModel-access module depends on the Jackcess library for reading Microsoft Access database-files. The query layer is implemented using MetaModel's Query Postprocessor engine.


... dBase files

The MetaModel-dbase module depends on the xBaseJ library for reading dBase files. xBaseJ has been especially re-licensed and incorporated into MetaModel under the LGPL license (originally it is GPL licensed). The query layer is implemented using MetaModel's Query Postprocessor engine.


... CouchDB databases

The MetaModel-couchdb module depends on the ektorp library for connecting to CouchDB databases. The query and update layer is implemented using MetaModel's Query Postprocessor engine.


... MongoDB databases

The MetaModel-mongodb module depends on the MongoDB Java driver library for connecting to MongoDB databases. The query and update layer is implemented using MetaModel's Query Postprocessor engine.

How does MetaModel compare to ...

Hibernate and Java Persistence API (JPA)?

Hibernate and JPA are object-relational mappers (ORMs), which MetaModel is not. The main difference lies in the fact that with Hibernate or JPA you need to have a fixed domain model on which your application works. This is the case for most business applications and MetaModel is no contestant in this arena.

MetaModel on the other hand does not have a fixed domain model - it works on the metadata level where there are just arbitrary structures of data. You can also say that MetaModel's domain model is data. So MetaModel would be a good starting point if you are building an application that is datastore-agnostic or maybe even if you are building an ORM yourself.

Some distinguishing differences:

  • Hibernate/JPA only supports relational JDBC-databases. MetaModel supports JDBC-databases in addition to CSV files, Excel spreadsheets and much more.
  • A JPA-based domain model is bound at compile time, while MetaModel represents the model of the datastore at runtime.
  • MetaModel can be used to explore and query any given datastore whereas Hibernate/JPA provides a way to bridge the gap of concrete domain modelling and a concrete database.
  • MetaModel supports type-safe, runtime generated, queries and datastore-exploration whereas Hibernate/JPA advocates hardcoded string literal binding and querying.

Java Database Connectivity (JDBC)?

JDBC is the most common way of accessing a relational database in Java and thus it is heavily used by MetaModel when working with relational databases. Database vendors (and in some cases complimentary open source communities) typically provide a JDBC driver which allows for the connectivity to the database.

MetaModel uses and enhances JDBC in a variety of ways:

  • Although the JDBC specification is pretty straight forward, not all vendors implement it correctly or entirely. MetaModel encapsulates the differences of varying JDBC implementations.
  • While there is a metadata API in JDBC it is quite ackward to work with. It requires you to traverse ResultSets describing the database structure in stead of first-class metadata objects like MetaModel's Schema, Table, Column and Relationship objects.
  • JDBC querying is based on string literal queries, whereas MetaModel's Query API encapsulates this complexity in a type-safe query building API which eradicates errors and provides compile time safety.

Language Integrated Query (LINQ) for .NET?

MetaModel was greatly inspired by the ideas of LINQ for .NET. The two technologies share a lot of similarities, but also some substantial differences. LINQ supports querying as a part of the language, which we of course cannot introduce into Java as a third party library.

Like LINQ...

  • MetaModel can be used to query not only for relational databases but also other datastores.
  • MetaModel defines a DSL for querying (the Query Builder API) which aids the user to write correct and meaningful queries.
  • MetaModel is inspired by SQL to make querying feel natural to the developer.
  • MetaModel provides a compile time check for your queries.
  • MetaModel makes it possibble to compose and manipulate your query programmatically and in a type-safe manner.

Unlike LINQ...

  • MetaModel is a Java library.
  • MetaModel provides support for datastores such as CSV files, Excel spreadsheets and more.
  • MetaModel has a slightly different set of built-in functions and query clauses.
  • MetaModel is not a language feature, but simply a library.