Apache Jackrabbit Tutorial for Beginners


What is Apache Jackrabbit?

Apache Jackrabbit is an open source content repository for the Java platform. A JCR (Java content repository) is a type of object database for storing, searching, and retrieving hierarchical data. The project was started on August 28, 2004, and is developed by the Apache Software Foundation. Jackrabbit is written entirely in Java and therefore runs on any operating system that has a Java Virtual Machine. It stores structured content and supports SQL queries. It also provides full-text indexing of documents such as Excel, Word, and PDF files.

This tutorial works with Apache Jackrabbit version 2.16.1, which was released on February 9, 2018. It implements JSR-170 and JSR-283.

A content repository is a hierarchical content store with support for structured and unstructured content, full-text search, versioning, transactions, observation, and more.

JSR-170 specifies a Level 1, a Level 2, and a set of advanced repository feature blocks. Jackrabbit is fully JSR-170 compliant and therefore supports Level 1, Level 2, and all the optional blocks.


Difference between JCR and Database

JCR (Java content repository) | Database
A JCR uses a database to store things. | A database stores things.
A JCR enforces an API to access its data. | A database does not enforce an API to access its data.
A JCR is built on a set of standards. | A database can follow some standards, but implementations vary and deviate.
A JCR stores content. | A database stores data.

Workspaces

  • There can be more than one workspace in a JCR repository.
  • Each workspace has one and only one root node.
  • The name of the default workspace is "default".
  • Creating a new workspace is commonly an administrative task and may require a shutdown and a new configuration file.

Nodes

  • Nodes are the items we picture as forming a tree.
  • Every workspace starts with a root node.
  • Each node has zero or more children.
  • Each node may have zero or more properties.
  • Every node except the root node has exactly one parent.
  • Every node has a name, which is the last segment of its path.

More node stuff

Each node has exactly one primary type and may have any number of mixins (behaviors).

Node types

  • Standard node types (nt:file, nt:hierarchyNode, etc.)
  • External node types (sling:Folder, cq:Page)

We can define our own node types; there is an API for that.

Properties

  • A property belongs to exactly one parent node.
  • A property cannot have children (a property cannot have properties of its own).
  • Property values hold the actual content that is stored in the JCR.
  • Properties cannot be ordered, unlike child nodes, which can be orderable.
  • There is no such thing as a null value: if you set a property to null, it is removed from the node.
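The node and property rules above can be pictured with a small in-memory model. The sketch below is plain Java and deliberately does not use the JCR API: ToyNode and its methods are hypothetical names chosen only for illustration. In a real repository the equivalent operations go through javax.jcr.Node (for example its addNode and setProperty methods).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of the node and property rules described above.
// This is NOT the javax.jcr API; all names here are hypothetical.
class ToyNode {
    private final String name;
    private final ToyNode parent;   // null only for the root node
    private final List<ToyNode> children = new ArrayList<>();
    private final Map<String, Object> properties = new LinkedHashMap<>();

    private ToyNode(String name, ToyNode parent) {
        this.name = name;
        this.parent = parent;
    }

    // Every workspace starts with a single root node.
    static ToyNode root() {
        return new ToyNode("", null);
    }

    // A child has exactly one parent: the node it was added to.
    ToyNode addNode(String childName) {
        ToyNode child = new ToyNode(childName, this);
        children.add(child);
        return child;
    }

    int childCount() {
        return children.size();
    }

    // There is no null value: setting a property to null removes it.
    void setProperty(String propertyName, Object value) {
        if (value == null) {
            properties.remove(propertyName);
        } else {
            properties.put(propertyName, value);
        }
    }

    boolean hasProperty(String propertyName) {
        return properties.containsKey(propertyName);
    }

    // The path is built from the names of the node and its ancestors.
    String getPath() {
        if (parent == null) {
            return "/";
        }
        String parentPath = parent.getPath();
        return parentPath.equals("/") ? "/" + name : parentPath + "/" + name;
    }

    public static void main(String[] args) {
        ToyNode root = ToyNode.root();
        ToyNode page = root.addNode("content").addNode("page");
        page.setProperty("title", "Hello");
        System.out.println(page.getPath());            // /content/page
        System.out.println(page.hasProperty("title")); // true
        page.setProperty("title", null);
        System.out.println(page.hasProperty("title")); // false
    }
}
```

The model leaves out node types, ordering, and same-name siblings; it only shows the parent/child relationship and the "null removes the property" rule.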

Normal data types

  • PropertyType.STRING
  • PropertyType.BINARY
  • PropertyType.DATE
  • PropertyType.LONG
  • PropertyType.DOUBLE
  • PropertyType.BOOLEAN

Three special data types

  • PropertyType.NAME: Essentially a namespace-aware string.
  • PropertyType.PATH: Used for paths within a workspace.
  • PropertyType.REFERENCE: References another node, provided that node is referenceable.

Namespaces

  • A name consists of a prefix and a local name, delimited by a single colon.
  • Reserved namespace prefixes: jcr, nt, mix, xml, and the empty prefix "".
  • Namespaces tend to function the same way as XML namespaces.
  • They are used for the same reasons as XML namespaces.
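As a small illustration of the prefix rule, the following sketch splits a name such as jcr:primaryType at its colon and checks the prefix against the reserved list given above. QualifiedNameSketch and its methods are hypothetical helpers written for this tutorial, not part of the JCR API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that splits "prefix:localName" at the single colon.
class QualifiedNameSketch {
    // Reserved prefixes as listed in this section (the empty prefix included).
    static final List<String> RESERVED = Arrays.asList("jcr", "nt", "mix", "xml", "");

    static String prefixOf(String name) {
        int colon = name.indexOf(':');
        return colon < 0 ? "" : name.substring(0, colon);
    }

    static String localNameOf(String name) {
        int colon = name.indexOf(':');
        return colon < 0 ? name : name.substring(colon + 1);
    }

    static boolean hasReservedPrefix(String name) {
        return RESERVED.contains(prefixOf(name));
    }

    public static void main(String[] args) {
        System.out.println(prefixOf("jcr:primaryType"));    // jcr
        System.out.println(localNameOf("jcr:primaryType")); // primaryType
        System.out.println(hasReservedPrefix("cq:Page"));   // false
    }
}
```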

Paths

  • Any item can be addressed by an absolute path.
  • / is the root node of the tree in a workspace.
  • Relative paths are supported with the following syntax:
  • ../ (the parent node)
  • ./ (the current node)
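To make the relative path syntax concrete, here is a minimal sketch that resolves a relative path against an absolute one. It is plain string handling written for this tutorial (PathSketch and resolve are hypothetical names), not the path resolution code that Jackrabbit itself uses, and it ignores details such as same-name sibling indexes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper: resolves "." and ".." segments against an absolute path.
class PathSketch {
    static String resolve(String absoluteBase, String relative) {
        Deque<String> segments = new ArrayDeque<>();
        for (String segment : absoluteBase.split("/")) {
            if (!segment.isEmpty()) {
                segments.addLast(segment);
            }
        }
        for (String segment : relative.split("/")) {
            if (segment.isEmpty() || segment.equals(".")) {
                continue;                       // "./" stays on the current node
            }
            if (segment.equals("..")) {         // "../" moves up to the parent
                if (segments.isEmpty()) {
                    throw new IllegalArgumentException("Path escapes the root node: " + relative);
                }
                segments.removeLast();
            } else {
                segments.addLast(segment);
            }
        }
        return "/" + String.join("/", segments);
    }

    public static void main(String[] args) {
        System.out.println(resolve("/content/page", "../other"));  // /content/other
        System.out.println(resolve("/content/page", "./child"));   // /content/page/child
        System.out.println(resolve("/content", ".."));             // /
    }
}
```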

Apache Jackrabbit Architecture

The general architecture of Jackrabbit can be described in three layers: a Content Application Layer, an API Layer, and a Content Repository Implementation Layer.

[Figure: Jackrabbit architecture chart]

Content Applications

Content applications interact with the content repository implementation through the JSR-170 API. Several applications are available for JSR-170 repositories. Some of them are very generic (like a WebDAV server), while others are very specific and use the content repository as a store for the information that the application works with.

Java applications can use a JSR-170 content repository as a replacement for anything from property files, XML configuration, and certain portions of relational database functionality to straight file system and blob management. Using a content repository lets an application work with a hierarchical space and benefit from repository services such as versioning, query, transactions, or namespaces, which make a content repository a well-suited data store for many applications.

Content Repository API

The Content Repository API Layer is split into two major sections.

  • The Content Repository API defined by JSR-170.
  • Several features of a content repository that were removed from the JSR-170 specification because they are difficult to implement on existing non-Java-based content repositories, plus repository administration tasks that were also deliberately excluded from JSR-170.

Only very few (mostly administrative) applications use the non-JSR-170 APIs provided by Jackrabbit.

The boxes in the architecture chart do not represent package names or class names; they group functionality semantically.

Content Repository Implementation

The Content Repository Implementation part of the architecture chart shows the building blocks of the Jackrabbit content repository implementation.

The size of the blocks represents roughly the amount of code and therefore, the complexity of the individual functional block. Again the functional blocks do not precisely map to package or class names.

There are three main scopes in a content repository: a repository scope, a workspace scope, and a session scope.

Every function that operates against a repository can be attributed to at least one of these scopes, and some functions work on more than one scope.

  • Repository
  • Node type
  • Version
  • Namespace Registry
  • Workspace
  • Query
  • Observation
  • State
  • XML
  • Session
  • Path
  • Hierarchy Manager
  • QName

This is not a complete list, but it covers some of the essential components of the content repository implementation.

Deployment Models

JSR-170 allows for numerous different deployment models, meaning that it is entirely up to the repository implementation to suggest specific models.

Jackrabbit is built to support a variety of deployment models; some of the possible ways to deploy Jackrabbit are outlined here.


Model 1: The (Web) Application Bundle

For many applications, especially those that run in a closed context without interacting with other applications or data sources, it might be desirable to bundle a content repository with the application itself.

Jackrabbit is built for this lightweight model. Thanks to the abstraction provided by JSR-170, an application can move to a different deployment model at any point in time, should that become desirable for the context the application runs in.

[Figure: Model 1, with Application1 and Application2 each bundling a content repository]

Application1 and Application2 each contain their own instance of the content repository, distributed as part of their .war file. Because the repository is loaded with the web application's class loader, it is invisible to other applications in the system.

This deployment model also works for any stand-alone application, not just for web applications.

Model 2: Shared J2EE Resources

A second way to deploy a repository is to make it visible as a resource to all the web applications running inside a servlet container, by registering the repository as a resource adapter with the application server.

Like the first deployment model, this model does not require a network layer, so the repository is considered in-process and runs inside the same JVM (Java Virtual Machine).

The repository is started and stopped with the application server, and it is visible to all the applications that want to connect to it.

[Figure: Model 2, a repository shared as a J2EE resource]

Model 3: The Repository Server

In enterprise environments, the client/server deployment model is used extensively for relational databases. While for relational databases this is the only deployment model supported by most RDBMS vendors, for repositories, and for Jackrabbit in particular, it is only one of several options.

The client/server deployment model will undoubtedly be prevalent in environments where it is desirable to physically separate the content repository (or data) layer from the application or presentation layer, so that the content repository can be used by many different applications and can be scaled individually.

[Figure: Model 3, the repository server]

Jackrabbit Repository Installation & Configuration

The Apache Jackrabbit content repository is a complete implementation of the Content Repository for Java Technology API.

A content repository is a hierarchical content store with support for structured and unstructured content, full-text search, versioning, transactions, observation, and more.

This document explains how to set up a Jackrabbit content repository in the Web-Application Bundle deployment model.

The instructions in this document apply to Tomcat versions 5.x and 6.x. It is easy to adapt them to other container environments.

What you'll need to install Apache Jackrabbit:

  • Ubuntu Server 16.04 LTS
  • Secure Shell (SSH) access to your server
  • Basic Linux command line knowledge.

You need to do the following steps:

[Figure: installation steps]

If the web application deploys successfully, you can point your browser to jackrabbit-webapp-<version> to see its page.

To access the content repository from a WebDAV client, use the URL given below:

http://localhost:8080/jackrabbit-webapp-patched-<version>/repository/default/

The server asks for authentication. The username and password are configured as init parameters in the web.xml file for the RepositoryStartup servlet.

Another parameter you can set is repository-path, the path where the Jackrabbit repository will be installed. The web app reads these parameters at initialization time.

Repository Configuration

Jackrabbit's main configuration lives in the repository.xml file. It carries global settings such as login and access management, versioning, and clustering. It also defines how the actual data for a particular workspace is stored, by selecting a PersistenceManager, and which search/query implementation to use, by configuring a SearchIndex.

Node type Registry

Each Jackrabbit instance has a NodeTypeRegistry, which is created on start-up and populated with the set of built-in node types.

Custom node types are defined in /jackrabbit/repository/nodetypes/custom_nodetypes.xml, or written in the "Compact Namespace and Node Type Definition" (CND) notation and then registered using the JackrabbitNodeTypeManager.

The custom_nodetypes.xml file explains:

  • Which node types are supported in the repository.
  • The definition of each supported node type.
  • This node is a type of it into <tomcat-install-dir>/shared/lib.node.
  • The definition of each item in a node type, in relation to its parent node.




How does Jackrabbit work?

The following is a general, simplified walkthrough of an operation that touches a large portion of the components in the Jackrabbit implementation. Please keep in mind that this implementation architecture is not mandated by JCR; it was designed from scratch based on JCR.

[Figure: components involved in writing or modifying content]

The components used, and their respective functions, are listed below in the order in which they appear in the use case of writing or modifying content in the content repository:

  • Transient Item State Manager: Whenever content items are read by a session, they are cached in the Transient Item State Manager. When those items are modified, the modification is visible only to that same session, in the so-called "transient" space.
  • Transactional Item State Manager: When the application persists the modified items using the JCR Item.save() or Session.save(), the transient items are promoted into the Transactional ISM. The modifications are still visible only within the scope of this transaction, meaning that other sessions will not see the changes until they are committed. The commit is implicit if the content repository is not running in an XA environment.
  • Shared Item State Manager: Once a transaction is committed, the Shared Item State Manager receives the change log and publishes the changes to all the sessions logged into the same workspace.

This means that all the item states that are cached and referenced by other sessions are notified and possibly updated or invalidated. The Shared Item State Manager also triggers the observation and hands the change log over to the persistence manager that is configured for this workspace.

  • Persistence Manager: The Persistence Manager persists all the item states in the change log passed by the Shared ISM. The persistence manager is a simple, fast interface that is very low-level. It does not need to understand the complexities of repository operations; it only needs to be able to persist and retrieve a given item based on its item id.
  • Observation: When a transaction is committed, the Shared Item State Manager triggers the observation mechanism. This allows applications to subscribe to changes in the workspace asynchronously. Jackrabbit also offers a non-standard synchronous observation extension.
  • Query Manager / Index: Through an asynchronous observation event, the Query Manager is instructed to index the new or modified items. A content repository index is much more complicated than a classical RDB index, since it deals with content repository features like the item hierarchy, node type inheritance, and full-text search.
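The write path above can be pictured with a toy model. The sketch below is plain Java with hypothetical names (ToyRepository, ToySession); it is not how Jackrabbit's item state managers are implemented. It only illustrates the visibility rules: changes stay in a session's transient space until save(), and observers are notified once the change log is committed. Persistence and indexing are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy model of the write path described above. All names are hypothetical.
class ToyRepository {
    // Committed states, visible to every session (the "shared" layer).
    private final Map<String, String> shared = new HashMap<>();
    // Observers that receive the change log after each commit.
    private final List<Consumer<Map<String, String>>> listeners = new ArrayList<>();

    void addListener(Consumer<Map<String, String>> listener) {
        listeners.add(listener);
    }

    ToySession login() {
        return new ToySession(this);
    }

    // Publishes a change log to all sessions, then triggers observation.
    // A real repository would also hand the change log to a persistence manager.
    private void commit(Map<String, String> changeLog) {
        Map<String, String> copy = new HashMap<>(changeLog);
        shared.putAll(copy);
        for (Consumer<Map<String, String>> listener : listeners) {
            listener.accept(copy);
        }
    }

    static class ToySession {
        private final ToyRepository repository;
        // Unsaved changes, visible only to this session (the "transient" space).
        private final Map<String, String> transientChanges = new HashMap<>();

        ToySession(ToyRepository repository) {
            this.repository = repository;
        }

        void setValue(String path, String value) {
            transientChanges.put(path, value);
        }

        // Reads see this session's own transient changes first.
        String getValue(String path) {
            if (transientChanges.containsKey(path)) {
                return transientChanges.get(path);
            }
            return repository.shared.get(path);
        }

        // Outside an XA environment the commit is implicit in save().
        void save() {
            repository.commit(transientChanges);
            transientChanges.clear();
        }
    }

    public static void main(String[] args) {
        ToyRepository repository = new ToyRepository();
        repository.addListener(log -> System.out.println("changed: " + log.keySet()));
        ToySession writer = repository.login();
        ToySession reader = repository.login();

        writer.setValue("/page/title", "Hello");
        System.out.println(reader.getValue("/page/title")); // null: still transient
        writer.save();
        System.out.println(reader.getValue("/page/title")); // Hello
    }
}
```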

Jackrabbit configuration

Apache Jackrabbit needs two pieces of information to set up a runtime content repository instance:

Repository home directory: 

The file system path of the directory containing the content repository accessed by the runtime instance of Jackrabbit. This directory usually contains all the repository content, search indexes, internal configuration, and other persistent information managed within the content repository. A designated repository home directory is always needed, even if some components choose not to use it. Jackrabbit automatically fills the repository home directory with all the required files and subdirectories when the repository is first instantiated.

Repository configuration file:

The file system path of the repository configuration XML file. This file specifies the class names and properties of the various Jackrabbit components used to manage and access the content repository. Jackrabbit parses the configuration file and instantiates the specified components when the runtime content repository instance is created.

Repository configuration

The repository configuration file, typically called repository.xml, specifies global options such as security, versioning, and clustering settings. A default workspace configuration template is also included in the repository configuration file. The format of the XML configuration file is defined in document type definition (DTD) files published by the Apache Jackrabbit project.

The top-level structure of the repository configuration file is shown below. The <!DOCTYPE> declaration is optional, but if you include it, Jackrabbit (since version 1.5) uses XML validation to make sure that the configuration file is correctly formatted.
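As a rough reconstruction of that top-level structure, the sketch below embeds a bare skeleton in a Java string, using the element names and order from the list in this section, and checks the order with the JDK's XML parser. The skeleton is an assumption made for illustration: it omits the DOCTYPE declaration, and real elements carry class attributes and parameters, so it is not a working configuration. RepositoryXmlSketch and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper that lists the top-level elements of a repository.xml
// skeleton and checks that they appear in the order given in this section.
class RepositoryXmlSketch {
    static final List<String> ORDER = Arrays.asList(
            "FileSystem", "Security", "Workspaces", "Workspace",
            "Versioning", "SearchIndex", "Cluster", "DataStore");

    // Bare skeleton only: real elements need attributes and child elements.
    static final String SKELETON =
            "<Repository>\n"
          + "  <FileSystem/>\n"
          + "  <Security/>\n"
          + "  <Workspaces/>\n"
          + "  <Workspace/>\n"
          + "  <Versioning/>\n"
          + "  <SearchIndex/>\n"
          + "  <Cluster/>\n"
          + "  <DataStore/>\n"
          + "</Repository>\n";

    static List<String> topLevelElements(String xml) {
        try {
            NodeList children = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement()
                    .getChildNodes();
            List<String> names = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    names.add(child.getNodeName());
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration", e);
        }
    }

    // True if every name is known and the names appear in strictly increasing order.
    static boolean inFixedOrder(List<String> names) {
        int last = -1;
        for (String name : names) {
            int index = ORDER.indexOf(name);
            if (index <= last) {
                return false; // unknown, duplicated, or out-of-order element
            }
            last = index;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> names = topLevelElements(SKELETON);
        System.out.println(names);
        System.out.println(inFixedOrder(names));
    }
}
```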


 
 
 
 
 
   
       
     
 
                                     

Starting with Jackrabbit 1.5, the order of the configuration elements below <Repository> is fixed.

The repository configuration elements are:

  • FileSystem: The virtual file system that the repository uses to store and manage things like registered namespaces and node types.
  • Security: Authentication and authorization configuration.
  • Workspaces: Configuration of where and how the workspaces are managed.
  • Workspace: The default workspace configuration template.
  • Versioning: Configuration of the repository-wide version store.
  • SearchIndex: Configuration of the search index that covers the repository-wide /jcr:system content tree.
  • Cluster: Configuration of clustering.
  • DataStore: Configuration of the data store.

Bean configuration element

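Most of the entries in the configuration file are based on a generic JavaBean configuration pattern: an element names the implementation class in its class attribute, and nested param elements set the bean's properties by name and value.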




Configuration variables

Jackrabbit supports configuration variables of the form ${name}. These variables can be used to avoid hard-coding specific options in the configuration file. The following variables are available in all Jackrabbit versions:
  • ${rep.home}: Repository home directory.
  • ${wsp.name}: Workspace name. Only available in workspace configuration.
  • ${wsp.home}: Workspace home directory. Only available in workspace configuration.

Since Jackrabbit 1.4, it has also been possible to use system properties or any application-specific settings as configuration variables.
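To show what ${name} substitution amounts to, here is a minimal sketch that replaces variables in a configuration string from a map. It illustrates the idea only and is not Jackrabbit's own parser: ConfigVariablesSketch is a hypothetical name, the home directory /var/jackrabbit is an example value, and rejecting unknown variables is a choice made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that expands ${name} variables in a configuration string.
class ConfigVariablesSketch {
    private static final Pattern VARIABLE = Pattern.compile("\\$\\{([^}]+)\\}");

    static String substitute(String text, Map<String, String> variables) {
        Matcher matcher = VARIABLE.matcher(text);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String value = variables.get(matcher.group(1));
            if (value == null) {
                // Design choice of this sketch: fail loudly on unknown variables.
                throw new IllegalArgumentException("Unknown variable: " + matcher.group(1));
            }
            matcher.appendReplacement(result, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        Map<String, String> variables = new HashMap<>();
        variables.put("rep.home", "/var/jackrabbit"); // example value only
        variables.put("wsp.name", "default");

        // A parameter in the bean configuration style, using a variable in its value.
        String param = "<param name=\"path\" value=\"${rep.home}/workspaces/${wsp.name}\"/>";
        System.out.println(substitute(param, variables));
    }
}
```

With the example values above, the path value expands to /var/jackrabbit/workspaces/default.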

Conclusion:

Apache Jackrabbit supports both structured and unstructured content and organizes it hierarchically. Nowadays, work also continues in the Apache Jackrabbit Oak project. Jackrabbit is a complete and fully compliant implementation of the content repository API for Java technology, and therefore its primary API is defined by JCR. The classes and interfaces within Jackrabbit itself are only needed when accessing functionality that is not specified in JCR.

Developers use various tools to assist with their work, such as IntelliJ IDEA or Eclipse. Jackrabbit is an Apache project that is mainly used for accessing content repositories. It is a fully featured content repository that implements the JCR API.