What is Apache Jackrabbit?

Apache Jackrabbit is an open-source Java content repository. A JCR (Java content repository) is a type of object database for storing, searching, and retrieving hierarchical data. The project was started on August 28, 2004, and is developed by the Apache Software Foundation. Jackrabbit is written entirely in Java and is therefore cross-platform. It stores structured content, supports SQL queries, and provides full-text indexing of document formats such as Excel, Word, and PDF.

At the time of writing, the current release is Apache Jackrabbit 2.16.1, released on February 9, 2018. It implements JSR-170 and JSR-283.

A content repository is a hierarchical content store with support for structured and unstructured content, full-text search, versioning, transactions, observation, and more.

JSR-170 specifies a Level 1, a Level 2 and a set of advanced repository feature blocks. Jackrabbit is fully JSR-170 compliant and therefore supports Level 1, Level 2, and all the optional blocks.

Difference between JCR and Database

JCR (Java content repository)	Database
A JCR uses a database to store things.	A database stores things.
A JCR enforces an API to access its data and database.	A database does not implement an API to access its data.
A JCR is built on a set of standards.	A database can utilize some standard, but implementations vary and deviate.
A JCR stores content.	A database stores data.

Workspaces

There can be more than one workspace in a repository.
Each workspace has one and only one root node.
The default workspace is named "default".
Creating a new workspace is commonly an administrative task and may require a shutdown and a new configuration file.

Nodes

Nodes are the items that make up the tree.

Every workspace starts with a root node.

Each node has 0 or more children.

Each may have 0 or more properties.

All nodes except the root have exactly one parent.

All nodes have a name.

More node stuff

Each node has exactly one primary type and may have many mixins (behaviors).

Node types

Standard node types (nt:file, nt:hierarchyNode, etc.) and external node types (sling:Folder, cq:Page).

We can define our own node types; there is an API for that.

Properties

A property belongs to exactly one parent node.
A property cannot have children (a property cannot have properties).
Property values hold the actual content that is stored in the JCR.
Properties cannot be ordered.
There is no such thing as a null value: if you set a property to null, it is removed from the node.
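
The node and property rules above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: ToyNode and its methods are hypothetical names invented for illustration, it is not the javax.jcr API, and it ignores node types and same-name siblings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy in-memory model of the rules above; NOT the javax.jcr API. */
final class ToyNode {
    private final String name;
    private final ToyNode parent; // null only for the root node
    private final Map<String, ToyNode> children = new LinkedHashMap<>();
    private final Map<String, Object> properties = new LinkedHashMap<>();

    private ToyNode(String name, ToyNode parent) {
        this.name = name;
        this.parent = parent;
    }

    /** Every workspace starts with a single root node. */
    static ToyNode root() {
        return new ToyNode("", null);
    }

    /** Adds a child; the child gets this node as its one and only parent. */
    ToyNode addNode(String childName) {
        ToyNode child = new ToyNode(childName, this);
        children.put(childName, child);
        return child;
    }

    /** There is no null value: setting a property to null removes it. */
    void setProperty(String key, Object value) {
        if (value == null) {
            properties.remove(key);
        } else {
            properties.put(key, value);
        }
    }

    boolean hasProperty(String key) {
        return properties.containsKey(key);
    }

    /** Absolute path, built by walking up the parent chain to the root. */
    String getPath() {
        if (parent == null) {
            return "/";
        }
        String parentPath = parent.getPath();
        return parentPath.equals("/") ? "/" + name : parentPath + "/" + name;
    }
}
```

For example, ToyNode.root().addNode("content").addNode("page") yields a node whose getPath() is "/content/page", and calling setProperty("title", null) on it removes the title property instead of storing a null.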

Normal data types

PropertyType.STRING

PropertyType.BINARY

PropertyType.DATE

PropertyType.LONG

PropertyType.DOUBLE

PropertyType.BOOLEAN

3 Special data types

PropertyType.NAME - Essentially a namespace-aware string.

PropertyType.PATH - Used for storing paths within a workspace.

PropertyType.REFERENCE - References another node, provided that node is a referenceable node.

Namespaces

A prefix is delimited from the local name by a single colon.

Reserved namespace prefixes: jcr, nt, mix, xml, and "" (the empty prefix).

Namespaces tend to function the same way as XML namespaces.

They are used for the same reasons as XML namespaces.
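
As a rough illustration of the prefix rule, the sketch below splits a name such as jcr:content at its single colon and checks the prefix against the reserved list given above. PrefixedName is a hypothetical helper written for this tutorial, not a Jackrabbit class, and it does not resolve prefixes to namespace URIs.

```java
import java.util.Arrays;
import java.util.List;

/** Toy parser for prefixed names; NOT Jackrabbit's own name parser. */
final class PrefixedName {
    // Reserved prefixes as listed in this tutorial (the empty prefix included).
    private static final List<String> RESERVED =
            Arrays.asList("jcr", "nt", "mix", "xml", "");

    final String prefix;    // "" when the name has no prefix
    final String localName;

    private PrefixedName(String prefix, String localName) {
        this.prefix = prefix;
        this.localName = localName;
    }

    /** Splits "prefix:local" at its single colon. */
    static PrefixedName parse(String name) {
        int colon = name.indexOf(':');
        if (colon < 0) {
            return new PrefixedName("", name);
        }
        boolean emptyPart = colon == 0 || colon == name.length() - 1;
        boolean secondColon = name.indexOf(':', colon + 1) >= 0;
        if (emptyPart || secondColon) {
            throw new IllegalArgumentException("Malformed name: " + name);
        }
        return new PrefixedName(name.substring(0, colon), name.substring(colon + 1));
    }

    boolean hasReservedPrefix() {
        return RESERVED.contains(prefix);
    }
}
```

With this sketch, parsing "jcr:content" gives the reserved prefix jcr, while "cq:Page" gives the non-reserved prefix cq.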

Paths

Any item can be addressed by an absolute path.
/ is the root node of the tree in a workspace.
Relative paths are supported with the following syntax:
../
./
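
To make the relative-path syntax concrete, here is a minimal sketch of how ../ and ./ could be resolved against an absolute path. ToyPaths.resolve is a hypothetical helper for illustration only; it is not how Jackrabbit resolves paths, and it ignores namespaces and same-name sibling indexes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy path resolver; NOT Jackrabbit's path implementation. */
final class ToyPaths {
    /** Resolves a relative path against an absolute base path. */
    static String resolve(String base, String relative) {
        Deque<String> segments = new ArrayDeque<>();
        for (String s : base.split("/")) {
            if (!s.isEmpty()) {
                segments.addLast(s);
            }
        }
        for (String s : relative.split("/")) {
            if (s.isEmpty() || s.equals(".")) {
                continue;                      // "." means the current node
            }
            if (s.equals("..")) {              // ".." means the parent node
                if (segments.isEmpty()) {
                    throw new IllegalArgumentException(
                            "Path escapes the root: " + relative);
                }
                segments.removeLast();
            } else {
                segments.addLast(s);
            }
        }
        return "/" + String.join("/", segments);
    }
}
```

For instance, resolving "../other" against "/content/site/page" steps up to /content/site and then down into other.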

Apache Jackrabbit Architecture:

The general architecture of Jackrabbit can be described in three layers: a Content Application Layer, an API Layer, and a Content Repository Implementation Layer.

Content Applications

Content applications interact with the content repository implementation through the JSR-170 API. Numerous applications are available for JSR-170 repositories. Some of them are very generic (like a WebDAV server), while others are very specific and use the content repository as a store for the information the application works with. Java applications can use a JSR-170 content repository as a replacement for anything from property files and XML configuration to certain portions of relational database functionality, the plain file system, or blob management. Using a content repository lets an application work with a hierarchical space and rely on repository services such as versioning, query, transactions, and namespaces, which makes a content repository an ideal data store for many applications.

Content Repository API

The Content Repository API Layer is split into two major sections:

The Content Repository API defined by JSR-170.
Several features of a content repository that were removed from the JSR-170 specification because they are difficult to implement on existing non-Java-based content repositories, together with administrative repository tasks that were deliberately excluded from JSR-170.

Only very few (mostly administrative) applications use the non-JSR-170 APIs provided by Jackrabbit.

The boxes in the architecture chart do not represent package names or class names; they represent semantically grouped blocks of functionality.

Content Repository Implementation

The Content Repository Implementation part of the architecture chart shows the building blocks of the Jackrabbit content repository implementation.

The size of each block roughly represents the amount of code and therefore the complexity of the individual functional block. Again, the functional blocks do not map precisely to package or class names.

There are three main scopes in a content repository: a repository scope, a workspace scope, and a session scope.

Every function that is operated against a repository can be attributed to at least one of these scopes, and some functions can operate on more than one scope.

Repository: Node type, Version, Namespace Registry
Workspace: Query, Observation, State, Xml
Session: Path, Hierarchy Manager, QName

This is not a complete list, but it covers some of the essential components of the content repository implementation.

Deployment Models

JSR-170 allows for numerous different deployment models, meaning that it is entirely up to the repository implementation to suggest specific models.

Jackrabbit is built to support a variety of different deployment models; some of the possibilities for deploying Jackrabbit are outlined here.


Model 1: The (Web) Application Bundle

For many applications, especially those that run in a closed context without interacting with other applications or data sources, it might be desirable to bundle a content repository with the application itself.

Jackrabbit is built for this lightweight model and, through the abstraction provided by JSR-170, allows a move to a different deployment model at any point in time, should that be desirable for the context that the application runs in.

Application1 and Application2 each contain their own instance of the content repository, distributed as part of their .war file and therefore loaded with the web application's class loader, which makes it invisible to other applications in the system.

This deployment model also works for any stand-alone application, not only for web applications.

Model 2: Shared J2EE Resources

A second way to deploy a repository is to make it visible as a resource to all the web applications running inside a Servlet Container, by registering the repository as a Resource Adapter with the Application Server.

Similar to the first deployment model, this deployment model does not require a network layer, so the repository runs in-process inside the same JVM (Java Virtual Machine).

The repository is started and stopped with the Application Server, but it is visible to all the applications, which can connect to it.

Model 3: The Repository Server

In enterprise environments, the client/server deployment model is widely used for relational databases. While for relational databases this is probably the only deployment model supported by most RDBMS vendors, for repositories, and in particular for Jackrabbit, it is only one of various options.

The client/server deployment model will certainly be popular in environments where it is desirable to physically separate the content repository (or data) layer from the application or presentation layer, so that the content repository can be used from many different applications and can be scaled individually.

Jackrabbit Repository Installation & Configuration

The Apache Jackrabbit content repository is a complete implementation of the Content Repository for Java Technology API.

A content repository is a hierarchical content store with support for structured and unstructured content, full-text search, versioning, transactions, observation, and more.

This document explains how to set up a Jackrabbit content repository in the Web-Application Bundle deployment model.

The instructions in this document apply to Tomcat versions 5.x and 6.x. They are easy to adapt to other container environments.

What you'll need to install Apache Jackrabbit:

Ubuntu Server 16.04 LTS
Secure Shell (SSH) access to your server
Basic Linux command line knowledge.

You need to do the following steps:

Download jcr-2.0.jar and place it in "<tomcat-install-dir>/shared/lib".
Get the WAR distribution from Stable Build (jackrabbit-webapp-patched-<version>.tar.gz) and deploy it into Tomcat.

If the web-application deployment is successful, pointing your browser to jackrabbit-webapp-<version> should show the web application's start page.

Use the URL given below to access the content repository from a WebDAV client:

http://localhost:8080/jackrabbit-webapp-patched-<version>/repository/default/

The server asks for authentication; the username and password are configured as init parameters in the web.xml file for the RepositoryStartup servlet.

Another parameter you can set is repository-path, the path where the Jackrabbit repository will be installed. The web application uses these parameters at initialization time.

Repository Configuration

Jackrabbit's main configuration lives in the repository.xml file. It carries global settings such as login and access management, versioning, and clustering. It also defines how the actual data for a particular workspace is stored, by selecting a PersistenceManager, and which search/query implementation to use, by configuring a SearchIndex.

Node Type Registry

Each Jackrabbit instance has a NodeTypeRegistry, which is created on start-up and populated with the set of built-in node types.

Custom node types are stored in /jackrabbit/repository/nodetypes/custom_nodetypes.xml. They can be written in the "Compact Namespace and Node Type Definition" (CND) notation and are then registered using the JackrabbitNodeTypeManager.

The custom_nodetypes.xml file describes:

Which node types are supported in the repository.

The definition of each supported node type.

The definition of each item in a node type, relative to its parent node.

How Jackrabbit works?

The following walks through a general and simple operation that involves a large portion of the components in the Jackrabbit implementation. Please keep in mind that this implementation architecture is not mandated by JCR; it was designed from scratch based on JCR.

The components used and their respective functions, in the order of their appearance in the use case of writing or modifying content in the content repository:

Transient Item State Manager - Once content items are read by a session, they are cached in the Transient Item State Manager. When those items are modified, the modification is only visible to that same session, in the so-called "transient" space.
Transactional Item State Manager - When the application persists the modified items using the JCR Item.save() or Session.save(), the transient items are promoted into the Transactional ISM. The modifications are still only visible within the scope of this transaction, meaning that other sessions will not see the changes until they are committed. The commit is implicit if the content repository is not running in an XA environment.
Shared Item State Manager - Once a transaction is committed, the Shared Item State Manager receives the change log and publishes the changes to all the sessions logged into the same workspace. This means that all the item states that are cached and referenced by other sessions are notified and possibly updated or invalidated. The Shared Item State Manager also triggers the observation and hands the change log over to the persistence manager that is configured for this workspace.

Persistence Manager - The Persistence Manager persists all the item states in the change log passed by the Shared ISM. The persistence manager is a simple, fast interface that is very low-level: it does not need to understand the complexities of the repository operations, but it needs to be able to persist and retrieve a given item based on its item id.
Observation - When a transaction is committed, the Shared Item State Manager triggers the observation mechanism. This allows applications to subscribe asynchronously to changes in the workspace. Jackrabbit also offers a non-standard synchronous observation.
Query Manager / Index - Through an asynchronous observation event, the Query Manager is instructed to index the new or modified items. A content repository index is much more complicated than a classical RDB index, since it deals with content repository features like the item hierarchy, node type inheritance, or full-text searches.
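
The visibility rules in this write path can be sketched with a deliberately tiny model: changes stay in a per-session transient map until save(), and only then reach the state shared by all sessions. ToyWorkspace and ToySession are hypothetical names for illustration. The sketch collapses the transactional step into save() (as in the non-XA case) and leaves out persistence, observation, and indexing entirely, so it should not be read as Jackrabbit's implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for the shared, committed state of one workspace. */
final class ToyWorkspace {
    private final Map<String, String> shared = new HashMap<>();

    ToySession login() {
        return new ToySession(this);
    }

    synchronized String read(String path) {
        return shared.get(path);
    }

    /** Receives a session's change log and makes it visible to everyone. */
    synchronized void commit(Map<String, String> changeLog) {
        shared.putAll(changeLog);
    }
}

/** Toy session holding its own uncommitted ("transient") modifications. */
final class ToySession {
    private final ToyWorkspace workspace;
    private final Map<String, String> transientChanges = new HashMap<>();

    ToySession(ToyWorkspace workspace) {
        this.workspace = workspace;
    }

    /** The change is visible to this session only until save() is called. */
    void setProperty(String path, String value) {
        transientChanges.put(path, value);
    }

    /** Transient changes shadow the shared state for this session. */
    String getProperty(String path) {
        return transientChanges.containsKey(path)
                ? transientChanges.get(path)
                : workspace.read(path);
    }

    /** Hands the change log to the shared state, then clears the transient space. */
    void save() {
        workspace.commit(transientChanges);
        transientChanges.clear();
    }
}
```

If two sessions log in to the same ToyWorkspace and the first sets a property, the second sees nothing until the first calls save().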

Jackrabbit configuration

Apache Jackrabbit needs two pieces of information to set up a runtime content repository instance:

Repository home directory:

The file system path of the directory containing the content repository accessed by the runtime instance of Jackrabbit. This directory usually contains all the repository content, search indexes, internal configuration, and other persistence information managed within the content repository. A designated repository home directory is always needed, even if some components choose not to use it. Jackrabbit automatically fills the repository home directory with all the required files and subdirectories when the repository is first instantiated.

Repository configuration file:

The file system path of the repository configuration XML file. This file specifies the class names and properties of the various Jackrabbit components used to manage and access the content repository. Jackrabbit parses the configuration file and instantiates the specified components when the runtime content repository instance is created.

Repository configuration

The repository configuration file, typically called repository.xml, specifies global options like security, versioning, and clustering settings. A default workspace configuration template is also included in the repository configuration file. The format of the XML configuration file is defined in the following document type definition (DTD) files published by the Apache Jackrabbit project.

-//The Apache Software Foundation//DTD Jackrabbit 1.5//EN
-//The Apache Software Foundation//DTD Jackrabbit 1.4//EN
-//The Apache Software Foundation//DTD Jackrabbit 1.2//EN
-//The Apache Software Foundation//DTD Jackrabbit 1.0//EN

The top-level structure of the repository configuration file is a <Repository> root element containing the configuration elements listed below. The <!DOCTYPE> declaration is optional, but if you include it, Jackrabbit 1.5 will use XML validation to make sure that the configuration file is correctly formatted.

Starting with Jackrabbit 1.5, the order of the configuration elements below <Repository> is fixed.

The repository configuration elements are:

FileSystem: The virtual file system used by the repository to store and manage things like registered namespaces and node types.
Security: Authentication and authorization configuration.
Workspaces: Configuration of where and how the workspaces are managed.
Workspace: The default workspace configuration template.
Versioning: Configuration of the repository-wide version store.
SearchIndex: Configuration of the search index that covers the repository-wide /jcr:system content tree.
Cluster: Configuration of clustering.
DataStore: Configuration of the data store.

Bean configuration elements

Most of the entries in the configuration file are based on a generic JavaBean configuration pattern: a class attribute names the implementation class, and nested <param> elements set its bean properties.

Configuration variables

Jackrabbit supports configuration variables of the form ${name}. These variables can be used to avoid hard-coding specific options in the configuration file. The following variables are available in all Jackrabbit versions:
${rep.home}: Repository home directory.
${wsp.name}: Workspace name. Only available in workspace configuration.
${wsp.home}: Workspace home directory. Only available in workspace configuration.

Since Jackrabbit 1.4, it has been possible to use system properties or any application-specific settings as configuration variables.
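
To show what ${name} substitution amounts to, here is a minimal sketch that expands such variables in a string. ConfigVariables.replace is a hypothetical helper for illustration, not Jackrabbit's configuration parser, and rejecting unknown variables is an assumption of this sketch rather than documented Jackrabbit behavior.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy ${name} expander; NOT Jackrabbit's configuration parser. */
final class ConfigVariables {
    // Matches ${name} and captures the name between the braces.
    private static final Pattern VARIABLE = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces every ${name} in text with its value from the map. */
    static String replace(String text, Map<String, String> variables) {
        Matcher matcher = VARIABLE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String name = matcher.group(1);
            String value = variables.get(name);
            if (value == null) {
                // Assumption of this sketch: unknown variables are an error.
                throw new IllegalArgumentException("Unknown variable: " + name);
            }
            // quoteReplacement keeps '$' and '\' in values from being interpreted.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Given rep.home set to /var/jackrabbit and wsp.name set to default (both example values), the string ${rep.home}/workspaces/${wsp.name} expands to /var/jackrabbit/workspaces/default.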

Conclusion:

Apache Jackrabbit supports both structured and unstructured content and organizes it hierarchically. Current development work focuses on the Apache Jackrabbit Oak project. Jackrabbit is a complete and fully compliant implementation of the Content Repository API for Java Technology, and therefore its primary API is defined by JCR. The classes and interfaces within Jackrabbit are only needed when accessing functionality that is not specified in JCR.

Developers use various tools to assist with their work, such as IntelliJ IDEA or Eclipse. Jackrabbit is an Apache project that is mainly used for accessing content repositories. Apache Jackrabbit is a fully featured content repository that implements the JCR API.
