Modern software systems rarely operate in a completely isolated environment.
Large businesses often require different systems to communicate with each other
because information is shared between different departments for day-to-day operations.
For example, employees and managers may use
one system for filling in and approving time sheets,
while the payroll and accounting department may use
a different system to process the time sheets and pay employees.
As a software developer,
you need to be able to create a software architecture that is capable of
sharing information between separate components. So, how would you do it?
Let's take a look at the different issues that need to be addressed.
The first issue is that the state of a component
is only temporary while the component is running.
This means that any objects created and any amount of
data processing will be lost the moment you stop running the component.
So, how do we ensure our data is not lost?
One way is by saving the information to a file.
That addresses one issue,
but introduces another one.
Files are not the best way to transfer data between components.
We would need to worry about the file format
and distributing the file to all the components.
Furthermore, components may not need all the data that is stored in the files.
Thus, the second issue that we need to address is the need to
communicate the state of our data between multiple components.
Is there a solution that will help us solve both of these issues?
Luckily, there is a solution.
A data centric software architecture will not only
allow you to store and share data between multiple components,
but will also help increase the maintainability,
re-usability and scalability of the system.
This can be achieved by integrating a method of shared data storage,
such as a database, into our overall system design.
At the core of a data centric architecture are two types of components.
Central data is the component used to store
and serve data across all components that connect to it,
and data accessors are the components that connect to the central data component.
The data accessors make queries and
transactions against the information stored in the database.
A high level view of this architectural design is fairly simple and easy to understand.
The data accessors are separated from one
another and communicate only to the central data component.
The central data facilitates data sharing by saving
the desired information from the current state of
the system and serving data as requested.
But if all the data is centralized in one location,
how does a data accessor retrieve the correct information?
To understand how this works,
we will need to take a look at each component.
Often, the central data is stored in a database.
Databases are a complex topic,
which would take an entire course to discuss in depth.
So, we will look at a brief introduction to give you
a sense of how they fit into this architectural style.
A database is commonly used to store data because it ensures several data qualities.
The two key ones for a data centric architecture are data integrity and data persistence.
Data integrity means a database will ensure the data is accurate and consistent over its lifespan.
This is important if you want to have reliable data.
Data persistence means a database will make sure that data will continue to
live on after a process has been terminated.
This means you can use a database to save all of
your data from any number of components.
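To make data persistence concrete, here is a minimal sketch in Python. It assumes the standard-library `sqlite3` module as a stand-in for whatever database you use. The file name `company.db` and the `time_sheet` table are hypothetical names chosen for illustration.

```python
import os
import sqlite3
import tempfile

# Hypothetical database file; any location the components can reach would do.
db_path = os.path.join(tempfile.mkdtemp(), "company.db")

# First run of a component: store some state, then shut down.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE time_sheet (employee TEXT, hours REAL)")
conn.execute("INSERT INTO time_sheet VALUES (?, ?)", ("Ana", 7.5))
conn.commit()
conn.close()  # the component stops; its in-memory objects are gone

# A later run (or a different component) still finds the data.
conn = sqlite3.connect(db_path)
rows = conn.execute("SELECT employee, hours FROM time_sheet").fetchall()
conn.close()
print(rows)
```

The second connection knows nothing about the first one except the location of the database, which is the point: the state outlives the process that created it.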
The way the data is stored, presented,
and read in a database is different from the way it's done in Java.
We've discussed how abstract data types represented with
classes are used in object-oriented programming to represent,
modify, and save state.
One way a database can store information is by using tables.
Relational databases are a type of database that uses tables.
Each table represents an abstraction.
An employee table would represent all employees, for example,
with each column representing the employee attributes
and each row being a unique individual employee.
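As a rough illustration of the employee table just described, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The column names (`id`, `name`, `department`) and the sample employees are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# The table is the abstraction; its columns are the employee attributes.
conn.execute(
    "CREATE TABLE employee ("
    "  id INTEGER PRIMARY KEY,"
    "  name TEXT NOT NULL,"
    "  department TEXT NOT NULL)"
)

# Each row is one individual employee.
conn.executemany(
    "INSERT INTO employee (id, name, department) VALUES (?, ?, ?)",
    [(1, "Ana", "Engineering"), (2, "Ben", "Accounting")],
)
conn.commit()

# Inspect the structure: column names, then the number of rows.
columns = [info[1] for info in conn.execute("PRAGMA table_info(employee)")]
row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(columns, row_count)
```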
Relational databases use SQL, or Structured Query Language, to let you query,
or ask the database for information, and to let you perform
transactions, or tell the database to do something.
The ability to query and perform transactions allows
a database to share information between data accessors.
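Here is a small sketch of a transaction and a query, again assuming Python's `sqlite3` module and a hypothetical `time_sheet` table. The `CHECK` constraint is an assumption added for this example, to show a failed transaction being undone as a whole.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE time_sheet ("
    "  employee TEXT NOT NULL,"
    "  hours REAL NOT NULL CHECK (hours >= 0))"
)

# Transaction: tell the database to do something. Used as a context manager,
# the connection commits on success and rolls back if an exception is raised.
with conn:
    conn.execute("INSERT INTO time_sheet VALUES (?, ?)", ("Ana", 8))
    conn.execute("INSERT INTO time_sheet VALUES (?, ?)", ("Ben", 6))

# A transaction that breaks the CHECK constraint is rolled back entirely,
# so the valid first insert ("Cy") is undone along with the invalid one.
try:
    with conn:
        conn.execute("INSERT INTO time_sheet VALUES (?, ?)", ("Cy", 4))
        conn.execute("INSERT INTO time_sheet VALUES (?, ?)", ("Dee", -1))
except sqlite3.IntegrityError:
    pass

# Query: ask the database for information.
names = [row[0] for row in
         conn.execute("SELECT employee FROM time_sheet ORDER BY employee")]
print(names)
```

Only the committed rows are visible to later queries, which is what lets several data accessors trust what they read.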
Management and optimization of queries and transactions can be
automated by a database management system or DBMS,
so that integrating a database into your system is simplified.
In a data centric architectural design,
the central data is passive.
The database is generally not involved in
heavy data processing or large amounts of business logic.
The central data is primarily concerned with storing and serving the information.
A data accessor is essentially any component that connects to the database.
It is characterized by its abilities to:
Share a set of data while being able to operate independently.
Communicate with the database through database queries and transactions.
Data accessors do not need to interact directly with each other and therefore
do not need to know about each other.
Query the database to obtain shared system information.
This is used to get data in order to perform computations.
And save the new state of the system back into the database using transactions.
Data is stored back into the database once the data accessor has finished its processing.
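The abilities listed above can be sketched with two data accessors built around the time sheet example from earlier. This is only an illustration under assumptions: the class names, table names, and the pay rule (hours times rate) are all hypothetical, and Python's `sqlite3` module stands in for the central data component.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

# Hypothetical central data store shared by every accessor.
DB_PATH = os.path.join(tempfile.mkdtemp(), "central.db")

def setup_central_data():
    with closing(sqlite3.connect(DB_PATH)) as conn:
        conn.execute("CREATE TABLE time_sheet (employee TEXT, hours REAL, rate REAL)")
        conn.execute("CREATE TABLE payment (employee TEXT, amount REAL)")
        conn.commit()

class TimeSheetAccessor:
    """Used by employees and managers; knows nothing about payroll."""

    def submit(self, employee, hours, rate):
        with closing(sqlite3.connect(DB_PATH)) as conn:
            conn.execute("INSERT INTO time_sheet VALUES (?, ?, ?)",
                         (employee, hours, rate))
            conn.commit()

class PayrollAccessor:
    """Used by accounting; knows nothing about the time sheet accessor."""

    def run_payroll(self):
        with closing(sqlite3.connect(DB_PATH)) as conn:
            # Query the shared data.
            rows = conn.execute(
                "SELECT employee, hours, rate FROM time_sheet").fetchall()
            # The business rule lives in the accessor, not in the database.
            totals = {}
            for employee, hours, rate in rows:
                totals[employee] = totals.get(employee, 0.0) + hours * rate
            # Save the new state back into the database.
            conn.executemany("INSERT INTO payment VALUES (?, ?)", totals.items())
            conn.commit()
            return totals

setup_central_data()
TimeSheetAccessor().submit("Ana", 8, 20.0)
TimeSheetAccessor().submit("Ana", 4, 20.0)
TimeSheetAccessor().submit("Ben", 5, 30.0)
payments = PayrollAccessor().run_payroll()
print(payments)
```

Neither class references the other. The only thing they share is the database and its tables.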
A data accessor contains all the business rules required to perform its functions.
This means that this software architecture enables you to
separate concerns into different specialized data accessors.
Also, use of the data accessors can be controlled,
so an end-user only has permission for the ones they need on a day-to-day basis.
The data centric architecture has several advantages over a basic object-oriented system.
This is due to the integration of a centralized database,
which helps to facilitate data storage and sharing between data accessors.
This architectural design is capable of supporting data integrity,
data backup, and data restoration through a database.
These features can help with issues like massive data loss,
data corruption, and data migration.
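As one small illustration of backup and restoration, the sketch below uses `Connection.backup` from Python's `sqlite3` module (available since Python 3.7). A production database would normally rely on its own DBMS backup tooling, so treat this as an assumption-laden toy. The `employee` table and its contents are hypothetical.

```python
import sqlite3

primary = sqlite3.connect(":memory:")
with primary:
    primary.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
    primary.executemany("INSERT INTO employee (name) VALUES (?)",
                        [("Ana",), ("Ben",)])

# Data backup: copy the whole database into a second one.
backup = sqlite3.connect(":memory:")
primary.backup(backup)

# Simulate massive data loss in the primary database.
with primary:
    primary.execute("DELETE FROM employee")
lost = len(primary.execute("SELECT id FROM employee").fetchall())

# Data restoration: copy the backup over the damaged primary.
backup.backup(primary)
restored = [row[0] for row in
            primary.execute("SELECT name FROM employee ORDER BY id")]
print(lost, restored)
```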
A centralized database also reduces
the overhead for data transfer between your data accessors.
Since this architecture uses a database as a means of data sharing,
each data accessor does not need to be concerned with talking to another.
They all use the database as a way of indirectly communicating with each other.
Having functionally independent data accessors
also means that your system can easily be scaled up.
You can easily build and integrate
additional functions without having to worry about affecting the others,
because the data accessors don't communicate directly with one another.
The central data component tends to live on
a separate server machine with sufficient disk storage dedicated to the database.
This enables you to have a centralized data repository,
which makes it easier to manage all of your system's information.
Of course, nothing is perfect and the data centric architecture is no exception.
There are disadvantages that are introduced by integrating a database.
Since you will be using a centralized database,
the system becomes heavily reliant on the central data component.
If the data server goes offline,
becomes unusable or contains corrupted data,
your entire system will be affected.
There are safeguards that can be put in place,
such as data redundancy, which replicates your data onto separate hard disks.
However, the physical infrastructure can be expensive,
and it can be difficult and costly to get your system back up and running again.
Having a central database also means that
all your data accessors are dependent on what gets stored in the database.
If you're adding a data accessor to use a preexisting database,
then you will need to build around the existing data schema.
Since you can only retrieve and store data defined in the schema,
anything that isn't stored must be derived.
In addition, if the database does not have
a matching table or column for a particular set of data,
you cannot use the database to save that specific data set.
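Both limitations can be shown in a short sketch, assuming Python's `sqlite3` module and a hypothetical `time_sheet` table. Gross pay is not stored, so it has to be derived. Overtime hours have no column, so they cannot be saved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE time_sheet (employee TEXT, hours REAL, rate REAL)")
with conn:
    conn.execute("INSERT INTO time_sheet VALUES (?, ?, ?)", ("Ana", 8, 20.0))

# Gross pay is not in the schema, so the accessor derives it from what is.
gross_pay = conn.execute(
    "SELECT hours * rate FROM time_sheet WHERE employee = ?", ("Ana",)
).fetchone()[0]

# The schema has no column for overtime, so this data cannot be saved.
try:
    conn.execute(
        "INSERT INTO time_sheet (employee, overtime_hours) VALUES (?, ?)",
        ("Ana", 2),
    )
    saved = True
except sqlite3.OperationalError as error:
    saved = False
    reason = str(error)

print(gross_pay, saved)
```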
Another disadvantage is that it is difficult to change the existing data schema.
Database changes can be hard to implement,
especially if there's a massive amount of data stored.
Data schema changes will also affect your data accessors.
So, you must be mindful of which data accessors need to be
changed in order for them to comply with the changes in the database.
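To see how a schema change ripples out to data accessors, consider this sketch. It assumes Python's `sqlite3` module. The `employee` table, the rename from `name` to `full_name`, and the two accessor functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
with conn:
    conn.execute("INSERT INTO employee (name) VALUES (?)", ("Ana",))

def list_names_v1(conn):
    """Accessor written against the original schema (column 'name')."""
    return [row[0] for row in
            conn.execute("SELECT name FROM employee ORDER BY id")]

def list_names_v2(conn):
    """Accessor updated to comply with the new schema (column 'full_name')."""
    return [row[0] for row in
            conn.execute("SELECT full_name FROM employee ORDER BY id")]

before = list_names_v1(conn)

# Schema change: rebuild the table with the column renamed, copying the data.
with conn:
    conn.execute("CREATE TABLE employee_new (id INTEGER PRIMARY KEY, full_name TEXT)")
    conn.execute("INSERT INTO employee_new (id, full_name) SELECT id, name FROM employee")
    conn.execute("DROP TABLE employee")
    conn.execute("ALTER TABLE employee_new RENAME TO employee")

# The old accessor no longer matches the schema and fails.
try:
    list_names_v1(conn)
    old_accessor_works = True
except sqlite3.OperationalError:
    old_accessor_works = False

after = list_names_v2(conn)
print(before, old_accessor_works, after)
```

The data itself survived the change, but every accessor that referred to the old column has to be found and updated.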
Despite having these disadvantages,
many companies use a data centric design because it allows them to
share vast amounts of data between various departments and offices.
The data centric software architecture allows you to:
Store and manage large amounts of data in a central data component.
This increases your system stability,
performance, re-usability, and maintainability.
Separate the functionality of your data accessors,
which makes it easier for you to maintain and scale your entire system.
And facilitate data sharing between
data accessors through database queries and transactions.
A data centric architectural design is very popular
because of how easy it is to implement and maintain,
and due to its highly scalable nature.
There is also a multitude of database products on the market,
which means you can pick the one that fits your needs.
However, it is still important to evaluate your system to
determine if this architectural design is the correct one to use.
There are always trade-offs when making these design decisions.
Do your end-users need to share a set of data?
How large is your user base?
These types of questions need to be asked in
order to select the appropriate software architecture.