The list of technology vendors offering big data solutions is seemingly infinite. Many of the big data solutions that are particularly popular right now fit into one of the following 15 categories:
1. The Hadoop Ecosystem
While Apache Hadoop may not be as dominant as it once was, it’s nearly impossible to talk about big data without mentioning this open source framework for distributed processing of large data sets. Last year, Forrester predicted, “100% of all large enterprises will adopt it (Hadoop and related technologies such as Spark) for big data analytics within the next two years.”
Over the years, Hadoop has grown to encompass an entire ecosystem of related software, and many commercial big data solutions are based on Hadoop. In fact, Zion Market Research forecasts that the market for Hadoop-based products and services will continue to grow at a 50 percent CAGR through 2022, when it will be worth $87.14 billion, up from $7.69 billion in 2016.
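The figures in that forecast can be checked against each other with the standard compound growth formula. The following is a minimal sketch, assuming six compounding years between 2016 and 2022; the function name `cagr` is illustrative and does not come from the Zion Market Research report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1

# Market sizes quoted above, in billions of US dollars.
hadoop_2016 = 7.69
hadoop_2022 = 87.14

# Assumption: growth compounds once per year from 2016 to 2022.
implied_rate = cagr(hadoop_2016, hadoop_2022, years=2022 - 2016)
print(f"Implied CAGR: {implied_rate:.1%}")
```

The implied rate comes out very close to the 50 percent quoted, so the start value, end value and growth rate are consistent with one another.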
Key Hadoop vendors include Cloudera, Hortonworks and MapR, and the leading public clouds all offer services that support the technology.
2. Apache Spark
Apache Spark is part of the Hadoop ecosystem, but its use has become so widespread that it deserves a category of its own. It is an engine for processing big data within Hadoop, and it’s up to one hundred times faster than the standard Hadoop engine, MapReduce.
In the AtScale 2016 Big Data Maturity Survey, 25 percent of respondents said that they had already deployed Spark in production, and 33 percent more had Spark projects in development. Clearly, interest in the technology is sizable and growing, and many vendors with Hadoop offerings also offer Spark-based products.
3. R
R, another open source project, is a programming language and software environment designed for working with statistics. The darling of data scientists, it is managed by the R Foundation and available under the GPL 2 license. Many popular integrated development environments (IDEs), including Eclipse and Visual Studio, support the language.
Several organizations that rank the popularity of various programming languages say that R has become one of the most popular languages in the world. For example, the IEEE says that R is the fifth most popular programming language, and both Tiobe and RedMonk rank it 14th. This is significant because the programming languages near the top of these charts are usually general-purpose languages that can be used for many different kinds of work. For a language that is used almost exclusively for big data projects to be so near the top demonstrates the significance of big data and the importance of this language in its field.
4. Data Lakes
To make it easier to access their vast stores of data, many enterprises are setting up data lakes. These are huge data repositories that collect data from many different sources and store it in its natural state. This is different from a data warehouse, which also collects data from disparate sources, but processes it and structures it for storage. In this case, the lake and warehouse metaphors are fairly accurate. If data is like water, a data lake is natural and unfiltered like a body of water, while a data warehouse is more like a collection of water bottles stored on shelves.
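The difference between the two approaches can be illustrated with a toy example. This is only a sketch: the event records, the `WAREHOUSE_COLUMNS` schema and the `load_into_warehouse` helper are all hypothetical, and real data lakes and warehouses are built on dedicated storage systems rather than Python lists.

```python
import json

# Hypothetical raw events from different sources, each with its own shape.
raw_events = [
    '{"device": "thermostat-1", "temp_c": 21.5}',
    '{"user": "alice", "action": "login", "ts": "2017-06-01T09:00:00"}',
    '{"device": "thermostat-1", "temp_c": 22.0, "battery": 0.8}',
]

# Data lake: keep every record exactly as it arrived.
data_lake = list(raw_events)

# Data warehouse: enforce a fixed schema at load time.
WAREHOUSE_COLUMNS = ("device", "temp_c")

def load_into_warehouse(events):
    """Parse each event and keep only rows that fit the fixed schema."""
    rows = []
    for line in events:
        record = json.loads(line)
        if all(column in record for column in WAREHOUSE_COLUMNS):
            rows.append(tuple(record[column] for column in WAREHOUSE_COLUMNS))
    return rows

warehouse = load_into_warehouse(raw_events)

# The lake still holds the login event and the battery field;
# the warehouse holds only tidy (device, temp_c) rows.
print(len(data_lake), "raw records in the lake")
print(warehouse)
```

The lake keeps everything, including fields nobody has a use for yet, while the warehouse trades that completeness for a predictable structure.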
Data lakes are particularly attractive when enterprises want to store data but aren’t yet sure how they might use it. A lot of Internet of Things (IoT) data might fit into that category, and the IoT trend is playing into the growth of data lakes.
MarketsandMarkets predicts that data lake revenue will grow from $2.53 billion in 2016 to $8.81 billion by 2021.
5. NoSQL Databases
Traditional relational database management systems (RDBMSes) store information in structured, defined columns and rows. Developers and database administrators query, manipulate and manage the data in those RDBMSes using a special language known as SQL.
NoSQL databases specialize in storing unstructured data and providing fast performance, although they don’t provide the same level of consistency as RDBMSes. Popular NoSQL databases include MongoDB, Redis, Cassandra, Couchbase and many others; even the leading RDBMS vendors like Oracle and IBM now also offer NoSQL databases.
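A small sketch can make the contrast concrete. It uses Python's built-in `sqlite3` module for the relational side and a plain list of dictionaries as a stand-in for a document store; the table, field names and records are hypothetical, and the dictionary lookup is not the API of MongoDB or any other NoSQL product.

```python
import sqlite3

# Relational: columns are declared up front and every row must fit them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    (1, "Alice", "Boston"),
)
rows = conn.execute(
    "SELECT name FROM customers WHERE city = ?", ("Boston",)
).fetchall()

# Document style: each record is free-form, and fields can differ per record.
documents = [
    {"_id": 1, "name": "Alice", "city": "Boston"},
    {"_id": 2, "name": "Bob", "tags": ["vip"], "last_order": {"total": 42.0}},
]
matches = [doc["name"] for doc in documents if doc.get("city") == "Boston"]

print(rows)
print(matches)
```

The second document has no `city` field and carries nested data, which the relational table could not accept without a schema change.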
NoSQL databases have become increasingly popular as the big data trend has grown. According to Allied Market Research, the NoSQL market could be worth $4.2 billion by 2020. However, the market for RDBMSes is still much, much larger than the market for NoSQL.
6. Predictive Analytics
Predictive analytics is a subset of big data analytics that attempts to forecast future events or behavior based on historical data. It draws on data mining, modeling and machine learning techniques to predict what will happen next. It is often used for fraud detection, credit scoring, marketing, finance and business analysis purposes.
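At its simplest, forecasting from historical data means fitting a model to past observations and extrapolating. The sketch below uses ordinary least squares on a straight line, which is only one of many possible modeling techniques; the sales figures and the `fit_line` helper are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical historical data: month number and units sold.
months = [1, 2, 3, 4, 5, 6]
units_sold = [100, 110, 120, 130, 140, 150]

slope, intercept = fit_line(months, units_sold)

# Extrapolate the fitted trend one month past the observed data.
forecast_month_7 = slope * 7 + intercept
print(f"Forecast for month 7: {forecast_month_7:.0f} units")
```

Commercial predictive analytics products apply far richer models, but the pattern of learning from history and projecting forward is the same.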
In recent years, advances in artificial intelligence have enabled vast improvements in the capabilities of predictive analytics solutions. As a result, enterprises have begun to invest more in big data solutions with predictive capabilities. Many vendors, including Microsoft, IBM, SAP, SAS, Statistica, RapidMiner, KNIME and others, offer predictive analytics solutions. Zion Market Research says the Predictive Analytics market generated $3.49 billion in revenue in 2016, a number that could reach $10.95 billion by 2022.
7. In-Memory Databases
In any computer system, the memory, also known as the RAM, is orders of magnitude faster than the long-term storage. If a big data analytics solution can process data that is stored in memory, rather than data stored on a hard drive, it can perform dramatically faster. And that’s exactly what in-memory database technology does.
Many of the leading enterprise software vendors, including SAP, Oracle, Microsoft and IBM, now offer in-memory database technology. In addition, several smaller companies like Teradata, Tableau, VoltDB and DataStax offer in-memory database solutions. Research from MarketsandMarkets estimates that total sales of in-memory technology were $2.72 billion in 2016 and may grow to $6.58 billion by 2021.
8. Big Data Security Solutions
Because big data repositories present an attractive target to hackers and advanced persistent threats, big data security is a large and growing concern for enterprises. In the AtScale survey, security was the second fastest-growing area of concern related to big data.
According to the IDG report, the most popular types of big data security solutions include identity and access controls (used by 59 percent of respondents), data encryption (52 percent) and data segregation (42 percent). Dozens of vendors offer big data security solutions, and Apache Ranger, an open source project from the Hadoop ecosystem, is also attracting growing attention.
9. Big Data Governance Solutions
Closely related to the idea of security is the concept of governance. Data governance is a broad topic that encompasses all the processes related to the availability, usability and integrity of data. It provides the basis for making sure that the data used for big data analytics is accurate and appropriate, as well as providing an audit trail so that business analysts or executives can see where data originated.
In the NewVantage Partners survey, 91.8 percent of the Fortune 1000 executives surveyed said that governance was either critically important (52.5 percent) or important (39.3 percent) to their big data initiatives. Vendors offering big data governance tools include Collibra, IBM, SAS, Informatica, Adaptive and SAP.
10. Self-Service Capabilities
With data scientists and other big data experts in short supply — and commanding large salaries — many organizations are looking for big data analytics tools that allow business users to self-service their own needs. In fact, a report from Research and Markets estimates that the self-service business intelligence market generated $3.61 billion in revenue in 2016 and could grow to $7.31 billion by 2021. And Gartner has noted, “The modern BI and analytics platform emerged in the last few years to meet new organizational requirements for accessibility, agility and deeper analytical insight, shifting the market from IT-led, system-of-record reporting to business-led, agile analytics including self-service.”
Hoping to take advantage of this trend, multiple business intelligence and big data analytics vendors, such as Tableau, Microsoft, IBM, SAP, Splunk, Syncsort, SAS, TIBCO, Oracle and others, have added self-service capabilities to their solutions. Time will tell whether any or all of the products turn out to be truly usable by non-experts and whether they will provide the business value organizations are hoping to achieve with their big data initiatives.
11. Artificial Intelligence
While the concept of artificial intelligence (AI) has been around nearly as long as there have been computers, the technology has only become truly usable within the past couple of years. In many ways, the big data trend has driven advances in AI, particularly in two subsets of the discipline: machine learning and deep learning.
The standard definition of machine learning is that it is technology that gives “computers the ability to learn without being explicitly programmed.” In big data analytics, machine learning technology allows systems to look at historical data, recognize patterns, build models and predict future outcomes. It is also closely associated with predictive analytics.
Deep learning is a type of machine learning technology that relies on artificial neural networks and uses multiple layers of algorithms to analyze data. As a field, it holds a lot of promise for allowing analytics tools to recognize the content in images and videos and then process it accordingly.
Experts say this area of big data tools seems poised for a dramatic takeoff. IDC has predicted, “By 2018, 75 percent of enterprise and ISV development will include cognitive/AI or machine learning functionality in at least one application, including all business analytics tools.”
Leading AI vendors with tools related to big data include Google, IBM, Microsoft and Amazon Web Services, and dozens of small startups are developing AI technology (and getting acquired by the larger technology vendors).
12. Streaming Analytics
As organizations have become more familiar with the capabilities of big data analytics solutions, they have begun demanding faster and faster access to insights. For these enterprises, streaming analytics, with its ability to analyze data as it is being created, is something of a holy grail. They are looking for solutions that can accept input from multiple disparate sources, process it and return insights immediately, or as close to it as possible. This is particularly desirable when it comes to new IoT deployments, which are helping to drive the interest in streaming big data analytics.
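The core idea of analyzing data as it arrives, rather than storing it first and querying it later, can be sketched with a running average that updates on every new value. The `sensor_readings` generator is a hypothetical stand-in for a live feed, and real streaming platforms add windowing, fault tolerance and distribution on top of this pattern.

```python
def running_average(stream):
    """Yield the mean of all values seen so far, updated as each value arrives."""
    count = 0
    mean = 0.0
    for value in stream:
        count += 1
        mean += (value - mean) / count  # incremental update; no history is stored
        yield mean

def sensor_readings():
    """Hypothetical stand-in for a live feed of sensor values."""
    yield from [20.0, 22.0, 21.0, 25.0]

averages = list(running_average(sensor_readings()))
print(averages)
```

Because each result depends only on the previous mean and the new value, an up-to-date answer is available after every event without rescanning earlier data.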
Several vendors offer products that promise streaming analytics capabilities. They include IBM, Software AG, SAP, TIBCO, Oracle, DataTorrent, SQLstream, Cisco, Informatica and others. MarketsandMarkets estimates that streaming analytics solutions brought in $3.08 billion in revenue in 2016, which could increase to $13.70 billion by 2021.
13. Edge Computing
In addition to spurring interest in streaming analytics, the IoT trend is also generating interest in edge computing. In some ways, edge computing is the opposite of cloud computing. Instead of transmitting data to a centralized server for analysis, edge computing systems analyze data very close to where it was created — at the edge of the network.
The advantage of an edge computing system is that it reduces the amount of information that must be transmitted over the network, thus reducing network traffic and related costs. It also decreases demands on data centers or cloud computing facilities, freeing up capacity for other workloads and eliminating a potential single point of failure.
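The reduction in network traffic can be illustrated by comparing the size of raw readings with the size of a locally computed summary. This is a sketch under simple assumptions: the device name, the readings and the choice of summary statistics are hypothetical, and JSON is used only as a convenient way to measure payload size.

```python
import json

# Hypothetical temperature readings collected by one edge device.
readings = [21.0 + 0.01 * i for i in range(1000)]

# Centralized approach: ship every raw reading to a central server.
raw_payload = json.dumps({"device": "sensor-7", "readings": readings})

# Edge approach: analyze locally and ship only a summary.
summary = {
    "device": "sensor-7",
    "count": len(readings),
    "min": min(readings),
    "max": max(readings),
    "mean": sum(readings) / len(readings),
}
edge_payload = json.dumps(summary)

print(len(raw_payload), "bytes sent without edge processing")
print(len(edge_payload), "bytes sent with edge processing")
```

The trade-off is that the central system sees only what the edge device chose to report, so the summary has to be designed with the downstream analysis in mind.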
While the market for edge computing, and more specifically for edge computing analytics, is still developing, some analysts and venture capitalists have begun calling the technology the “next big thing.”
14. Blockchain
Also a favorite with forward-looking analysts and venture capitalists, blockchain is the distributed database technology that underlies Bitcoin digital currency. The unique feature of a blockchain database is that once data has been written, it cannot be deleted or changed after the fact. In addition, it is highly secure, which makes it an excellent choice for big data applications in sensitive industries like banking, insurance, health care, retail and others.
Blockchain technology is still in its infancy and use cases are still developing. However, several vendors, including IBM, AWS, Microsoft and multiple startups, have rolled out experimental or introductory solutions built on blockchain technology.
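The write-once property comes from chaining records together with cryptographic hashes, so that altering an earlier record invalidates the chain and the change can be detected. The sketch below shows only that hash-linking idea in a single process. It leaves out the distribution and consensus mechanisms of real blockchain systems, and the block layout and helper names are assumptions made for illustration.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(index, data, previous_hash):
    """SHA-256 digest over a block's contents and its predecessor's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append_block(chain, data):
    """Add a new block that is linked to the current last block."""
    previous_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "data": data,
        "previous_hash": previous_hash,
        "hash": block_hash(index, data, previous_hash),
    })

def is_valid(chain):
    """Recompute every hash and check each link to the block before it."""
    for position, block in enumerate(chain):
        expected_previous = chain[position - 1]["hash"] if position else GENESIS_HASH
        if block["previous_hash"] != expected_previous:
            return False
        recomputed = block_hash(block["index"], block["data"], block["previous_hash"])
        if block["hash"] != recomputed:
            return False
    return True

ledger = []
append_block(ledger, "Alice pays Bob 5")
append_block(ledger, "Bob pays Carol 2")
valid_before = is_valid(ledger)

ledger[0]["data"] = "Alice pays Bob 500"  # tamper with an earlier record
valid_after = is_valid(ledger)

print(valid_before, valid_after)
```

In a real deployment many independent parties hold copies of the chain, which is what makes quietly rewriting history impractical rather than merely detectable.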
15. Prescriptive Analytics
Many analysts divide big data analytics tools into four big categories. The first, descriptive analytics, simply tells what happened. The next type, diagnostic analytics, goes a step further and explains why events occurred. The third type, predictive analytics, discussed in depth above, attempts to determine what will happen next. This is as sophisticated as most analytics tools currently on the market can get.
However, there is a fourth type of analytics that is even more sophisticated, although very few products with these capabilities are available at this time. Prescriptive analytics offers advice to companies about what they should do in order to make a desired result happen. For example, while predictive analytics might give a company a warning that the market for a particular product line is about to decrease, prescriptive analytics will analyze various courses of action in response to those market changes and forecast the most likely results.
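The step from prediction to prescription can be sketched as evaluating each candidate response against the forecast and recommending the one with the best projected result. Every number below is a hypothetical assumption, including the predicted decline, the uplift each action would produce and its cost; a real prescriptive tool would estimate these from data rather than take them as given.

```python
# Hypothetical forecast: demand for a product line is expected to fall 20 percent.
baseline_revenue = 1_000_000
predicted_change = -0.20
predicted_revenue = baseline_revenue * (1 + predicted_change)

# Hypothetical candidate responses with an assumed revenue uplift and cost.
actions = {
    "do nothing": {"uplift": 0.00, "cost": 0},
    "cut price 10%": {"uplift": 0.15, "cost": 80_000},
    "marketing campaign": {"uplift": 0.10, "cost": 30_000},
}

def projected_outcome(action):
    """Projected revenue after applying an action to the predicted revenue."""
    return predicted_revenue * (1 + action["uplift"]) - action["cost"]

# Score every action, then recommend the one with the highest projection.
outcomes = {name: projected_outcome(action) for name, action in actions.items()}
recommended = max(outcomes, key=outcomes.get)
print(recommended, round(outcomes[recommended]))
```

The predictive step supplies `predicted_revenue`; the prescriptive step is the comparison across actions that turns that forecast into a recommendation.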
Currently, very few enterprises have invested in prescriptive analytics, but many analysts believe this will be the next big area of investment after organizations begin experiencing the benefits of predictive analytics.
The market for big data technologies is diverse and constantly changing. But perhaps one day soon predictive and prescriptive analytics tools will offer advice about what is coming next for big data — and what enterprises should do about it.