NoSQL数据库面临的挑战
NoSQL vs RDBMS: Why and why not to use NoSQL over RDBMS?
Naresh Kumar
3 January 2014
source: http://theprofessionalspoint.blogspot.in/2014/01/nosql-vs-rdbms-why-and-why-not-to-use.html
NoSQL (not only SQL) is not a relational database management system (RDBMS). We will discuss what is the difference between NoSQL databases and Relational Databases and then why and why not to use NoSQL database model over traditional and relational database model (RDBMS) in detail. As NoSQL is the new technology, it is also facing many challenges, so will also have a look upon them.
Today, the internet world has billions of users. Big Data, Big Users, and Cloud Computing are the big technologies which every major internet application is using or preparing to use because internet application users are growing day by day and data is becoming more and more complex and unstructured which is very hard to manage using traditional relational database management system (RDBMS). NoSQL technology has the answer to all these problems. NoSQL is meant for Unstructured Big Data and Cloud Computing. A NoSQL database is exactly the type of database that can handle the all sort of unstructured, messy and unpredictable data that our system of engagement requires. NoSQL is a whole new way of thinking about a database.
Difference between NoSQL and Relational Data Models (RDBMS)
Relational and NoSQL data models are very different.
The relational model takes data and separates it into many interrelated tables that contain rows and columns. Tables reference each other through foreign keys that are stored in columns as well. When looking up data, the desired information needs to be collected from many tables (often hundreds in today’s enterprise applications) and combined before it can be provided to the application. Similarly, when writing data, the write needs to be coordinated and performed on many tables.
NoSQL databases have a very different model. For example, a document-oriented NoSQL database takes the data you want to store and aggregates it into documents using the JSON format. Each JSON document can be thought of as an object to be used by your application. A JSON document might, for example, take all the data stored in a row that spans 20 tables of a relational database and aggregate it into a single document/object. Aggregating this information may lead to duplication of information, but since storage is no longer cost prohibitive, the resulting data model flexibility, ease of efficiently distributing the resulting documents and read and write performance improvements make it an easy trade-off for web-based applications.
Another major difference is that relational technologies have rigid schemas while NoSQL models are schemaless. Relational technology requires strict definition of a schema prior to storing any data into a database. Changing the schema once data is inserted is a big deal, extremely disruptive and frequently avoided – the exact opposite of the behavior desired in the Big Data era, where application developers need to constantly – and rapidly – incorporate new types of data to enrich their apps.It also may not provide full ACID (atomicity, consistency, isolation, durability) guarantees, but still has a distributed and fault tolerant architecture.
The NoSQL taxonomy supports key-value stores, document store, BigTable, and graph databases.
In comparison, document databases are schemaless, allowing you to freely add fields to JSON documents without having to first define changes. The format of the data being inserted can be changed at any time, without application disruption.
Examples: MongoDB, Cassandra, CouchDB, HBase are the examples of NoSQL.
NoSQL Database Types
Document databases pair each key with a complex data structure known as a document. Documents can contain many different key-value pairs, or key-array pairs, or even nested documents.
Graph stores are used to store information about networks, such as social connections. Graph stores include Neo4J and HyperGraphDB.
Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name (or "key"), together with its value. Examples of key-value stores are Riak and Voldemort. Some key-value stores, such as Redis, allow each value to have a type, such as "integer", which adds functionality.
Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows.
Why to use NoSQL Databases?
1. NoSQL has Flexible Data Model to Capture Unstructured / Semi-structured Big Data
Data is becoming easier to capture and access through third parties such as Facebook, D&B, and others. Personal user information, geo location data, social graphs, user-generated content, machine logging data, and sensor-generated data are just a few examples of the ever-expanding array of data being captured. It’s not surprising that developers want to enrich existing applications and create new ones made possible by it. And the use of the data is rapidly changing the nature of communication, shopping, advertising, entertainment, and relationship management. Apps that don’t leverage it quickly will quickly fall behind.
Developers want a very flexible database that easily accommodates new data types and isn’t disrupted by content structure changes from third-party data providers. Much of the new data is unstructured and semi-structured, so developers also need a database that is capable of efficiently storing it. Unfortunately, the rigidly defined, schema-based approach used by relational databases makes it impossible to quickly incorporate new types of data, and is a poor fit for unstructured and semi-structured data. NoSQL provides a data model that maps better to these needs.
A lot of applications might gain from this unstructured data model: tools like CRM, ERP, BPM, etc, could use this flexibility to store their data without performing changes on tables or creating generic columns in a database. These databases are also good to create prototypes or fast applications, because this flexibility provides a tool to develop new features very easily.
2. NoSQL is highly and easily scalable (Scale up vs Scale out)
If millions of users are using your app frequently and concurrently, you need to think about the scalable database technology instead of traditional RDBMS. With relational technologies, many application developers find it difficult, or even impossible, to get the dynamic scalability and level of scale they need while also maintaining the performance users demand. You need to switch to NoSQL databases.
For the cloud applications, relational databases were originally the popular choice. Their use was increasingly problematic however, because they are a centralized, share-everything technology that scales up rather than out. This made them a poor fit for applications that require easy and dynamic scalability. NoSQL databases have been built from the ground up to be distributed, scale-out technologies and therefore fit better with the highly distributed nature of the three-tier Internet architecture.
Scale up vs Scale out
To deal with the increase in concurrent users (Big Users) and the amount of data (Big Data), applications and their underlying databases need to scale using one of two choices: scale up or scale out. Scaling up implies a centralized approach that relies on bigger and bigger servers. Scaling out implies a distributed approach that leverages many standard, commodity physical or virtual servers.
Scale up with relational technology: limitations at the database tier
At the web/application tier of the three-tier Internet architecture, a scale out approach has been the default for many years and worked extremely well. As more people use an application, more commodity servers are added to the web/application tier, performance is maintained by distributing load across an increased number of servers, and the cost scales linearly with the number of users.
Prior to NoSQL databases, the default scaling approach at the database tier was to scale up. This was dictated by the fundamentally centralized, shared-everything architecture of relational database technology. To support more concurrent users and/or store more data, you need a bigger and bigger server with more CPUs, more memory, and more disk storage to keep all the tables. Big servers tend to be highly complex, proprietary, and disproportionately expensive, unlike the low-cost, commodity hardware typically used so effectively at the web/application server tier.
Scale out with NoSQL technology at the database tier
NoSQL databases were developed from the ground up to be distributed, scale out databases. They use a cluster of standard, physical or virtual servers to store data and support database operations. To scale, additional servers are joined to the cluster and the data and database operations are spread across the larger cluster. Since commodity servers are expected to fail from time-to-time, NoSQL databases are built to tolerate and recover from such failure making them highly resilient.
NoSQL databases provide a much easier, linear approach to database scaling. If 10,000 new users start using your application, simply add another database server to your cluster. Add ten thousand more users and add another server. There’s no need to modify the application as you scale since the application always sees a single (distributed) database.
At scale, a distributed scale out approach also usually ends up being cheaper than the scale up alternative. This is a consequence of large, complex, fault tolerant servers being expensive to design, build and support. Licensing costs of commercial relational databases can also be prohibitive because they are priced with a single server in mind. NoSQL databases on the other hand are generally open source, priced to operate on a cluster of servers, and relatively inexpensive.
While implementations differ, NoSQL databases share some characteristics with respect to scaling and performance:
DYNAMIC SCHEMAS
Relational databases require that schemas be defined before you can add data. For example, you might want to store data about your customers such as phone numbers, first and last name, address, city and state – a SQL database needs to know what you are storing in advance.
This fits poorly with agile development approaches, because each time you complete new features, the schema of your database often needs to change. So if you decide, a few iterations into development, that you'd like to store customers' favorite items in addition to their addresses and phone numbers, you'll need to add that column to the database, and then migrate the entire database to the new schema.
If the database is large, this is a very slow process that involves significant downtime. If you are frequently changing the data your application stores – because you are iterating rapidly – this downtime may also be frequent. There's also no way, using a relational database, to effectively address data that's completely unstructured or unknown in advance.
NoSQL databases are built to allow the insertion of data without a predefined schema. That makes it easy to make significant application changes in real-time, without worrying about service interruptions – which means development is faster, code integration is more reliable, and less database administrator time is needed.
AUTO-SHARDING
Because of the way they are structured, relational databases usually scale vertically – a single server has to host the entire database to ensure reliability and continuous availability of data. This gets expensive quickly, places limits on scale, and creates a relatively small number of failure points for database infrastructure. The solution is to scale horizontally, by adding servers instead of concentrating more capacity in a single server.
"Sharding" a database across many server instances can be achieved with SQL databases, but usually is accomplished through SANs and other complex arrangements for making hardware act as a single server. Because the database does not provide this ability natively, development teams take on the work of deploying multiple relational databases across a number of machines. Data is stored in each database instance autonomously. Application code is developed to distribute the data, distribute queries, and aggregate the results of data across all of the database instances. Additional code must be developed to handle resource failures, to perform joins across the different databases, for data rebalancing, replication, and other requirements. Furthermore, many benefits of the relational database, such as transactional integrity, are compromised or eliminated when employing manual sharding.
NoSQL databases, on the other hand, usually support auto-sharding, meaning that they natively and automatically spread data across an arbitrary number of servers, without requiring the application to even be aware of the composition of the server pool. Data and query load are automatically balanced across servers, and when a server goes down, it can be quickly and transparently replaced with no application disruption.
Cloud computing makes this significantly easier, with providers such as Amazon Web Services providing virtually unlimited capacity on demand, and taking care of all the necessary database administration tasks. Developers no longer need to construct complex, expensive platforms to support their applications, and can concentrate on writing application code. Commodity servers can provide the same processing and storage capabilities as a single high-end server for a fraction of the price.
“Sharding” a relational database can reduce, or eliminate in certain cases, the ability to perform complex data queries. NoSQL database systems retain their full query expressive power even when distributed across hundreds of servers.
INTEGRATED CACHING
A number of products provide a caching tier for SQL database systems. These systems can improve read performance substantially, but they do not improve write performance, and they add complexity to system deployments. If your application is dominated by reads then a distributed cache should probably be considered, but if your application is dominated by writes or if you have a relatively even mix of reads and writes, then a distributed cache may not improve the overall experience of your end users.
Many NoSQL database technologies have excellent integrated caching capabilities, keeping frequently-used data in system memory as much as possible and removing the need for a separate caching layer that must be maintained.
REPLICATION
Most NoSQL databases also support automatic replication, meaning that you get high availability and disaster recovery without involving separate applications to manage these tasks. The storage environment is essentially virtualized from the developer's perspective.
Challenges of NoSQL
The promise of the NoSQL database has generated a lot of enthusiasm, but there are many obstacles to overcome before they can appeal to mainstream enterprises. Here are a few of the top challenges.
1. Maturity
RDBMS systems have been around for a long time. NoSQL advocates will argue that their advancing age is a sign of their obsolescence, but for most CIOs, the maturity of the RDBMS is reassuring. For the most part, RDBMS systems are stable and richly functional. In comparison, most NoSQL alternatives are in pre-production versions with many key features yet to be implemented.
Living on the technological leading edge is an exciting prospect for many developers, but enterprises should approach it with extreme caution.
2. Support
Enterprises want the reassurance that if a key system fails, they will be able to get timely and competent support. All RDBMS vendors go to great lengths to provide a high level of enterprise support.
In contrast, most NoSQL systems are open source projects, and although there are usually one or more firms offering support for each NoSQL database, these companies often are small start-ups without the global reach, support resources, or credibility of an Oracle, Microsoft, or IBM.
3. Analytics and business intelligence
NoSQL databases have evolved to meet the scaling demands of modern Web 2.0 applications. Consequently, most of their feature set is oriented toward the demands of these applications. However, data in an application has value to the business that goes beyond the insert-read-update-delete cycle of a typical Web application. Businesses mine information in corporate databases to improve their efficiency and competitiveness, and business intelligence (BI) is a key IT issue for all medium to large companies.
NoSQL databases offer few facilities for ad-hoc query and analysis. Even a simple query requires significant programming expertise, and commonly used BI tools do not provide connectivity to NoSQL.
Some relief is provided by the emergence of solutions such as HIVE or PIG, which can provide easier access to data held in Hadoop clusters and perhaps eventually, other NoSQL databases. Quest Software has developed a product -- Toad for Cloud Databases -- that can provide ad-hoc query capabilities to a variety of NoSQL databases.
4. Administration
The design goals for NoSQL may be to provide a zero-admin solution, but the current reality falls well short of that goal. NoSQL today requires a lot of skill to install and a lot of effort to maintain.
5. Expertise
There are literally millions of developers throughout the world, and in every business segment, who are familiar with RDBMS concepts and programming. In contrast, almost every NoSQL developer is in a learning mode. This situation will address naturally over time, but for now, it's far easier to find experienced RDBMS programmers or administrators than a NoSQL expert.
Conclusion
NoSQL databases are becoming an increasingly important part of the database landscape, and when used appropriately, can offer real benefits. However, enterprises should proceed with caution with full awareness of the legitimate limitations and issues that are associated with these databases.
您目前处于: InfoQ首页 新闻 NoSQL与RDBMS:何时使用,何时不使用
NoSQL与RDBMS:何时使用,何时不使用
作者 张龙 发布于 1月 08, 2014
Naresh Kumar是位软件工程师与热情的博主,对于编程与新事物拥有极大的兴趣,非常乐于与其他开发者和程序员分享技术上的研究成果。近日,Naresh撰文比较了NoSQL与RDBMS,并详细介绍了他们各自的特点与适用的场景。
NoSQL并不是关系型数据库管理系统,本文将会介绍NoSQL数据库与关系型数据库之间的差别,同时还会讨论在何种场景下应该使用NoSQL,何种场景下不应该使用。由于NoSQL还是个相对较新的技术,因此它还面临着很多挑战。
时至今日,互联网上有数以亿计的用户。大数据与云计算已经成为很多主要的互联网应用都在使用或是准备使用的技术,这是因为互联网用户每天都在不断增长,数据也变得越来越复杂,而且有很多非结构化的数据存在,这是很难通过传统的关系型数据库管理系统来处理的。NoSQL技术则能比较好地解决这个问题,它主要用于非结构化的大数据与云计算上。从这个角度来看,NoSQL是一种全新的数据库思维方式。
为何要使用NoSQL数据库?
1.NoSQL具有灵活的数据模型,可以处理非结构化/半结构化的大数据
现在,我们可以通过Facebook、D&B等第三方轻松获得与访问数据,如个人用户信息、地理位置数据、社交图谱、用户产生的内容、机器日志数据以及传感器生成的数据等。对这些数据的使用正在快速改变着通信、购物、广告、娱乐以及关系管理的特质。没有使用这些数据的应用很快就会被用户所遗忘。开发者希望使用非常灵活的数据库,能够轻松容纳新的数据类型,并且不会被第三方数据提供商内容结构的变化所累。很多新数据都是非结构化或是半结构化的,因此开发者还需要能够高效存储这种数据的数据库。但遗憾的是,关系型数据库所使用的定义严格、基于模式的方式是无法快速容纳新的数据类型的,对于非结构化或是半结构化的数据更是无能为力。NoSQL提供的数据模型则能很好地满足这种需求。很多应用都会从这种非结构化数据模型中获益,比如说CRM、ERP、BPM等等,他们可以通过这种灵活性存储数据而无需修改表或是创建更多的列。这些数据库也非常适合于创建原型或是快速应用,因为这种灵活性使得新特性的开发变得非常容易。
2.NoSQL很容易实现可伸缩性(向上扩展与水平扩展)
如果有很多用户在频繁且并发地使用你的应用,那么你就需要考虑可伸缩的数据库技术而非传统的RDBMS了。对于关系型技术来说,很多应用开发者会发现动态的可伸缩性是难以实现的,这时就应该考虑切换到NoSQL数据库上。对于云应用来说,关系型数据库一开始是普遍的选择。然而,在使用过程中却遇到了越来越多的问题,原因就在于他们是中心化的,向上扩展而非水平扩展的。这使得他们不适合于那些需要简单且动态可伸缩性的应用。NoSQL数据库从一开始就是分布式、水平扩展的,因此非常适合于互联网应用分布式的特性。
在三层互联网架构的Web/应用层上,多年来向上扩展已经成为默认的扩展方式了。随着应用使用人数的激增,我们需要添加更多的服务器,性能则是通过负载均衡来实现的,这时的代价与用户数量成线性比例关系。在NoSQL数据库之前,数据库层的默认扩展方式就是向上扩展。为了支持更多的并发用户以及存储更多的数据,你需要越来越好的服务器,更好的CPU、更多的内存、更大的磁盘来维护所有表。然而,好的服务器意味着更加复杂、私有、并且也更加昂贵。这与Web/应用层所使用的便宜的硬件形成了鲜明的对比。
3.动态模式
关系型数据库需要在添加数据前先定义好模式。比如说,你需要存储客户的电话号码、姓名、地址、城市与州等信息,SQL数据库需要提前知晓你要存的是什么。这对于敏捷开发模式来说是场灾难,因为每次完成新特性时,数据库的模式通常都需要改变。因此,如果在开发过程中想将客户喜欢的条目加到数据库中,那就得向表中添加这一列才行,然后要做的就是将整个数据库迁移到新的模式上。
4.自动分片
由于是结构化的,关系型数据库通常会垂直扩展,单台服务器要持有整个数据库来确保可靠性与数据的持续可用性。这样做的代价就是非常昂贵、扩展受到限制,并且数据库基础设施会成为失败点。这个问题的解决方案就是水平扩展,添加服务器而不是为单台服务器增加更多的能力。NoSQL数据库通常都支持自动分片,这意味着他们本质上就会自动在多台服务器上分发数据,应用甚至都不知道这些事情。数据与查询负载会自动在多台服务器上做到平衡,当某台服务器当机时,它能快速且透明地被替换掉。
5.复制
大多数NoSQL数据库也支持自动复制,这意味着你可以获得高可用性与灾备恢复功能。从开发者的角度来看,存储环境本质上是虚拟化的。
NoSQL数据库面临的挑战
1.成熟度
RDBMS系统由来已久。NoSQL拥护者们会说RDBMS的高龄是其衰退的标志,不过对于大多数CIO来说,RDBMS的成熟让人放心。对于大多数情况来说,RDBMS系统是稳定且功能丰富的。相比较而言,大多数NoSQL数据库则还有很多特性有待实现。
2.支持
企业需要的是安心,如果关键系统出现了故障,他们可以获得即时的支持。所有RDBMS厂商都在不遗余力地提供良好的企业支持。与之相反,大多数NoSQL系统都是开源项目,虽然每种数据库都有那么几家公司提供支持,不过这些公司大多都是小的初创公司,没有全球支持资源,也没有Oracle、微软或是IBM那种令人放心的公信力。
3.分析与商业智能
NoSQL数据库在Web 2.0应用时代开始出现。因此,大多数特性都是面向这些应用的需要的。然而,应用中的数据对于业务来说是有价值的,这种价值远远超出了Web应用那种CRUD。企业数据库中的业务信息可以帮助改进效率并提升竞争力,商业智能对于大中型企业来说是个非常关键的IT问题。
4.管理
NoSQL的设计目标是提供零管理的解决方案,不过当今的现实却离这个目标还相去甚远。现在的NoSQL需要很多技巧才能用好,并且需要不少人力、物力来维护。
5.专业
全球有很多开发者,每个业务部门都会有熟悉RDBMS概念与编程的人。相反,几乎每个NoSQL开发者都处于学习模式。这种状况会随着时间的流逝而发生改观。但现在,找到一个有经验的RDBMS程序员或是管理员要比NoSQL专家容易多了。
结论
NoSQL数据库正在成为数据库领域的重要力量。如果使用恰当,那么它会带来很多好处。然而,企业应该非常小心并注意到这些数据库的限制与问题。