Chan Chen Coding...

Indexes in MongoDB

Refer to: http://www.mongodb.org/display/DOCS/Indexes

Basics

An index is a data structure that collects information about the values of the specified fields in the documents of a collection. This data structure is used by Mongo's query optimizer to quickly sort through and order the documents in a collection. Formally speaking, these indexes are implemented as "B-Tree" indexes.

In the shell, you can create an index by calling the ensureIndex() function, and providing a document that specifies one or more keys to index. Referring back to our examples database from Mongo Usage Basics, we can index on the 'j' field as follows:

db.things.ensureIndex({j:1}); 

The ensureIndex() function only creates the index if it does not exist.

Once a collection is indexed on a key, random access on query expressions that match the specified key is fast. Without the index, MongoDB has to go through each document, checking the value of the key specified in the query:

db.things.find({j:2});  // fast - uses index
db.things.find({x:3});  // slow - has to check all because 'x' isn't indexed
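To make the difference concrete, here is a rough sketch in plain JavaScript (not MongoDB code). It assumes an index behaves like a list of (key, position) pairs sorted by key, which is a simplification of a B-Tree, and that every document has a numeric value for the indexed field. The names buildIndex, findWithIndex and findWithScan are hypothetical.

```javascript
// Simplified stand-in for an index: (key, position) pairs sorted by key.
// Assumes every document has a numeric value for the field.
function buildIndex(docs, field) {
  return docs
    .map((doc, pos) => ({ key: doc[field], pos }))
    .sort((a, b) => a.key - b.key);
}

// Binary search for the first entry with the key, then collect equal keys.
function findWithIndex(docs, index, value) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (index[mid].key < value) lo = mid + 1;
    else hi = mid;
  }
  const result = [];
  for (let i = lo; i < index.length && index[i].key === value; i++) {
    result.push(docs[index[i].pos]);
  }
  return result;
}

// Without an index, every document has to be checked.
function findWithScan(docs, field, value) {
  return docs.filter((doc) => doc[field] === value);
}

const things = [{ j: 3 }, { j: 1 }, { j: 2, x: 3 }, { j: 2 }];
const jIndex = buildIndex(things, "j");
console.log(findWithIndex(things, jIndex, 2)); // the two documents with j == 2
console.log(findWithScan(things, "x", 3));     // found only after checking all four
```

The indexed lookup touches a logarithmic number of entries, while the scan grows linearly with the collection.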

You can run

db.things.getIndexes() 

in the shell to see the existing indexes on the collection. Run

db.system.indexes.find() 

to see all indexes for the database.

ensureIndex creates the index if it does not exist. A standard index build will block all other database operations. If your collection is large, the build may take many minutes or hours to complete - if you must build an index on a live MongoDB instance, we suggest that you build it in the background using the background : true option. This will ensure that your database remains responsive even while the index is being built. Note, however, that background indexing may still affect performance, particularly if your collection is large.

If you use replication, background index builds will block operations on the secondaries. To build new indices on a live replica set, it is recommended you follow the steps described here.

In many cases, not having an index at all can impact performance almost as much as the index build itself. If this is the case, we recommend the application code check for the index at startup using the chosen mongodb driver's getIndex() function and terminate if the index cannot be found. A separate indexing script can then be explicitly invoked when safe to do so.
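The startup check could be sketched roughly as follows. This is plain JavaScript and assumes the list of index descriptions has already been fetched (for example, the output of db.things.getIndexes()), with each entry carrying a key document. The helpers sameKeyPattern, hasIndex and checkIndexesAtStartup are hypothetical, not driver functions, and the sample data is illustrative.

```javascript
// Compare two key patterns; field order matters for compound indexes.
function sameKeyPattern(a, b) {
  const aKeys = Object.keys(a);
  const bKeys = Object.keys(b);
  return (
    aKeys.length === bKeys.length &&
    aKeys.every((k, i) => k === bKeys[i] && a[k] === b[k])
  );
}

function hasIndex(indexes, keyPattern) {
  return indexes.some((ix) => sameKeyPattern(ix.key, keyPattern));
}

// Throw (so the application can terminate) if a required index is missing.
function checkIndexesAtStartup(indexes, required) {
  const missing = required.filter((pattern) => !hasIndex(indexes, pattern));
  if (missing.length > 0) {
    throw new Error("missing indexes: " + JSON.stringify(missing));
  }
}

// Illustrative data in the shape of getIndexes() output.
const existing = [
  { key: { _id: 1 }, name: "_id_" },
  { key: { j: 1 }, name: "j_1" },
];

checkIndexesAtStartup(existing, [{ j: 1 }]); // passes silently
```

Failing fast here keeps a missing index from turning into slow queries under production load.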

The _id Index

For all collections except capped collections, an index is automatically created for the _id field. This index is special and cannot be deleted. The _id index enforces uniqueness for its keys (except for some situations with sharding).

_id values are invariant.

Indexing on Embedded Fields ("Dot Notation")

With MongoDB you can even index on a key inside of an embedded document. Reaching into sub-documents is referred to as Dot Notation. For example:

db.things.ensureIndex({"address.city": 1}) 

Compound Keys

In addition to single-key basic indexes, MongoDB also supports multi-key "compound" indexes. Just like basic indexes, you use the ensureIndex() function in the shell to create the index, but instead of specifying only a single key, you can specify several:

db.things.ensureIndex({j:1, name:-1}); 

When creating an index, the number associated with a key specifies the direction of the index, so it should always be 1 (ascending) or -1 (descending). Direction doesn't matter for single key indexes or for random access retrieval but is important if you are doing sorts or range queries on compound indexes.
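As an illustration of what direction means, the sketch below sorts a few documents the way the entries of a {j:1, name:-1} index would be ordered. It is plain JavaScript, handles only simple numbers and strings, and compareByKeyPattern is a hypothetical helper.

```javascript
// Build a comparator from a key pattern such as { j: 1, name: -1 }.
function compareByKeyPattern(pattern) {
  const fields = Object.keys(pattern);
  return (a, b) => {
    for (const f of fields) {
      if (a[f] < b[f]) return -pattern[f];
      if (a[f] > b[f]) return pattern[f];
    }
    return 0;
  };
}

const sample = [
  { j: 2, name: "al" },
  { j: 1, name: "bo" },
  { j: 1, name: "cy" },
  { j: 2, name: "di" },
];
const ordered = sample.slice().sort(compareByKeyPattern({ j: 1, name: -1 }));
console.log(ordered.map((d) => d.j + ":" + d.name).join(" "));
// prints "1:cy 1:bo 2:di 2:al": j ascending, name descending within each j
```

A sort on {j:1, name:-1} can read entries in exactly this order, which is why the directions matter for compound indexes.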

If you have a compound index on multiple fields, you can use it to query on the beginning subset of fields. So if you have an index on

a,b,c

you can use it to query on

a
a,b
a,b,c

v1.6+

Now you can also use the compound index to service any combination of equality and range queries from the constituent fields. If the first key of the index is present in the query, that index may be selected by the query optimizer. If the first key is not present in the query, the index will only be used if hinted explicitly. While indexes can be used in many cases where an arbitrary subset of indexed fields is present in the query, as a general rule the optimal indexes for a given query are those in which queried fields precede any non-queried fields.
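The leading-fields rule can be sketched as a small check in plain JavaScript. It models only how many leading index fields a query covers, not the real query optimizer, and coveredPrefixLength is a hypothetical name.

```javascript
// Returns how many leading index fields the query covers.
// 0 means the first index key is absent from the query.
function coveredPrefixLength(indexFields, queryFields) {
  const queried = new Set(queryFields);
  let n = 0;
  while (n < indexFields.length && queried.has(indexFields[n])) n++;
  return n;
}

const abc = ["a", "b", "c"];
console.log(coveredPrefixLength(abc, ["a"]));      // 1
console.log(coveredPrefixLength(abc, ["b", "a"])); // 2
console.log(coveredPrefixLength(abc, ["b", "c"])); // 0, first key missing
console.log(coveredPrefixLength(abc, ["a", "c"])); // 1, 'b' breaks the prefix
```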

Indexing Array Elements

When a document's stored value for an index key field is an array, MongoDB indexes each element of the array. See the Multikeys page for more information.
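A minimal sketch of the idea in plain JavaScript: each array element yields its own index entry pointing back at the same document. The helper indexEntries is hypothetical, and special cases such as empty arrays are not modeled.

```javascript
// One index entry per array element; a scalar value produces a single entry.
function indexEntries(docs, field) {
  const entries = [];
  docs.forEach((doc, pos) => {
    const value = doc[field];
    const keys = Array.isArray(value) ? value : [value];
    for (const key of keys) entries.push({ key, pos });
  });
  return entries;
}

const articles = [
  { title: "a", tags: ["mongodb", "index"] },
  { title: "b", tags: "index" },
];
console.log(indexEntries(articles, "tags"));
// three entries: "mongodb" and "index" for the first document, "index" for the second
```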

Creation Options

The second argument for ensureIndex is a document/object representing the options. These options are explained below.

option       values                                                        default
background   true/false                                                    false (see doc page for caveats)
dropDups     true/false                                                    false
sparse       true/false                                                    false
unique       true/false                                                    false
v            index version: 0 = pre-v2.0, 1 = smaller/faster (current)     1 in v2.0; the default is used except in unusual situations

name is also an option but need not be specified and will be deprecated in the future. The name of an index is generated by concatenating the names of the indexed fields and their directions (i.e., 1 or -1 for ascending or descending). Index names (including their namespace/database) are limited to 128 characters.
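The naming scheme can be sketched in plain JavaScript as below. The underscore separator is an assumption based on the names the shell typically reports (such as j_1); the text above does not spell it out. defaultIndexName is a hypothetical helper.

```javascript
// Join each field name and its direction, assuming "_" as the separator.
function defaultIndexName(keyPattern) {
  return Object.keys(keyPattern)
    .map((field) => field + "_" + keyPattern[field])
    .join("_");
}

console.log(defaultIndexName({ j: 1, name: -1 }));    // j_1_name_-1
console.log(defaultIndexName({ "address.city": 1 })); // address.city_1
```

Long field names or many fields can push a generated name toward the 128-character limit, which is when an explicit name becomes useful.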

sparse:true

v1.8+

A sparse index can only have one field. SERVER-2193

A "sparse index" is an index that only includes documents with the indexed field.
Any document that is missing the sparsely indexed field will not be stored in the index; the index is therefore sparse because documents without a value have no entry.

Sparse indexes, by definition, are not complete (for the collection) and behave differently than complete indexes. When using a "sparse index" for sorting (or in some cases just filtering) some documents in the collection may not be returned. This is because only documents in the index will be returned.

> db.people.ensureIndex({title : 1}, {sparse : true})
> db.people.save({name:"Jim"})
> db.people.save({name:"Sarah", title:"Princess"})
> db.people.find()
{ "_id" : ObjectId("4de6abd5da558a49fc5eef29"), "name" : "Jim" }
{ "_id" : ObjectId("4de6abdbda558a49fc5eef2a"), "name" : "Sarah", "title" : "Princess" }
> db.people.find().sort({title:1}) // only 1 doc returned because sparse
{ "_id" : ObjectId("4de6abdbda558a49fc5eef2a"), "name" : "Sarah", "title" : "Princess" }
> db.people.dropIndex({title : 1})
{ "nIndexesWas" : 2, "ok" : 1 }
> db.people.find().sort({title:1}) // no more index, returns all documents
{ "_id" : ObjectId("4de6abd5da558a49fc5eef29"), "name" : "Jim" }
{ "_id" : ObjectId("4de6abdbda558a49fc5eef2a"), "name" : "Sarah", "title" : "Princess" }

You can combine sparse with unique to produce a unique constraint that ignores documents with missing fields.

Note that MongoDB's sparse indexes are not block-level indexes. MongoDB sparse indexes can be thought of as dense indexes with a specific filter.
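The "dense index with a filter" view can be sketched in plain JavaScript. The helper buildEntries is hypothetical and only models which documents receive an index entry.

```javascript
// Entries of a single-field index; a sparse index skips documents lacking the field.
function buildEntries(docs, field, sparse) {
  const entries = [];
  docs.forEach((doc, pos) => {
    const present = Object.prototype.hasOwnProperty.call(doc, field);
    if (sparse && !present) return; // not stored in a sparse index
    entries.push({ key: present ? doc[field] : null, pos });
  });
  return entries;
}

const people = [{ name: "Jim" }, { name: "Sarah", title: "Princess" }];
console.log(buildEntries(people, "title", true).length);  // 1, only Sarah
console.log(buildEntries(people, "title", false).length); // 2, Jim is keyed as null
```

A query answered purely from the sparse entries can only ever see Sarah, which matches the sort() behaviour in the session above.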

unique:true

MongoDB indexes may optionally impose a unique key constraint, which guarantees that no documents are inserted whose values for the indexed keys match those of an existing document.

Note that if a document is missing the indexed field, the value is treated as null for that document. If the index is unique, two documents cannot both have the null value. If you want to allow missing fields or nulls for your unique indexed field, be sure to also use the sparse option.

Here are some examples using unique index constraints:

// everyone's email address must be unique:
db.things.ensureIndex({email:1},{unique:true});
// in this variation, it's ok to not have an email address,
// but if you have one, it must be unique:
db.things.ensureIndex({email:1},{unique:true,sparse:true});
// a compound index example. firstname+lastname combination must be unique:
db.things.ensureIndex({firstname: 1, lastname: 1}, {unique: true});

For non-sparse indexes (see the sparse option documentation), if a field is not present its value is treated as null for purposes of indexing. This is important to keep in mind when writing an application. For example:

db.customers.ensureIndex({firstname:1,lastname:1},{unique:true}); 
assert( db.customers.count() == 0 );
db.customers.insert({firstname:'jane',lastname:'doe'}); // ok
db.customers.insert({firstname:'jane',lastname:'doe'}); // dup key error
db.customers.insert({firstname:'jane',lastname:'smith'}); // ok
db.customers.insert({firstname:'jane'}); // ok, treated as {firstname:'jane',lastname:null}
db.customers.insert({lastname:'smith'}); // ok
db.customers.insert({firstname:'john',lastname:'smith'}); // ok
db.customers.insert({firstname:'jane'}); // dup key error
db.customers.insert({email:'sally@abc.com',age:33}); // ok, treated as {firstname:null,lastname:null}
db.customers.insert({email:'pete@abc.com',age:39}); // dup key error
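
The null handling can be reproduced with a small in-memory model in plain JavaScript. UniqueIndex is a hypothetical class that only mimics the rule "a missing field is keyed as null"; it is not how MongoDB stores index keys.

```javascript
class UniqueIndex {
  constructor(fields) {
    this.fields = fields;
    this.keys = new Set();
  }

  // Returns "ok" or "dup key error"; a missing field is keyed as null.
  insert(doc) {
    const key = JSON.stringify(
      this.fields.map((f) => (doc[f] === undefined ? null : doc[f]))
    );
    if (this.keys.has(key)) return "dup key error";
    this.keys.add(key);
    return "ok";
  }
}

const idx = new UniqueIndex(["firstname", "lastname"]);
const results = [
  { firstname: "jane", lastname: "doe" },
  { firstname: "jane", lastname: "doe" },
  { firstname: "jane" },
  { firstname: "jane" },
  { email: "sally@abc.com", age: 33 },
  { email: "pete@abc.com", age: 39 },
].map((doc) => idx.insert(doc));

console.log(results.join(", "));
// ok, dup key error, ok, dup key error, ok, dup key error
```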

dropDups:true

A unique index cannot be created on a key that has pre-existing duplicate values. If you would like to create the index anyway, keeping the first document the database indexes and deleting all subsequent documents that have duplicate values, add the dropDups option.

db.things.ensureIndex({firstname : 1}, {unique : true, dropDups : true}) 

dropDups deletes data. A "fat finger" with dropDups could delete almost all data from a collection. Back up before using. Note also that if the field is missing in multiple records, each missing value evaluates to null, and those records would then be considered duplicates; in that case using sparse, or not using dropDups, would be very important.
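
To see how much data this can remove, here is a plain JavaScript sketch of the keep-the-first rule. The function dropDups below is a hypothetical model; it assumes documents are indexed in array order, which the real build does not promise.

```javascript
// Keep the first document seen for each key; later duplicates are dropped.
// A missing field is keyed as null, as in a non-sparse index.
function dropDups(docs, field) {
  const seen = new Set();
  return docs.filter((doc) => {
    const key = JSON.stringify(doc[field] === undefined ? null : doc[field]);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const before = [
  { firstname: "jane" },
  { firstname: "jane", age: 30 },
  { email: "a@example.com" },
  { email: "b@example.com" },
];
console.log(dropDups(before, "firstname").length); // 2
```

Only one of the two documents without a firstname survives, even though they are unrelated records.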

background:true

By default, building an index blocks other database operations. v1.4+ has a background index build option – however this option has significant limitations in a replicated cluster (see doc page).

Dropping Indexes

To delete all indexes on the specified collection:

db.collection.dropIndexes(); 

To delete a single index:

db.collection.dropIndex({x: 1, y: -1}) 

Running directly as a command without helper:

// note: command was "deleteIndexes", not "dropIndexes", before MongoDB v1.3.2
// remove index with key pattern {y:1} from collection foo
db.runCommand({dropIndexes:'foo', index : {y:1}})
// remove all indexes:
db.runCommand({dropIndexes:'foo', index : '*'})

ReIndex

The reIndex command will rebuild all indexes for a collection.

db.myCollection.reIndex()

See here for more documentation: reIndex Command

Performance Notes

Updates

When you update an object, if the object fits in its previous allocation area, only those indexes whose keys have changed are updated. This improves performance. Note that if the object has grown and must move, all index keys must then update, which is slower.

How many indexes?

Indexes make retrieval by a key, including ordered sequential retrieval, very fast. Updates by key are faster too as MongoDB can find the document to update very quickly.

However, keep in mind that each index created adds a certain amount of overhead for inserts and deletes. In addition to writing data to the base collection, keys must then be added to the B-Tree indexes. Thus, indexes are best for collections where the number of reads is much greater than the number of writes. For collections which are write-intensive, indexes, in some cases, may be counterproductive. Most collections are read-intensive, so indexes are a good thing in most situations.

Using sort() without an Index

You may use sort() to return data in order without an index if the data set to be returned is small (less than four megabytes). For these cases it is best to use limit() and sort() together.

Additional Notes

Behaviors
  • MongoDB indexes (and string equality tests in general) are case sensitive.
  • Index information is kept in the system.indexes collection; run db.system.indexes.find() to see example data.

Using Documents as Keys

Indexed fields may be of any type, including (embedded) documents:

db.factories.insert( { name: "xyz", metro: { city: "New York", state: "NY" } } );
db.factories.ensureIndex( { metro : 1 } );
// this query can use the above index:
db.factories.find( { metro: { city: "New York", state: "NY" } } );
// this one too, as {city:"New York"} < {city:"New York",state:"NY"}
db.factories.find( { metro: { $gte : { city: "New York" } } } );
// this query does not match the document because the order of fields is significant
db.factories.find( { metro: { state: "NY" , city: "New York" } } );

An alternative to documents as keys is to create a compound index:

db.factories.ensureIndex( { "metro.city" : 1, "metro.state" : 1 } );
// these queries can use the above index:
db.factories.find( { "metro.city" : "New York", "metro.state" : "NY" } );
db.factories.find( { "metro.city" : "New York" } );
db.factories.find().sort( { "metro.city" : 1, "metro.state" : 1 } );
db.factories.find().sort( { "metro.city" : 1 } )

There are pros and cons to the two approaches. When using the entire (sub-)document as a key, compare order is predefined and is ascending key order in the order the keys occur in the BSON document.  With compound indexes reaching in, you can mix ascending and descending keys, and the query optimizer will then be able to use the index for queries on solely the first key(s) in the index too.
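Why field order matters for whole-document keys can be sketched with an order-sensitive comparison in plain JavaScript. The helper sameEmbeddedDoc is hypothetical and handles only flat documents with simple values.

```javascript
// Order-sensitive equality: same fields, same values, same order.
function sameEmbeddedDoc(a, b) {
  const aKeys = Object.keys(a);
  const bKeys = Object.keys(b);
  return (
    aKeys.length === bKeys.length &&
    aKeys.every((k, i) => k === bKeys[i] && a[k] === b[k])
  );
}

const stored = { city: "New York", state: "NY" };
console.log(sameEmbeddedDoc(stored, { city: "New York", state: "NY" })); // true
console.log(sameEmbeddedDoc(stored, { state: "NY", city: "New York" })); // false
```

Querying on "metro.city" and "metro.state" separately avoids this sensitivity, since each field is matched by name.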

Keys Too Large To Index

Index entries have a limitation on their maximum size (the sum of the values), currently approximately 800 bytes. Documents whose indexed field values (the key size, in index terminology) exceed this size cannot be indexed. You will see log messages similar to:

...Btree::insert: key too large to index, skipping...

Queries against this index will not return the unindexed documents. You can force a query to use another index, or really no index, using this special index hint:

db.myCollection.find({<key>: <value too large to index>}).hint({$natural: 1})

This will cause the document to be used for comparison of that field (or fields), rather than the index.

This limitation will eventually be removed (see SERVER-3372).



-----------------------------------------------------
Silence, the way to avoid many problems;
Smile, the way to solve many problems;

posted on 2012-02-27 14:34 by Chan Chen, category: DB