
Hadoop Tutorial

MongoDB

MongoDB is a NoSQL database designed for high performance, easy scalability, and high availability. It is built around collections and documents. A MongoDB database is a physical container for collections. A collection is a set of documents, roughly comparable to a table in a relational database, and a document is a set of key-value pairs.

Documents have a dynamic schema, which means that documents in the same collection do not need to share the same structure. Fields with the same name may also hold different data types in different documents. That is why MongoDB is so flexible. This dynamic schema is sometimes described as schema-less, meaning that a document can contain almost anything.
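To make the dynamic schema concrete, here is a minimal sketch in plain Python that models a collection as a list of dictionaries. It does not use a MongoDB driver, and the collection name posts, the field names, and the helper field_types are illustrative assumptions.

```python
# A collection modeled as a plain list of dicts (no MongoDB driver involved).
# The documents and field names below are made up for illustration.
posts = [
    {"_id": 1, "title": "Dog Videos", "likes": 4000000},
    {"_id": 2, "title": "Cat Photos", "tags": ["cats", "photos"]},  # no "likes", extra "tags"
    {"_id": 3, "title": "Bird Songs", "likes": "many"},             # same field, different type
]

def field_types(collection, field):
    """Return the set of Python type names that `field` takes across documents."""
    return {type(doc[field]).__name__ for doc in collection if field in doc}

print(field_types(posts, "likes"))  # the field holds both int and str values
print(field_types(posts, "tags"))   # only one document has this field
```

Nothing in the list forces the three documents to agree on fields or types, which is the property the paragraph above describes.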

MongoDB Document Structure:

MongoDB comes with a document-based query language that makes querying documents easy. A sample document of the kind that could be stored in MongoDB gives an idea of the structure.

{
  _id: ObjectId(23f2918g201),
  postTitle: "Dog Videos",
  facebookUser: 1037882,
  facebookURL: "facebook.com/dogvideos",
  likes: 4000000,
  comments: 9073,
  shares: 100000050,
  postMetadata: [
    {
      timePosted: 10929682,
      clicksOnPost: 923081208,
      usersReached: 99302981069
    }
  ]
}

In this JSON-style snippet, we have information about a Facebook post titled "Dog Videos" from a specific Facebook user, which received a large number of likes, comments, and shares. The postMetadata field holds further key-value pairs, showing that documents can be nested. The _id value shown is an abbreviated placeholder; a real ObjectId is a 12-byte value usually written as 24 hexadecimal characters.

MongoDB differs from relational databases in that each document is stored as a JSON-like object (internally in BSON, a binary encoding of JSON). This is what makes a schema-less design possible: no two objects in a collection have to look the same. Each individual document is still structured, but the collection does not enforce what its documents look like. The document query language lets us reach deep into a document to pull out the data we need. This is also why complex joins are often unnecessary: related data lives inside one document, so we do not have to go from table to table to collect it.
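The idea of reaching into nested data can be sketched in plain Python. The helper get_path below is hypothetical (it is not part of MongoDB or any driver) and only mimics the dotted-path style of addressing nested fields. The _id field is left out, and the other values are copied from the sample document.

```python
post = {
    "postTitle": "Dog Videos",
    "facebookUser": 1037882,
    "likes": 4000000,
    "postMetadata": [
        {"timePosted": 10929682, "clicksOnPost": 923081208, "usersReached": 99302981069}
    ],
}

def get_path(doc, path):
    """Follow a dotted path such as 'postMetadata.0.clicksOnPost' through dicts and lists."""
    current = doc
    for part in path.split("."):
        if isinstance(current, list):
            current = current[int(part)]   # numeric segment indexes into an array
        else:
            current = current[part]        # otherwise look up a key
    return current

print(get_path(post, "postMetadata.0.clicksOnPost"))  # 923081208
```

Because the metadata travels inside the post itself, a single lookup on one document replaces what would be a join across two tables in a relational layout.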

MongoDB Features

This part covers some key features of MongoDB:

  • Ad-hoc Queries: Ad-hoc queries are queries whose exact form is not known when the database is designed. MongoDB supports them directly: documents can be searched by field values, by ranges, and by regular expressions without defining the query in advance. Indexes on the queried fields help keep such queries fast.

  • Schema-Less Database: In MongoDB, one collection can hold different kinds of documents. Because no schema is enforced, documents in the same collection can differ from one another in their fields, content, and size. This gives MongoDB its flexibility in handling varied data.

  • Document Oriented: MongoDB is a document-oriented database. Relational databases arrange data in tables and rows, where every row has a fixed number of columns and each column stores a particular type of data. NoSQL's flexibility shows here: instead of tables and rows, MongoDB has collections and documents, and a document is made up of fields. Different documents can store different types of data, and similar documents are grouped into collections. Every document has a unique key, the _id field, whose value can be supplied by the user or generated by the system as an ObjectId.

  • Indexing: Indexing is crucial for tuning the performance of search queries. Without an index, a query has to scan every document in the collection, so we should index the fields that appear in our search criteria. Every collection has a default index on _id, and secondary indexes can be created on any other field.

  • Aggregation: MongoDB provides an aggregation framework for processing data in bulk. It can group documents, perform different operations on the grouped data, and return a single computed result. Aggregation is available in three forms: the aggregation pipeline, the map-reduce function (deprecated since MongoDB 5.0 in favor of the pipeline), and single-purpose aggregation methods.

  • Replication: Replication is how MongoDB provides redundancy. It copies the same data to several machines organized as a replica set, which has one primary node and one or more secondary nodes. If the primary goes down, one of the secondaries is elected as the new primary. This reduces maintenance time and keeps operations running smoothly.

  • GridFS: This feature stores and retrieves files, and is especially useful for files larger than 16 MB, the maximum size of a single document. GridFS divides a file into chunks and stores each chunk as a separate document. Chunks have a default size of 255 kB; only the last chunk may be smaller. When we query GridFS for a file, the chunks are reassembled as required.

  • Sharding: Sharding is used for very large databases. A query against a big data set can overwhelm a single server. Sharding distributes the data across multiple instances of MongoDB: a large collection is partitioned into smaller pieces, and each piece is stored on a separate instance called a "shard". Together, the shards form a sharded cluster.

  • High Performance: MongoDB is an open-source database built for high performance, with high availability and scalability. Indexing speeds up query responses, and replication keeps the data available when a node fails. This makes MongoDB a good fit for big data and real-time applications.
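The aggregation idea from the feature list (filter documents, group them, and reduce each group to a single value) can be sketched in plain Python. This is not MongoDB's aggregation framework: the match and group_sum helpers and the sample data are hypothetical stand-ins for pipeline stages.

```python
from collections import defaultdict

# Made-up documents for illustration.
posts = [
    {"user": "alice", "likes": 10, "shares": 2},
    {"user": "bob", "likes": 5, "shares": 0},
    {"user": "alice", "likes": 7, "shares": 1},
    {"user": "bob", "likes": 1, "shares": 4},
]

def match(docs, predicate):
    """Stage 1: keep only the documents that satisfy the predicate."""
    return [d for d in docs if predicate(d)]

def group_sum(docs, key, field):
    """Stage 2: group by `key` and sum `field` within each group."""
    totals = defaultdict(int)
    for d in docs:
        totals[d[key]] += d[field]
    return dict(totals)

# Pipeline: posts with at least one share, then total likes per user.
shared = match(posts, lambda d: d["shares"] >= 1)
likes_per_user = group_sum(shared, "user", "likes")
print(likes_per_user)  # {'alice': 17, 'bob': 1}
```

Each stage takes a list of documents and hands its result to the next stage, which is the general shape of a pipeline.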
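The chunking described under GridFS can be illustrated with a small plain-Python sketch. It does not call GridFS or any driver. The functions split_into_chunks and reassemble are hypothetical, and the chunk records are only loosely modeled on the description above (a sequence number plus a slice of the data).

```python
CHUNK_SIZE = 255 * 1024  # 255 kB, the default chunk size mentioned above

def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Split bytes into numbered chunk records, each a small dict."""
    return [
        {"n": i // chunk_size, "data": data[i:i + chunk_size]}
        for i in range(0, len(data), chunk_size)
    ]

def reassemble(chunks):
    """Join chunks in order of their sequence number to recover the file."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))

file_bytes = bytes(600 * 1024)          # a 600 kB dummy file of zero bytes
chunks = split_into_chunks(file_bytes)

print(len(chunks))                       # 3 chunks: 255 kB + 255 kB + 90 kB
print(len(chunks[-1]["data"]) // 1024)   # 90
```

Only the last chunk is smaller than the chunk size, and sorting by sequence number before joining restores the original bytes even if the chunks arrive out of order.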
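The routing idea behind sharding can also be sketched in plain Python. This is a simplified illustration of hash-based distribution, which is only one possible strategy, and it is not how MongoDB implements sharding internally. The shard names and the shard_for and distribute helpers are assumptions made for the example.

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names

def shard_for(key, shards=SHARDS):
    """Pick a shard from a stable hash of the shard-key value."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]

def distribute(docs, key_field):
    """Place each document on the shard chosen by its shard-key field."""
    placement = {name: [] for name in SHARDS}
    for doc in docs:
        placement[shard_for(doc[key_field])].append(doc)
    return placement

# Thirty made-up documents keyed by a user id.
docs = [{"facebookUser": uid, "likes": uid % 7} for uid in range(1000, 1030)]
placement = distribute(docs, "facebookUser")

for name, held in placement.items():
    print(name, len(held))  # how many documents each shard received
```

Because the hash of a given key never changes, a later lookup for the same user can be sent straight to the one shard that holds that user's documents instead of searching all of them.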