MongoDB Quick Start Guide 🍃⚡️

  • Types of Databases
  • Local Setup
  • Define your Data Model

    • One-to-One
    • One-to-Few
  • Common Operations using Python

    • Insert a Document
    • List Documents
    • Update a Document
    • Delete a Document
  • Indexes

    • Single Field Index
    • Unique Index
    • TTL Index
    • Compound Index
    • Get Collection Indexes

Types of Databases

Applications generally use one of two main types of databases: SQL or NoSQL. SQL databases are relational and structured, and usually consist of tables of rows and columns. NoSQL databases come in four main types: Key-Value, Wide-Column, Graph, and Document. This guide focuses on document databases, specifically MongoDB, which stores data in documents that are grouped together in collections.

SQL databases are typically built to scale vertically: they run on a single server, and handling more load means increasing the CPU and memory on that server. NoSQL databases, on the other hand, are built to scale horizontally and are typically deployed across multiple servers in a replica set, so handling more load can be as simple as adding a new server.

Databases are also evaluated on their ACID compliance (Atomicity, Consistency, Isolation, Durability), a standard set of properties that guarantee database transactions are processed reliably. Most SQL databases are ACID compliant by default, whereas NoSQL databases can sacrifice ACID compliance for performance and scalability, although most NoSQL databases offer a way to achieve ACID compliance if it is needed.

Image: NoSQL vs SQL (source: https://starship-knowledge.com/when-to-choose-nosql-over-sql)

Local Setup

MongoDB is an open source NoSQL Document database which horizontally scales utilizing clustered servers in replica-sets. A great option for testing and performing local development with MongoDB is using Docker and Docker Compose. The following docker compose spec will create a local docker container running MongoDB with the data persisted to a docker volume.

📝 docker-compose.yml

services:
  db:
    image: mongo:7.0.1
    container_name: myDataBase
    restart: always
    ports:
      - 27017:27017
    environment:
      MONGO_INITDB_ROOT_USERNAME: root
      MONGO_INITDB_ROOT_PASSWORD: mySecureDbPassword1!
    volumes:
      - type: volume
        source: my_db_data
        target: /data/db

volumes:
  my_db_data:

💡 Refer to Containers Demystified 🐳🤔 for a full Docker container guide

Start the container as a detached background process.

$ docker-compose up -d
[+] Building 0.0s (0/0)                                                                                   docker-container:unruffled_shockley
[+] Running 3/3
 ✔ Network mongo_default      Created                                                                                       
 ✔ Volume "mongo_my_db_data"  Created                                                                                                   
 ✔ Container myDataBase       Started                                                                                                   

$ docker ps
CONTAINER ID   IMAGE         COMMAND                  CREATED         STATUS         PORTS                      NAMES
11bce7e4ab08   mongo:7.0.1   "docker-entrypoint.s…"   2 minutes ago   Up 2 minutes   0.0.0.0:27017->27017/tcp   myDataBase

💡 Mongo stores BSON documents in collections, and you can loosely think of this as multiple lists of dictionaries or JSON objects. The difference between BSON and JSON is that BSON supports additional types, such as DateTime objects.
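To make the BSON and JSON difference concrete, here is a minimal sketch using only the Python standard library. It shows that plain JSON has no date type, so a datetime value has to be converted to a string before it can be serialized. The sample document and the encode_extra helper are illustrative and not part of any MongoDB tooling.

```python
import json
from datetime import datetime, timezone

# Illustrative document with a datetime field, like the one we will insert.
todo = {
    "title": "Do a thing",
    "completed": False,
    "created_date": datetime(2023, 10, 6, 13, 20, 45, tzinfo=timezone.utc),
}


def encode_extra(value):
    """Convert datetimes to ISO 8601 strings; reject anything else."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


# Plain JSON cannot represent a datetime object directly.
try:
    json.dumps(todo)
    json_supports_datetime = True
except TypeError:
    json_supports_datetime = False

print(json_supports_datetime)
print(json.dumps(todo, default=encode_extra))
```

With BSON the date stays a real date type in the database, which is what later makes date-based features such as TTL indexes possible.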

Connect to the container with the mongo shell utility, mongosh, then create our first database (myAppDb) and collection (todos), and insert a document.

$ mongosh 'mongodb://root:mySecureDbPassword1!@localhost:27017/'
Current Mongosh Log ID: 651abecd3a009f9bf2b8999c
Connecting to:      mongodb://@localhost:27017/?directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+2.0.1
Using MongoDB:      7.0.1
Using Mongosh:      2.0.1

test> use myAppDb
switched to db myAppDb

myAppDb> db.todos.insertOne({"title": "Do a thing", "completed": false, "created_date": new Date()})
{
  acknowledged: true,
  insertedId: ObjectId("652009addda6ead56a87d0e5")
}

💡 mongosh is written with JavaScript (NodeJS) so we can insert a DateTime object with new Date()

We can verify the document has been created by listing out the documents in the collection with the find command.

myAppDb> db.todos.find()
[
  {
    _id: ObjectId("652009addda6ead56a87d0e5"),
    title: 'Do a thing',
    completed: false,
    created_date: ISODate("2023-10-06T13:20:45.434Z")
  }
]

If you prefer to use a visual tool with a GUI instead of the command line, MongoDB Compass can be used.

Define your Data Model

Mongo is great because it allows for flexibility in the data you can store in a collection: the documents do not all need to have the same fields. That being said, it is still important to start by defining your data model when you create a new app or add a new feature, to make sure you do not run into complexity, organization, or performance issues later on. The two types of relationship schemas that I use the most are One-to-One and One-to-Few.

💡 Great article on MongoDB Schema Best Practices which covers more schema designs

One-to-One

In the One-to-One relationship each attribute has a single value so the resulting document is flat and we have one document per item. This is the schema of the document we just inserted.

{
    "_id": ObjectId(),
    "title": "string",
    "completed": false,
    "created_date": ISODate()
}

One-to-Few

In the One-to-Few relationship we use a list of nested objects to group all associated items into a single document such as a user record with multiple addresses.

💡 MongoDB has a 16MB max document size

{
    "_id": ObjectId('A'),
    "first_name": "John",
    "last_name": "Doe",
    "company": "Cisco",
    "addresses": [
        { "street": "7200-11 Kit Creek Rd", "city": "Durham", "state": "NC", "country": "US"},
        { "street": "7200-12 Kit Creek Rd", "city": "Durham", "state": "NC", "country": "US"},
    ]
}
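One way to keep a One-to-Few model consistent in application code is to describe it with dataclasses and convert it to a plain dictionary before inserting. This is only a sketch using the standard library; the Address and User class names are assumptions made for illustration, and the _id field is left out so that MongoDB generates it on insert.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Address:
    street: str
    city: str
    state: str
    country: str


@dataclass
class User:
    first_name: str
    last_name: str
    company: str
    # The "few" side of the relationship is embedded as a list.
    addresses: list[Address] = field(default_factory=list)


user = User(
    first_name="John",
    last_name="Doe",
    company="Cisco",
    addresses=[
        Address("7200-11 Kit Creek Rd", "Durham", "NC", "US"),
        Address("7200-12 Kit Creek Rd", "Durham", "NC", "US"),
    ],
)

# asdict() recursively converts nested dataclasses into dictionaries,
# giving a document shaped like the example above.
document = asdict(user)
print(document)
```

The resulting dictionary could then be passed to a driver call such as insert_one.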

Common Operations using Python

This full example is available on GitHub
https://github.com/dpills/mongo-todos/tree/master

We have used the Mongo shell and MongoDB Compass, but these tools are usually only used for manual operations or validation of data. MongoDB offers drivers (libraries) in many different programming languages, but we will be taking a look at common operations using the Python PyMongo driver for a more realistic application example.

💡 The majority of code is written to interact with the collection of the database and the full list of features can be referenced at the pymongo Collection level operations documentation.

Make sure to install pymongo with a Python package manager to get started.

# Pip
$ python3 -m pip install pymongo
Installing collected packages: dnspython, pymongo
Successfully installed dnspython-2.4.2 pymongo-4.5.0

# Poetry
$ poetry add pymongo
The following packages are already present in the pyproject.toml and will be skipped:

  • pymongo

Create a Python file, import pymongo and set up the mongo connection, specifying the myAppDb database.

📝 todos.py

import argparse
from datetime import datetime

import pymongo
from bson import ObjectId
from bson.errors import InvalidId

db_client = pymongo.MongoClient("mongodb://root:mySecureDbPassword1!@localhost:27017/")
db = db_client["myAppDb"]

🛑 In a real application make sure to put the URI in an environment variable so it is not exposed in your source code
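As a rough sketch of that advice, the URI can be assembled from environment variables at startup. The variable names (MONGO_USER, MONGO_PASSWORD, MONGO_HOST, MONGO_PORT) and the build_mongo_uri helper are assumptions for illustration, not names required by MongoDB or PyMongo. The credentials are percent-encoded so that special characters such as "!" or "@" cannot break the URI.

```python
import os
from typing import Mapping
from urllib.parse import quote_plus


def build_mongo_uri(env: Mapping[str, str] = os.environ) -> str:
    """Build a MongoDB URI from (hypothetical) environment variables."""
    user = quote_plus(env.get("MONGO_USER", "root"))
    # No default for the password: fail loudly with a KeyError if it is missing.
    password = quote_plus(env["MONGO_PASSWORD"])
    host = env.get("MONGO_HOST", "localhost")
    port = env.get("MONGO_PORT", "27017")
    return f"mongodb://{user}:{password}@{host}:{port}/"


# Demo with an explicit mapping instead of the real environment.
print(build_mongo_uri({"MONGO_PASSWORD": "mySecureDbPassword1!"}))
```

The returned string can be passed to pymongo.MongoClient in place of the hard-coded URI.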

Insert a Document

Insert a document with the same schema we used earlier using insert_one, including a datetime object for the created date.

📝 todos.py

def create_todo(title: str) -> None:
    """
    Create a todo
    """
    create_result = db.todos.insert_one(
        {"title": title, "completed": False, "created_date": datetime.utcnow()}
    )
    print(f"New Todo ID: {create_result.inserted_id}")

    return None

We can get the Document ID by printing the inserted_id of the create result.

💡 MongoDB generates an ObjectId for each document

ObjectIds are small, likely unique, fast to generate, and ordered. ObjectId values are 12 bytes in length, consisting of:

  • A 4-byte timestamp, representing the ObjectId’s creation, measured in seconds since the Unix epoch.
  • A 5-byte random value generated once per process. This random value is unique to the machine and process.
  • A 3-byte incrementing counter, initialized to a random value.

$ python3 todos.py -o create -d 'Write a todo app'
New Todo ID: 65200e33a29c1e7244f7df59
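The 12-byte layout described above can be checked by hand against the ID that was just printed. This is a small standard-library sketch; describe_object_id is a hypothetical helper written for this guide. In application code, PyMongo's ObjectId exposes the timestamp part through its generation_time attribute.

```python
from datetime import datetime, timezone


def describe_object_id(hex_id: str) -> dict:
    """Split a 24-character ObjectId hex string into its three parts."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    return {
        # First 4 bytes: big-endian seconds since the Unix epoch.
        "created": datetime.fromtimestamp(
            int.from_bytes(raw[:4], "big"), tz=timezone.utc
        ),
        # Next 5 bytes: per-process random value.
        "random": raw[4:9].hex(),
        # Last 3 bytes: incrementing counter.
        "counter": int.from_bytes(raw[9:], "big"),
    }


parts = describe_object_id("65200e33a29c1e7244f7df59")
print(parts["created"].isoformat())  # 2023-10-06T13:40:03+00:00
```

The decoded time lines up with the created_date stored in the same document, which is why an ObjectId can roughly double as a creation timestamp.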

List Documents

We can now loop over the existing documents with the find operation and provide a filter so that only the items which are not completed are returned. If we wanted to return all of the items, even the completed ones, we could just remove the filter: db.todos.find().

💡 If you need to find only a single document then find_one can be used

📝 todos.py

def get_todos() -> None:
    """
    Get Todos
    """
    print()
    for document in db.todos.find({"completed": False}):
        for key, value in document.items():
            print(f"{key}: {value}")
        print()

    return None

We see both of our To Do items printed out.

$ python3 todos.py -o read

_id: 652009addda6ead56a87d0e5
title: Do a thing
completed: False
created_date: 2023-10-06 13:20:45.434000

_id: 65200e33a29c1e7244f7df59
title: Write a todo app
completed: False
created_date: 2023-10-06 13:40:03.944000
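find also accepts a projection as its second argument, and the returned cursor can be sorted and limited before it is iterated. The sketch below assumes it is handed a PyMongo collection such as db.todos; newest_open_titles is a hypothetical helper name, and the limit of 5 is an arbitrary default.

```python
from typing import Any


def newest_open_titles(collection: Any, limit: int = 5) -> list[str]:
    """Return titles of unfinished todos, newest first."""
    cursor = (
        # Filter on completed, and project only the title field.
        collection.find({"completed": False}, {"title": 1, "_id": 0})
        # -1 sorts descending, so the newest created_date comes first.
        .sort("created_date", -1)
        .limit(limit)
    )
    return [document["title"] for document in cursor]
```

Sorting and limiting on the cursor lets the database do the work instead of loading every document into Python first.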

Update a Document

To update a document, update_one can be used. We pass a filter as the first argument to say which document to modify; the ObjectId of the todo is used in this case. The second argument is the operation we want to run. I tend to use $set the most, which sets the value of a field. In this example we complete the todo item by setting completed to True.

📝 todos.py

def complete_todo(todo_id: str) -> None:
    """
    Complete a todo
    """
    try:
        todo_object_id = ObjectId(todo_id)
    except InvalidId:
        print("Invalid Todo Id")
        return None

    update_result = db.todos.update_one(
        {"_id": todo_object_id}, {"$set": {"completed": True}}
    )

    if update_result.matched_count == 0:
        print("Todo not found")
    else:
        print("Completed todo, nice work! 🎉")

    return None

Complete our initial todo item and verify that we no longer see it when listing out our unfinished todo items.

$ python3 todos.py -o complete -d 652009addda6ead56a87d0e5
Completed todo, nice work! 🎉

$ python3 todos.py -o read

_id: 65200e33a29c1e7244f7df59
title: Write a todo app
completed: False
created_date: 2023-10-06 13:40:03.944000
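One detail worth knowing: the result of update_one carries both matched_count and modified_count. If the todo was already completed, the filter still matches but nothing changes, so modified_count is 0. A small sketch of how the two counts could be told apart follows; completion_status is a hypothetical helper and is not part of the example repository.

```python
from typing import Any


def completion_status(update_result: Any) -> str:
    """Interpret the counts on an update_one result."""
    if update_result.matched_count == 0:
        # The filter did not match any document.
        return "not found"
    if update_result.modified_count == 0:
        # The document matched, but completed was already True.
        return "already completed"
    return "completed"
```

The function above could replace the matched_count check in complete_todo if you want a distinct message when a todo is completed twice.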

Delete a Document

Deleting a document uses a similar syntax to finding a document: a filter is provided to delete_one to indicate which document to delete.

📝 todos.py

def delete_todo(todo_id: str) -> None:
    """
    Delete a todo
    """
    try:
        todo_object_id = ObjectId(todo_id)
    except InvalidId:
        print("Invalid Todo Id")
        return None

    delete_result = db.todos.delete_one({"_id": todo_object_id})

    if delete_result.deleted_count == 0:
        print("Todo not found")
    else:
        print("Deleted the todo")

    return None

Delete the todo we created and we can see that nothing gets listed when running our read command.

$ python3 todos.py -o delete -d 65200e33a29c1e7244f7df59
Deleted the todo

$ python3 todos.py -o read

Checking from the Mongo shell shows that the document was deleted, and that our initial document still exists but has been marked as completed.

myAppDb> db.todos.find()
[
  {
    _id: ObjectId("652009addda6ead56a87d0e5"),
    title: 'Do a thing',
    completed: true,
    created_date: ISODate("2023-10-06T13:20:45.434Z")
  }
]
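The commands in this section call todos.py with -o and -d flags, but the wiring for those flags is not shown here. The following is a guess at what a minimal argparse entry point could look like, not the code from the GitHub repository. The long option names, the help text and the handlers dictionary are assumptions; in todos.py the handlers would be create_todo, get_todos, complete_todo and delete_todo.

```python
import argparse
from typing import Callable, Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Todo CLI (illustrative)")
    parser.add_argument(
        "-o", "--operation", required=True,
        choices=["create", "read", "complete", "delete"],
    )
    parser.add_argument(
        "-d", "--data",
        help="todo title (create) or todo ID (complete, delete)",
    )
    return parser


def run(argv: Optional[Sequence[str]], handlers: dict[str, Callable[..., None]]) -> None:
    """Parse the arguments and dispatch to the matching handler."""
    args = build_parser().parse_args(argv)
    if args.operation == "read":
        handlers["read"]()
    elif args.data is None:
        print(f"-d is required for the {args.operation} operation")
    else:
        handlers[args.operation](args.data)


# Demo with stub handlers that only record what was called.
calls: list[tuple] = []
demo_handlers = {
    "create": lambda title: calls.append(("create", title)),
    "read": lambda: calls.append(("read",)),
    "complete": lambda todo_id: calls.append(("complete", todo_id)),
    "delete": lambda todo_id: calls.append(("delete", todo_id)),
}
run(["-o", "create", "-d", "Write a todo app"], demo_handlers)
run(["-o", "read"], demo_handlers)
print(calls)
```

Passing None as argv makes argparse read the real command line, which is what a script entry point would normally do.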

Indexes

Indexes are common across most databases. Without them, a query can end up scanning every document in a collection to find the document or documents required. Indexes are special data structures that store a small portion of the collection's data set in an easy-to-traverse form. Indexes should be created for fields that are matched in query filters. These are the index types that I use the most, but a full list can be referenced in the Index Types documentation.

💡 1 = Ascending, -1 = Descending

Single Field Index

A single field index can be used when you need to match on a single field. We can apply this one to our todo example since we filter on the completed field.

myAppDb> db.todos.createIndex({completed: 1})
completed_1

Unique Index

A unique index ensures that the indexed fields do not store duplicate values; i.e. enforces uniqueness for the indexed fields. By default, MongoDB creates a unique index on the _id field during the creation of a collection.

> db.collection.createIndex( { "unique_id": 1 }, { unique: true } )

TTL Index

TTL indexes are special single-field indexes that MongoDB can use to automatically remove documents from a collection after a certain amount of time. They work on fields that hold a Date value (a datetime object in Python). This is useful for large amounts of data that only need to be retained for a short or fixed period of time.

> db.collection.createIndex( { "created_date": 1 }, { expireAfterSeconds: 3600 } )
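To see what expireAfterSeconds means for a given document, the expiry time is simply the indexed date plus that many seconds. The sketch below works this out for the first todo we inserted; expires_at is a hypothetical helper. Keep in mind that MongoDB removes expired documents with a background task that runs about once a minute, so the actual deletion can happen slightly later than the computed time.

```python
from datetime import datetime, timedelta, timezone


def expires_at(created_date: datetime, expire_after_seconds: int) -> datetime:
    """Earliest time at which a TTL index would consider the document expired."""
    return created_date + timedelta(seconds=expire_after_seconds)


created = datetime(2023, 10, 6, 13, 20, 45, tzinfo=timezone.utc)
print(expires_at(created, 3600).isoformat())  # 2023-10-06T14:20:45+00:00
```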

Compound Index

Compound indexes can be used when you need to match on multiple fields.

> db.collection.createIndex( { owner: 1, created_date: 1 } )

Get Collection Indexes

Collection Indexes can be viewed with the getIndexes method.

myAppDb> db.todos.getIndexes()
[
  { v: 2, key: { _id: 1 }, name: '_id_' },
  { v: 2, key: { completed: 1 }, name: 'completed_1' }
]
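The shell commands in this section have close PyMongo equivalents through create_index, which takes a list of (field, direction) pairs plus the same options as keyword arguments. The sketch below mirrors the four examples above and assumes it is given a PyMongo collection. The ensure_indexes name is hypothetical, and the unique_id and owner fields come from the generic examples rather than the todo schema.

```python
from typing import Any


def ensure_indexes(collection: Any) -> list[str]:
    """Create the example indexes and return the index names."""
    return [
        # Single field index, ascending.
        collection.create_index([("completed", 1)]),
        # Unique index.
        collection.create_index([("unique_id", 1)], unique=True),
        # TTL index: documents expire one hour after created_date.
        collection.create_index([("created_date", 1)], expireAfterSeconds=3600),
        # Compound index on two fields.
        collection.create_index([("owner", 1), ("created_date", 1)]),
    ]
```

create_index returns the name of the index, and the collection's index_information() method plays the role of getIndexes in Python.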