If you've ever wondered how a backup works while it's transferring your data, this post explains what happens between your machine and the back end as your files are safely and securely uploaded to the cloud.
Step 1: Authenticating
Mozy authenticates to the back end. This is done using an OAuth token, which itself would have been issued either by way of a SAML transaction for SSO users or by way of a credential check for non-SSO customers.
Step 2: Retrieving Configuration
Mozy issues a request to retrieve the latest configuration. Before sending the configuration, the back end supplies a cryptographic hash of the current configuration. If it's identical to the hash of what the client already has (meaning the configuration hasn't changed since the last time), the download is skipped. If it's different, the config is downloaded.
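To make the hash check concrete, here is a minimal sketch of the "skip the download if nothing changed" decision. The function names are hypothetical, and SHA-256 is an assumption; the post doesn't say which hash Mozy actually uses.

```python
import hashlib


def config_digest(config_bytes: bytes) -> str:
    """Hash a configuration blob (SHA-256 is an assumption for illustration)."""
    return hashlib.sha256(config_bytes).hexdigest()


def needs_download(local_config: bytes, server_digest: str) -> bool:
    """Return True only when the server's hash differs from the local copy's hash."""
    return config_digest(local_config) != server_digest
```

The point of the design is that only a short hash crosses the wire when nothing has changed, rather than the full configuration.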
Step 3: Retrieving Object Catalog
Once the configuration is downloaded, the same thing is done for the object catalog: the client requests the download, and the back end first supplies a cryptographic hash. If the hash is different (meaning the catalogs are not congruent, as may occasionally occur if a backup is terminated unexpectedly), then the catalog is downloaded and a comparison is run to identify which files are impacted. This is also known as an "integrity check."
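As a rough illustration of the comparison, suppose each catalog is a simple mapping of file path to content hash. That layout and the function name are assumptions for this sketch, not Mozy's actual catalog format.

```python
def impacted_files(local: dict[str, str], remote: dict[str, str]) -> list[str]:
    """List paths whose entries differ between the two catalogs.

    A path counts as impacted if it is missing from either side
    or if the recorded hashes disagree.
    """
    all_paths = local.keys() | remote.keys()
    return sorted(p for p in all_paths if local.get(p) != remote.get(p))
```

For example, if a backup was cut off halfway, files recorded on only one side or with mismatched hashes would show up in the returned list.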
Step 4: Identifying Files to be Backed Up
We identify the files that need to be backed up: either new files (e.g., if a new backup set was created) or changed files.
Step 5: Encrypting Files
For new files, the content is buffered into memory, then encrypted using the customer's choice of encryption key. Prior to transmitting the file to the back end, a cryptographic hash is sent to the back end to determine if we already have an instance of the file. If yes, a pointer is added to the customer's object catalog. If no, the back end requests the file. This is known as "Single Instance Storage."
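The Single Instance Storage decision can be sketched with a toy in-memory back end. Everything here is hypothetical (the `BackEnd` class, its methods, the use of SHA-256, and hashing the payload exactly as it would be sent); it only illustrates the "hash first, upload only if unknown" flow.

```python
import hashlib


class BackEnd:
    """Toy stand-in for the back end: one object store, one catalog per customer."""

    def __init__(self):
        self.objects = {}   # digest -> stored payload
        self.catalogs = {}  # customer -> {path: digest}

    def has_object(self, digest: str) -> bool:
        return digest in self.objects

    def add_pointer(self, customer: str, path: str, digest: str) -> None:
        self.catalogs.setdefault(customer, {})[path] = digest

    def store(self, customer: str, path: str, payload: bytes) -> None:
        digest = hashlib.sha256(payload).hexdigest()
        self.objects[digest] = payload
        self.add_pointer(customer, path, digest)


def back_up(back_end: BackEnd, customer: str, path: str, payload: bytes) -> bool:
    """Return True if the payload had to be transmitted, False if a pointer sufficed."""
    digest = hashlib.sha256(payload).hexdigest()
    if back_end.has_object(digest):
        # An instance already exists: just record a pointer in the catalog.
        back_end.add_pointer(customer, path, digest)
        return False
    # Unknown content: the back end asks for the file itself.
    back_end.store(customer, path, payload)
    return True
```

In this sketch, backing up identical content a second time stores nothing new; only the catalog gains an entry.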
Step 6: Transmitting New Files
The file is inserted into the pipeline and transmitted asynchronously. When the back end has received and stored it, it verifies that the content's cryptographic hash and final object length match what the client sent before the transfer. Provided they match, the back end sends an acknowledgement to the client.
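The check the back end performs before acknowledging might look something like the following. The function name is made up for this sketch, and SHA-256 is again an assumption.

```python
import hashlib


def should_acknowledge(announced_digest: str, announced_length: int, received: bytes) -> bool:
    """Acknowledge only if both the length and the hash match what was announced."""
    if len(received) != announced_length:
        return False
    return hashlib.sha256(received).hexdigest() == announced_digest
```

Checking the length as well as the hash catches truncated transfers cheaply before any hashing is done.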
Note: Steps 5 and 6 can run in parallel across multiple files. By default, up to 500 objects can be in the pipeline at once, which prevents synchronous delays.
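One common way to cap the number of in-flight objects is a semaphore. This is a sketch of that general technique, not Mozy's actual pipeline; `run_pipeline` and the `send` callback are hypothetical names, and only the default limit of 500 comes from the note above.

```python
import asyncio


async def run_pipeline(objects, send, max_in_flight=500):
    """Send every object concurrently, with at most max_in_flight sends active."""
    gate = asyncio.Semaphore(max_in_flight)

    async def worker(obj):
        async with gate:  # waits here while the pipeline is full
            return await send(obj)

    # gather preserves the input order of the results
    return await asyncio.gather(*(worker(obj) for obj in objects))
```

Because sends overlap, a slow acknowledgement for one object doesn't hold up the rest, while the cap keeps memory use bounded.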
Step 7: Transmitting Changed Files
For changed files, steps 5 and 6 are identical, with a couple of exceptions. Rather than reading out the whole file, we seek to the location of the changes in the file and read out only the changes. Unlike traditional change block tracking, this approach seeks only to where changes occur and isn't constrained by fixed block sizes. This is known as "vector tracking," and is accomplished by having a file system filter driver that records where changes occur on the disk.
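To illustrate the idea of reading only the changed regions, assume the filter driver hands us a list of `(offset, length)` pairs. That format and both function names are assumptions for this sketch.

```python
def read_changes(path, changed_ranges):
    """Read only the changed byte ranges, returning (offset, data) pairs."""
    deltas = []
    with open(path, "rb") as f:
        for offset, length in changed_ranges:
            f.seek(offset)  # jump straight to the change
            deltas.append((offset, f.read(length)))
    return deltas


def apply_changes(base: bytes, deltas) -> bytes:
    """Rebuild the new version from the previous version plus the deltas."""
    patched = bytearray(base)
    for offset, data in deltas:
        if offset > len(patched):
            # Pad with zero bytes if a change starts past the old end of file.
            patched.extend(b"\0" * (offset - len(patched)))
        patched[offset:offset + len(data)] = data
    return bytes(patched)
```

Note that the ranges can be any size and start anywhere, which is the contrast with fixed-size blocks drawn above.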
Step 8: Finishing
Once all changes have been sent, a log of the backup is sent to the back end.
So there you have it: how the backup process works. It's a little technical, but it provides some great insight for those of you who really want to know what's happening when you click "Start Backup" and tell it to start doing its thing.
Have any questions or concerns? Want to recommend something for a future blog post topic? Let us know in the comments!