March 10, 2017 From rOpenSci (https://deploy-preview-334--ropensci.netlify.app/blog/2017/03/10/mongolite/). Except where otherwise noted, content on this site is licensed under the CC-BY license.
After 2.5 years of development, version 1.0 of the mongolite package has been released to CRAN. The package is now stable, well documented, and will soon be submitted for peer review to be onboarded in the rOpenSci suite.
I started working on mongolite in September 2014, and it was first announced at the rOpenSci unconf 2015. At that time, there were already two Mongo clients on CRAN: rmongodb (no longer works) and RMongo (depends on Java). However, I found both of them pretty clunky, and the MongoDB folks had just released 1.0 of their new C driver, so I decided to write a new client from scratch.
Mongolite aims to provide a simple R client for MongoDB, based on the excellent mongo-c-driver combined with super-powers from the jsonlite package. Simple means inserting and querying data in R using data frames with a single command:
# Create a connection
con <- mongolite::mongo("diamonds",
url = "mongodb://readwrite:test@ds043942.mongolab.com:43942/jeroen_test")
# Find diamonds with: cut == Premium & price < 500
mydata <- con$find('{"cut" : "Premium", "price" : { "$lt" : 500 } }')
print(mydata)
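The query above is an ordinary JSON string, so it can also be built from R values rather than typed by hand. The following is a minimal sketch, assuming the jsonlite package is installed; premium_query() is a hypothetical helper written for this illustration and is not part of mongolite.

```r
library(jsonlite)

# Hypothetical helper (not part of mongolite): build the JSON query for
# "cut == <cut> & price < <max_price>" from plain R values.
premium_query <- function(cut = "Premium", max_price = 500) {
  json <- toJSON(
    list(cut = cut, price = list("$lt" = max_price)),
    auto_unbox = TRUE  # emit scalars as "Premium" and 500, not ["Premium"] and [500]
  )
  as.character(json)   # return a plain character string
}

query <- premium_query()
print(query)

# With an open connection `con`, the string is passed to find() as before:
# mydata <- con$find(query)
```

Building the query this way avoids quoting mistakes when the filter values come from variables instead of literals.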
Running your own MongoDB server is easy. Either download it from the website or install it with your favorite package manager. To start the server, simply run the mongod command (d for daemon) in a shell:
# Install the MongoDB server
brew install mongodb
# Run the server daemon
mongod
The mongolite::mongo() function will default to the localhost server if no URI is specified. Try inserting and reading some data:
# Create a connection
con <- mongolite::mongo("iris")
# Insert some data
con$insert(datasets::iris)
# Count how much data is in the DB
con$count()
# Read the data back
con$find('{}')
# Wipe the collection
con$drop()
In my experience, a simple interface is critical to get started. Obviously, advanced features are available in mongolite as well, but this will get you up to speed right away if you just need the data and want to get on with your job.
The 1.0 release has fresh documentation based on the awesome bookdown system. You can find the documentation on the mongolite GitHub homepage.
The bookdown book is now the primary documentation source for mongolite.
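As a taste of the more advanced features, here is a minimal sketch that restricts fields, sorts and limits a find(), and runs an aggregation pipeline. It assumes `con` is a connection to the diamonds collection used earlier; top_by_price() and price_by_cut() are hypothetical helpers written for this illustration, not functions from mongolite.

```r
# Hypothetical helpers (not part of mongolite). `con` is assumed to be a
# connection created with mongolite::mongo("diamonds", ...).

# Return the n most expensive diamonds of a given cut, keeping only the
# carat and price fields.
top_by_price <- function(con, cut = "Premium", n = 5) {
  con$find(
    query  = sprintf('{"cut" : "%s"}', cut),
    fields = '{"_id" : 0, "carat" : 1, "price" : 1}',
    sort   = '{"price" : -1}',   # -1 sorts in descending order
    limit  = n
  )
}

# Count the diamonds and average the price within each cut, on the server.
price_by_cut <- function(con) {
  con$aggregate(
    '[{"$group" : {"_id" : "$cut", "count" : {"$sum" : 1}, "average" : {"$avg" : "$price"}}}]'
  )
}

# Usage, given an open connection:
# top_by_price(con, "Ideal", n = 10)
# price_by_cut(con)
```

Both calls are meant to return a data frame, just like the plain find() shown earlier.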
MongoDB is the most popular NoSQL database (by market share), and the fifth most popular database overall. Mongo is relatively young in comparison with the traditional engines (Oracle, Microsoft, MySQL, Postgres), yet well established, fully open source, and backed by a professional company.
MongoDB provides a modern high-performance DB engine with cool features that cannot be found anywhere else. The high quality client drivers are a pleasure to work with, and actively maintained by professional engineers. While writing the bindings, I quickly found that Mongo does not suffer from the legacy bloat that I have come to associate with traditional DB engines.
At the same time, the ecosystem is mature and offers reliability and continuity that make it stand out from the proliferation of NoSQL systems. MongoDB has been widely adopted by users and distributions, so I am pretty confident it will still be around 5 or 10 years from now.
The NEWS file on GitHub lists what has changed in this release:
- allow_invalid_hostname parameter to ssl_options()
- bigint_as_char to parse int64 into string instead of double
- mongo_options() to get/set global options
- mongo_log_level() is removed (use mongo_options() instead)
- insert() now substitutes dots in key (column) names with underscores
- update(), support for upsert
At rOpenSci we’re interested to hear who is using R and how. If you decide to give mongolite a try, please share your experiences and suggestions. Open an issue on GitHub for specific problems and feature requests, or join the rOpenSci Slack to talk about mongolite or other rOpenSci packages. Or just come say hi and hang out :)
We love hearing from academic users, and equally about industry applications of R and the synergy between industry and open source scientific software.