There’s a lot of work that goes into making software: the code that does the thing itself, unit testing, examples, tutorials, documentation, and support. rOpenSci software is created and maintained both by our staff and by our (awesome) community. In keeping with our aim to build the capacity of software users and developers, three interns from our academic home at UC Berkeley are now working with us as well. Our interns are mentored by Carl Boettiger, Scott Chamberlain, and Karthik Ram, and they will receive academic credit and/or pay for their work....
randgeo generates random points and shapes in GeoJSON and WKT formats for use in examples, teaching, or statistical applications. Points and shapes are generated in the long/lat coordinate system and with appropriate spherical geometry; random points are distributed evenly across the globe, and random shapes are sized according to a maximum great-circle distance from the center of the shape. randgeo was adapted from https://github.com/tmcw/geojson-random to have a pure R implementation without any dependencies as well as appropriate geometry....
There is no problem in science quite as frustrating as other people's data. Whether it’s malformed spreadsheets, disorganized documents, proprietary file formats, data without metadata, or any other data scenario created by someone else, scientists have taken to Twitter to complain about it. As a political scientist who regularly encounters so-called “open data” in PDFs, I find this problem particularly irritating. PDFs may have “portable” in their name, making them display consistently on various platforms, but that portability means any information contained in a PDF is irritatingly difficult to extract computationally....
This is cross-posted from Tony's blog, onthelambda.com. Version 2.0 of my dataset validation package assertr hit CRAN just this weekend. It has some pretty great improvements over version 1. For those new to the package, what follows is a short and new introduction. For those who are already using assertr, the text below will point out the improvements. I can go on and on (and have) about the treachery of messy/bad datasets....
“Everybody talks about the weather, but nobody does anything about it.” - Charles Dudley Warner. As a scientist who models plant diseases, I use a lot of weather data. Often this data is not available for areas of interest. Previously, I worked with the International Rice Research Institute (IRRI), and often the countries I was working with did not have weather data available, or I was working on a large area covering several countries and needed a single source of data to work from....