Spring has sprung. Even though it has felt like it here in Houston for a while now, today we usher in the official start of a new season…or so my calendar tells me. Spring marks the start of pleasant weather, sunshine, and an abundance of Peeps (marshmallows, not friends…although I suppose both are fine). For some, it also is a kickoff for getting life a bit more figured out. It is New Year’s Resolutions - Round Two. Another term for this, of course, is Spring Cleaning.
In honor of this, I thought today’s #MarcoMonday would be a great opportunity to chime in on Spring Cleaning as it applies to spatial data. Okay, don’t groan just yet. This is a necessity even if you do not specifically refer to it as such. Organizations and individuals alike should set aside time periodically to really take a look at the data on their network, get reacquainted with it, and decide if it will stick around for another season.
To help with this Spatial Spring Cleaning journey, I have compiled a list of six things to focus on throughout the process. If you are feeling overwhelmed at the thought of tackling those unknown files on your network, this may be a good jumping-off point.
1. Set Goals
When you start any new regimen, it is advised to have a goal or two in mind. This is no different for Spatial Spring Cleaning. You need a feat to work towards, and preferably one that is measurable. This could be any number of things and will differ from company to company, individual to individual.
Drawing a blank here and need inspiration? Okay, let’s see if we can wring out a few sample goals…
Goal #1 – Free up room on the network, decreasing used storage from 120 TB to 100 TB.
Goal #2 – Archive files that have not been used in the last 6 months.
Goal #3 – Remove all files originally authored by Tommy Boy.
Hopefully you get the idea here. These goals can either be complex or simple, dealing with the availability of data, budgeting, or data architecture. As long as you have a task or question you are working towards finishing out or answering, that will make for bright beginnings.
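To make "measurable" a little more concrete, here is a minimal Python sketch of how progress toward something like Goal #1 could be tracked. It is only an illustration: the function names are made up for this post, the terabyte is assumed to be the decimal kind (10^12 bytes), and the 120 TB and 100 TB figures are simply the sample numbers from above.

```python
import shutil

TB = 1000 ** 4  # assumption: decimal terabytes (10^12 bytes)


def used_bytes_on(path):
    """Return the bytes currently in use on the drive that holds `path`."""
    return shutil.disk_usage(path).used


def storage_goal_progress(used_bytes, start_tb, target_tb):
    """Return how far along the goal is, from 0.0 (not started) to 1.0 (met).

    start_tb is the usage when the clean-up began, target_tb the usage
    you are aiming for. Both names are hypothetical.
    """
    span = start_tb - target_tb
    if span <= 0:
        raise ValueError("target_tb must be smaller than start_tb")
    used_tb = used_bytes / TB
    progress = (start_tb - used_tb) / span
    # Clamp so that overshooting or backsliding still reads as 0-100%.
    return max(0.0, min(1.0, progress))
```

On a real network you would feed it live numbers, for example `storage_goal_progress(used_bytes_on("W:\\"), 120, 100)`, and check in on the result as the clean-up moves along.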
2. Assess the Current Inventory
After laying down what you would like to accomplish, the next logical step is discerning what it is you have to work with. You now need an inventory of that spatial data.
Inventories can include any number of details. The best inventories tell you the name, file path, file type, owner, health, and relevant dates associated with each piece of data. Well, technically the “best” inventories paint a much more vivid picture, but for basic practices, these are details you will need to know.
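If you want a feel for what a bare-bones inventory looks like before reaching for a dedicated tool, the sketch below walks a folder with Python's standard library and records a few of these details for each file. It is an assumption-laden illustration, not how any particular product works: the `inventory` function and its field names are invented here, and owner and health are left out because they depend on the platform and the data format.

```python
import os
from datetime import datetime
from pathlib import Path


def inventory(root):
    """Walk `root` and return one dict of basic details per file found."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = Path(dirpath) / filename
            try:
                info = path.stat()
            except OSError:
                # Unreadable or vanished file: skip it rather than stop.
                continue
            records.append({
                "name": path.name,
                "path": str(path),
                "type": path.suffix.lower(),  # e.g. ".shp", ".mxd"
                "size_bytes": info.st_size,
                "modified": datetime.fromtimestamp(info.st_mtime),
                "accessed": datetime.fromtimestamp(info.st_atime),
            })
    return records
```

Calling `inventory("W:\\")` (or any folder you choose) returns a plain list you can sort, filter, or dump to a spreadsheet. Creation dates are deliberately omitted, since what the operating system reports for them differs between Windows and other platforms.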
The Spatial Discovery tools within Integrated Marco Commander are ideal for this. Shown above within the Integrated Marco Mystic interface, these tools are designed to pull details from enterprise data, storing them within an easily navigable database. When working with this application, you have the power to specify the location to be inventoried – for instance, the C: drive rather than the W: drive – by locating a drive or drilling down to individual folders. From there, the marco container tool is run to gather up all those tricky tidbits. They are saved to a series of database tables called the Marco Database, specifically within the CONTAINER table, as shown below.
To produce a more concise view of the data at hand, the remainder of the Spatial Discovery tools may be applied. However, the results of this particular tool really give the ol’ saying “A little goes a long way” a run for its money. With this solution, you now know what spatial data is on the network, the day it was born, and who its parents are. Background checks have nothing on this.
3. Create an Attack Plan
Now that you know what is available, the next step is to create a plan of attack. Not quite the same as setting goals, this plan illustrates the steps you will take to achieve these goals. How exactly are you going to free up space on the C: drive? Are you going to look at outdated files first or those files created by users who are no longer around?
The plan also gives you the opportunity to break down the inventory into bite-size pieces. Like any good Spring Cleaning metaphor, you do not start with the whole house. You will get so dazed that you end up just giving up, sulking on the couch and binge-watching Shameless. Instead, you break it down room by room. In this case, “room by room” may refer to drive by drive, folder by folder, user by user, etc. However you divide it, it will be in manageable chunks that are common throughout the entire inventory. You will thank me later, pinky promise.
4. Determine Usability of Current Data
After deciding how you will scale this mountain ahead of you, one extremely handy piece of information to know about your data is whether it is usable. It seems trivial. Just like sorting through that old junk drawer in the kitchen though, we want to figure out what it is we can use now. There are several questions that may be asked of your data throughout this process. Thankfully, these questions – and more – may all be answered.
Question #1 – Is this data broken or working?
When the inventory is gathered via the Integrated Marco Commander tools, details related to health are also pulled. One such piece of information is each file’s status. Although this may be viewed within the Marco Database itself, it is more easily discernible within the Integrated Marco Mystic interface. With this method, you may search the database based on brokenness, in addition to a few other filters as shown on the right. This helps you to determine if the data may be consumed right away or if it will first require a little TLC.
Question #2 – Is this data out of date?
When it comes to assessing the usability of spatial data, the age of the files is often a concern. This is for a few reasons: whether the information it represents is up to date, whether new files representing the same data have been added to the network since its creation, or even whether a vendor or team needs to be contacted to provide an updated version of the files.
Thankfully, the inventory produces several pieces of information that aid in this age-old quest. (Sorry, this was overdue for a pun…or two.) As shown below in Integrated Marco Mystic, the date created, date modified, and date of last access are all pulled at inventory. This allows you to answer those burning questions without having to root around in the metadata or pester your coworkers for a file history.
With this information, you can determine if the age of a file and its use warrant prompting an update, as well as decide if any files are past their prime.
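As a rough illustration of that age check, here is a short Python sketch that flags anything not touched within a cutoff. Everything in it is an assumption made for the example: the records are plain dictionaries with hypothetical `modified` and `accessed` date fields, and "6 months" is approximated as 183 days.

```python
from datetime import datetime, timedelta


def find_stale(records, now, max_age_days=183, date_field="accessed"):
    """Return the records whose `date_field` is older than the cutoff.

    records      -- list of dicts, each holding a datetime under date_field
    now          -- the reference datetime (passed in so results are repeatable)
    max_age_days -- assumption: 183 days stands in for "6 months"
    """
    cutoff = now - timedelta(days=max_age_days)
    return [record for record in records if record[date_field] < cutoff]


# Usage with two made-up records:
sample = [
    {"name": "parcels.shp", "accessed": datetime(2017, 1, 5)},
    {"name": "roads.shp", "accessed": datetime(2018, 3, 1)},
]
stale = find_stale(sample, now=datetime(2018, 3, 19))
print([record["name"] for record in stale])  # prints ['parcels.shp']
```

Switching `date_field` to `"modified"` answers a slightly different question (when the file last changed rather than when it was last opened). That can be the safer choice, since some systems are set up not to update last-access times.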
5. Create a Sorting System
Once you have determined the health of your data, you must decide what to do with it. This step aligns with both your overall goals and the game plan you’ve created. This means that it will differ between organizations and even clean-ups. However, this rule does apply no matter what you have set out to accomplish...
You need to have a sorting system in place before cleaning.
This idea is simple enough, but can get overwhelming if you do not lay out the ground rules beforehand. To define this system, look back at the goals you have established for yourself. From there, decide on a handful of categories into which this data may fall depending on its condition and outcomes. For instance, four standard categories are Keep, Clean, Move, and Trash.
To dig down deeper, I also recommend creating rules for each category. This way you have a clear view of where data will fall. As an example, say my goal is to clean C:\working of any files that were created more than 6 months ago. With this goal, my rules may look something like what is outlined below.
Keep – All datasets and ArcGIS Map Documents that are not broken, contain healthy datasets, and were created within the last 6 months will remain in the same location as they were found.