Spatial Data Diagnosis: Finding the Next Big Hit on an Enterprise Network
By nature, we like simplicity. We like being able to find an answer to a question almost as soon as we think it. (Thanks, Google!) We also like for everything to have its precise place…Okay, perhaps that is just some of us.
Spatial data does not always have a defined place. I know that sounds ludicrous considering the meaning of “spatial”, but the inclusion of metadata and tags can only take us so far when finding the perfect place for these files. What data is alike? What data references the same term or location? These are great starting places when it comes to finding this home. Unfortunately, making these connections can take time and considerable effort – especially if working with many files.
What data matches x and y key terms?
This question, of course, is easily answered within Integrated Marco Mystic, where you can search your data inventory for these terms and keywords. But what about those instances where you want to challenge yourself and dig a little deeper? It is fairly simple when you approach your quest from a categorization perspective. The playing ground? Integrated Marco Commander.
CATEGORIES NOT SCATTERGORIES
This process can be run in Integrated Marco Mystic as well; the method to your madness is up to you. In either application, the marco categorize tool is the key to answering the question. It uses a spreadsheet of category definitions to wrangle the datasets whose metadata and details match each term.
If this spreadsheet looks intimidating, do not be alarmed. It doesn't bite. Pinky promise. Here, you define the classes and categories you wish to hunt for, matching each with any data types as well as any words or terms to include (or even exclude). You decide where the data finds its home.
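For the curious, here is a minimal Python sketch of the kind of rule one row of that spreadsheet expresses. It is not how the marco categorize tool works under the hood. The field names (`name`, `data_types`, `include`, `exclude`), the plain substring matching, and the sample datasets are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One row of a hypothetical category spreadsheet."""
    name: str
    data_types: set = field(default_factory=set)  # empty set means any data type
    include: set = field(default_factory=set)     # terms that count as a match
    exclude: set = field(default_factory=set)     # terms that rule a dataset out


def matched_terms(category, data_type, metadata_text):
    """Return the include terms found in the metadata text.

    Returns an empty set when the data type is not allowed or when
    any excluded term appears. Matching is simple lowercase substring search.
    """
    text = metadata_text.lower()
    if category.data_types and data_type.lower() not in category.data_types:
        return set()
    if any(term in text for term in category.exclude):
        return set()
    return {term for term in category.include if term in text}


# Hypothetical category: terms and data types are stored in lowercase.
boundaries = Category(
    name="City/Town Boundaries",
    data_types={"shapefile", "feature class"},
    include={"city", "town", "boundary"},
    exclude={"draft"},
)

# Hypothetical inventory: (dataset name, data type, metadata text).
datasets = [
    ("town_limits.shp", "Shapefile", "Town boundary polygons"),
    ("old_limits.shp", "Shapefile", "DRAFT city boundary"),
    ("roads.csv", "CSV", "City road centerlines"),
]

for name, data_type, metadata in datasets:
    hits = matched_terms(boundaries, data_type, metadata)
    if hits:
        print(name, "->", boundaries.name, sorted(hits))
```

In this toy inventory only the first dataset finds a home: the second trips the excluded term, and the third is the wrong data type.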
ACCESS VERSUS EXCEL
Get “cat”-d up? Sorry, horrible joke. Let me rephrase that – Did you get your categories? Results can be generated as either an Access database table or a series of Excel files. Each has its own advantages.
Personally, my favorite is the Excel format. If you are interested in sharing or reporting select results, this is typically the easiest resource to navigate. This is because it saves each class as a separate spreadsheet, allowing you to pick through each category on your own. If you are interested in further manipulating the data or visualizing it in a program like Tableau or your own Mind Palace, Access may be preferable.
TOP TEN (OR FORTY) HITS
Interested in the nitty-gritty? Break open one of those bad boys to find all the datasets that matched your criteria. Details are given for the data source, type, and health. You are also privy to which terms each dataset matched – as well as a rank or score for how well it hit the mark.
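If you want to build your own chart-toppers list from those results, a short Python sketch might look like the one below. The column names (`source`, `type`, `health`, `matched`, `score`), the health values, and the sample rows are hypothetical stand-ins, not the tool's actual output layout, and it assumes a higher score means a better match.

```python
from operator import itemgetter

# Hypothetical result rows for one category; every value here is made up.
results = [
    {"source": "parcels.gdb", "type": "Feature Class", "health": "OK",
     "matched": ["boundary"], "score": 1},
    {"source": "town_limits.shp", "type": "Shapefile", "health": "OK",
     "matched": ["town", "boundary"], "score": 2},
    {"source": "city_towns.shp", "type": "Shapefile", "health": "Broken",
     "matched": ["city", "town", "boundary"], "score": 3},
]


def top_hits(rows, n=10, healthy_only=False):
    """Return the n highest-scoring rows, best first.

    With healthy_only=True, rows whose health is not "OK" are dropped
    before ranking. Ties keep their original order because sorted() is stable.
    """
    if healthy_only:
        rows = [row for row in rows if row["health"] == "OK"]
    return sorted(rows, key=itemgetter("score"), reverse=True)[:n]


for row in top_hits(results, n=2, healthy_only=True):
    print(row["score"], row["source"], ", ".join(row["matched"]))
```

Swap `n=2` for ten (or forty) to suit your own countdown.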
Knowing the “how” of accomplishing this and answering the question at hand is all well and good. However, what is the “why”?
Well…Honestly, that is a question better left to you. You could have any number of reasons for breaking data out into categories. You could be on the hunt for all datasets referencing “city/town boundaries”. You could have multiple datasets with seemingly the same information and want to know which better matches your defined criteria for the dataset to persist. You could even be curious if all data matching a geographic boundary is vendor data.
You have free rein here, as well as when it comes to deciding what you are looking for. Not even Billboard has seen hits like these.
Explore the Series
When it comes to reviewing your Geographic Information System (GIS) data and inventories, there are questions you should have at the ready. Discover common questions we ask of our spatial data to get the most out of these resources and, better yet, how we actually achieve answers. Explore the cheat sheet here and dive into each post in the series below.
Week 5 - Data Age is Not Just a Number