Data is often touted as the key to solving big problems. But data is only useful if it validates a hypothesis and supports steps toward a viable solution. The key to this three-stage paradigm is converting data into information that is useful to decision makers. Conversion means analysis. Since data analysis, rather than data alone, is the real key to solving big problems, what does that mean for maintaining a clean environment? The process begins with collecting raw data, but then what?
“Hiding within these mounds of data is knowledge that could change the world!”
- A Butte.
Data analytics, in its most basic form, is pattern recognition within data sets. There are a variety of ways to look for patterns and trends in data. One way is to specify criteria, such as search queries (i.e., actively searching for something) or filters (deciding which data you want to exclude from your search). Patterns can also be observed by rearranging and sorting data, for example by date ranges or value ranges. Queries, filters, data sorting and data arrangement can all generally be classified as “data processing”. This so-called “processing” of data is the structuring of it, often in spreadsheets and tables, which then facilitates more advanced data analysis.
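To make filtering and sorting concrete, here is a minimal Python sketch. The records, field names (such as `turbidity_ntu`) and helper functions are all invented for illustration and do not come from any real data set or product.

```python
from datetime import date

# Hypothetical inspection records; sites, dates and readings are invented.
inspections = [
    {"site": "A", "date": date(2023, 1, 10), "turbidity_ntu": 12.0},
    {"site": "B", "date": date(2023, 2, 3), "turbidity_ntu": 48.5},
    {"site": "C", "date": date(2023, 2, 20), "turbidity_ntu": 30.1},
    {"site": "D", "date": date(2023, 3, 15), "turbidity_ntu": 75.2},
]


def in_date_range(records, start, end):
    """Filter: keep only records dated between start and end, inclusive."""
    return [r for r in records if start <= r["date"] <= end]


def sort_by(records, field, descending=False):
    """Sort: arrange records by one field so trends are easier to see."""
    return sorted(records, key=lambda r: r[field], reverse=descending)


# Filter by a date range, then rank all sites by reading, highest first.
february = in_date_range(inspections, date(2023, 2, 1), date(2023, 2, 28))
ranked = sort_by(inspections, "turbidity_ntu", descending=True)

print([r["site"] for r in february])
print([r["site"] for r in ranked])
```

The same two operations are what a spreadsheet filter and column sort do; writing them as functions simply makes the criteria explicit and repeatable.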
One example of how structured data facilitates advanced analysis and thus sound decision-making is when data is “clustered” and evaluated against similar attributes and along multiple axes (i.e., variables). For example, if weather forecasts call for rainfall above a certain threshold in a particular watershed where certain BMPs (best management practices) are already in place, an environmental compliance professional can be made aware of the compliance measures most likely to fail and pay closer attention to them during the pre-rain site inspection.
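The rain-threshold scenario can be sketched in a few lines of Python. This is only one possible approach: the BMP names, watersheds, failure rates, forecast values and the `bmps_to_watch` helper are hypothetical, and ranking by past failure rate is an assumption about how "most likely to fail" might be estimated.

```python
# Hypothetical BMP records; names, watersheds and failure rates are invented.
bmps = [
    {"name": "silt fence", "watershed": "north", "past_failure_rate": 0.30},
    {"name": "fiber roll", "watershed": "north", "past_failure_rate": 0.10},
    {"name": "inlet filter", "watershed": "south", "past_failure_rate": 0.45},
]


def bmps_to_watch(bmps, forecast_inches, threshold_inches):
    """Return BMPs in watersheds whose forecast rain exceeds the threshold,
    ordered from highest to lowest past failure rate."""
    at_risk = [
        b for b in bmps
        if forecast_inches.get(b["watershed"], 0.0) > threshold_inches
    ]
    return sorted(at_risk, key=lambda b: b["past_failure_rate"], reverse=True)


# Assumed forecast rainfall per watershed, in inches.
forecast = {"north": 1.2, "south": 0.2}
watch_list = bmps_to_watch(bmps, forecast, threshold_inches=0.5)

print([b["name"] for b in watch_list])
```

Only the north watershed exceeds the threshold here, so the inspector's list contains the two BMPs installed there, with the historically less reliable one first.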
Advanced data analytics includes data modeling and statistical analysis such as linear regressions and predicted response calculations. A linear regression attempts to determine the extent to which a dependent variable is affected by an independent variable. In the context of environmental compliance, an example of this might be the extent to which disturbed acreage on level land (the independent variable) has an impact on local water quality (the dependent variable).
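As a rough illustration of a linear regression, the sketch below fits a straight line by ordinary least squares in plain Python. The acreage and turbidity numbers are invented and deliberately fall on an exact line so the result is easy to check by hand; real measurements would scatter around the fitted line.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Invented example data: disturbed acreage (independent variable) and a
# water quality reading (dependent variable) at five hypothetical sites.
disturbed_acres = [1.0, 2.0, 3.0, 4.0, 5.0]
turbidity_ntu = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(disturbed_acres, turbidity_ntu)
predicted_at_6_acres = intercept + slope * 6.0

print(slope, intercept, predicted_at_6_acres)
```

The slope estimates how much the water quality reading changes for each additional disturbed acre, which is the "extent" the regression is meant to quantify.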
Statistical analysis, such as predicted response and mean response regressions, provides “confidence intervals” that can help environmental compliance professionals determine the likelihood of an illicit discharge. Data correlation, in which data meets two or more criteria, can help inspectors determine what action to take on site inspections: if the impact is negative, how can it be mitigated; if the impact is positive, how can it be enhanced? Hence, data correlation based on patterns and trends facilitates sound conclusions.
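For readers who want the formulas behind these intervals, the standard textbook expressions for a simple linear regression are given below. The notation is an assumption rather than anything defined in this article: $n$ is the number of observations, $x_0$ the value at which the response is estimated, $\hat{y}_0$ the fitted value there, $s$ the residual standard error, and $t_{\alpha/2,\,n-2}$ the critical value of the t-distribution. The interval for a predicted response is usually called a prediction interval.

```latex
% Mean response: interval for the average response at x_0
\hat{y}_0 \pm t_{\alpha/2,\,n-2}\; s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}

% Predicted response: interval for a single new observation at x_0
\hat{y}_0 \pm t_{\alpha/2,\,n-2}\; s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}

% where
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
s^2 = \frac{1}{n-2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
```

The extra 1 under the square root makes the predicted response interval wider than the mean response interval, because a single new observation carries its own scatter in addition to the uncertainty in the fitted line.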
Data collection alone is not enough to maintain a healthy environment; it is only the first step toward a cleaner one. CloudCompli is currently developing a modeling and statistical analysis capability to support smarter compliance.