Creative Data Science Applications

In many projects, data preparation is the most important and time-consuming phase of the work. Data munging can sometimes take up to 70-80% of project resources and can be quite challenging, depending on the source and the original quality of the data. That was the case for an organization that worked in a category known for disorganized and unstructured, yet tracked, data. Third-party syndicated sources collected the information but did little to aggregate and report it on a regular basis because the potential client base was small. My role was to make this information usable for tracking market and consumer purchase trends, giving the organization a proprietary competitive advantage.

The primary tool used to analyze, segment, and organize the tens of thousands of product items in the category was regular expressions, a pattern-matching technique for text analytics that is often used alongside natural language processing. By automating the identification and organization of patterns that represented forms, flavors, sizes, packaging, and similar attributes, I was able to work with the organization to develop a proprietary structure for the category based on product features. This new structure enabled the ongoing tracking of sales and other trends that had previously been ignored because they were thought too difficult to organize and quantify.
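
As an illustration only, the kind of pattern matching described above might look like the following Python sketch. The feature names, the patterns, and the sample item descriptions are all hypothetical; the actual category structure was proprietary and is not reproduced here.

```python
import re

# Hypothetical feature patterns, for illustration only. A real category
# would need far more patterns, tuned to the source's abbreviations.
FEATURE_PATTERNS = {
    # A number followed by a unit, e.g. "12 OZ" or "1.5L"
    "size": re.compile(r"(\d+(?:\.\d+)?)\s*(oz|lb|ml|l|g|kg|ct)\b", re.IGNORECASE),
    "flavor": re.compile(r"\b(vanilla|chocolate|strawberry|original)\b", re.IGNORECASE),
    "packaging": re.compile(r"\b(bottle|can|box|bag|pouch|jar)\b", re.IGNORECASE),
}


def extract_features(description):
    """Return a dict of product features found in a free-text item description.

    A feature with no matching pattern is reported as None, so unmatched
    items can be reviewed and the patterns refined.
    """
    features = {}
    for name, pattern in FEATURE_PATTERNS.items():
        match = pattern.search(description)
        if match is None:
            features[name] = None
        elif name == "size":
            # Normalize to "<number> <unit>" with a lowercase unit
            features[name] = f"{match.group(1)} {match.group(2).lower()}"
        else:
            features[name] = match.group(1).lower()
    return features


if __name__ == "__main__":
    # Made-up item descriptions standing in for syndicated product data
    for item in ["Acme Chocolate Drink 12 OZ Bottle", "Acme Vanilla 1.5L Can"]:
        print(item, "->", extract_features(item))
```

Reporting unmatched features as None, rather than dropping them, is a deliberate choice in this sketch: the unmatched items show where a pattern needs to be added or loosened, which is typically how such a structure is built up over many passes.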