CDH Usage Pattern Analysis
This project aims to learn how Cloudera users make use of CDH (our Hadoop distribution). Given usage data, we want to determine which usage patterns exist for CDH by applying a clustering-style analysis.
Cloudera has a large number of customers who use different components of the Hadoop ecosystem for different use cases. Often a single component is not enough to carry out a workload, and a combination of components is used instead. We want to identify these combinations so that in the future we can create more user-oriented testing methods.
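
As a rough illustration of what "identifying combinations" could mean in practice, here is a minimal sketch that counts how often each exact set of components appears together. It assumes each usage record can be reduced to a set of component names. The records and the helper name combination_counts are made up for illustration; they are not real usage data or part of this project's code.

```python
from collections import Counter

# Hypothetical usage records: one set of CDH components per workload.
# These rows are invented for illustration and are not real usage data.
workloads = [
    {"HDFS", "Hive", "Impala"},
    {"HDFS", "Spark"},
    {"HDFS", "Hive", "Impala"},
    {"HBase", "HDFS", "Spark"},
    {"HDFS", "Spark"},
    {"HDFS", "Hive", "Impala"},
]


def combination_counts(records):
    """Count how often each exact combination of components occurs."""
    # frozenset makes each combination hashable and order-independent.
    return Counter(frozenset(record) for record in records)


counts = combination_counts(workloads)

# List combinations from most to least frequent.
for combo, n in counts.most_common():
    print(sorted(combo), n)
```

Exact-match counting like this is only a starting point. A clustering approach would also group combinations that are similar but not identical.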