
# Project 2: cluster analysis | business management | Harvard University

For this project, we’re going to use cluster analysis to “tell a story” about our data. I’m asking you to divide the Oregonians in your sample into groups or clusters based on two quantitative variables. Your “story” will be an explanation of your data that highlights some interesting feature(s) or makes a point about the data.

Please note this project will likely take some trial and error. Please relax into it and have some fun with the process: think of it as an exploration. Trial and error is the spice of life!

You will begin by opening a data set. If you don’t have a data set of your own you’d like to explore, I recommend the OregonPUMS_data. Either way, you will work with a small random subset of the data (I recommend n=400, so as not to upset XLSTAT too much: some clustering algorithms grind to a halt with large data sets).

Note that “Weight,” just like last time in project 1, is probably not what you think it is! I’m not forbidding “weight” but realize you’ll have to do a bunch of research about what “weight” is in order to write about it well and get credit!

Process:

Step 1: Select your sample of n=400. Luckily for us, XLSTAT is quite good at taking a random sample: see the “Simple Random Sampling in XLSTAT” tutorial. Alternatively, there are several tutorials online for taking a random sample with regular Excel.
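If you would rather draw the sample outside of Excel, the same idea can be sketched in a few lines of Python. This is only an illustration of simple random sampling without replacement: the `population` list, its column names, and the seed are made-up stand-ins, not the real OregonPUMS_data.

```python
import random

def simple_random_sample(rows, n=400, seed=2024):
    """Draw n rows without replacement; every row is equally likely to be chosen."""
    if n > len(rows):
        raise ValueError("sample size exceeds number of rows")
    rng = random.Random(seed)  # fixed seed so the same sample can be drawn again
    return rng.sample(rows, n)

# Hypothetical stand-in for the full data set: 5,000 rows with invented columns.
population = [
    {"id": i, "age": 18 + i % 70, "income": 1000 * (i % 90)}
    for i in range(5000)
]
sample = simple_random_sample(population, n=400)
```

Fixing the seed is a design choice: it lets you (or a grader) reproduce exactly the same 400 rows later.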

Step 2: Choose two QUANTITATIVE variables that you would like to work with. Copy and paste your two variables and their corresponding sampled data (there should be 400 rows of data, two columns) into a new sheet. I prefer to do this so that I am not overwhelmed by variables that I am not using. Next, remove any rows with missing observations. This will save time later when you go to plot your clusters.
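As a rough picture of what Step 2 does, here is a minimal Python sketch that keeps two columns and drops any row where either value is missing. The column names (`age`, `income`, `county`) and the four rows are invented for illustration only.

```python
import math

def two_complete_columns(rows, x_name, y_name):
    """Keep only (x, y) pairs where both values are present and numeric."""
    pairs = []
    for row in rows:
        x, y = row.get(x_name), row.get(y_name)
        if x in (None, "") or y in (None, ""):
            continue  # blank cell: drop the whole row
        x, y = float(x), float(y)
        if math.isnan(x) or math.isnan(y):
            continue  # NaN also counts as missing
        pairs.append((x, y))
    return pairs

# Made-up rows; the second and third each have one missing value.
rows = [
    {"age": 34, "income": 52000, "county": "Lane"},
    {"age": None, "income": 41000, "county": "Polk"},
    {"age": 61, "income": "", "county": "Linn"},
    {"age": 47, "income": 78000, "county": "Coos"},
]
pairs = two_complete_columns(rows, "age", "income")
```

Note that a row is dropped even if only one of the two values is missing, so you may end up with slightly fewer than 400 rows.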

For the following steps, be sure that you have installed the XLSTAT add-in. Click on the XLSTAT tab on the top of your Excel sheet.

Step 3: Use different options in the software to create 5 different “data stories”: if you’re overwhelmed about what to pick, you can use these options:

*Scatterplots will have to be created separately using the Results by Object output. Under the colors tab, use whatever colors you would like, but be sure they are bold and distinct. For example, it would be a bad idea to use white or both red and red-orange.
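To get a feel for what a clustering method is doing behind the menus, here is a minimal sketch of k-means (Lloyd's algorithm) for two variables in plain Python. It is an illustration under simplifying assumptions (random starting centroids, squared Euclidean distance, made-up points) and is not a description of how XLSTAT implements its own options.

```python
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm on 2-D points. Returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen rows
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        if new_labels == labels:
            break  # nothing moved, so the clustering has settled
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels, centroids

# Made-up (x, y) pairs, purely for illustration.
points = [(1, 1), (1.2, 0.8), (0.9, 1.1), (8, 8), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
```

The `labels` list plays the same role as the cluster column in the Results by Object output: one cluster number per row, which you can then use to color a scatterplot. Different seeds or a different `k` can give different clusters, which is part of the trial and error.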

Step 4: Write up your project! Which clustering method out of the five did you prefer? Why?

For your final report, compare and contrast each of the five clustering methods. You may choose to use your XLSTAT output or use Tableau/other software to make a prettier graph. Tell your story using your preferred clustering method, and how the clustering supports that story. Who are these groups? What does this clustering tell us about the people in Oregon? How could a business or entrepreneur use their understanding of this clustering story to further their goals?

Rubric for Project (40 points)

Maximum 5 points total if your variables are not quantitative! You must have two quantitative variables!

15 points: at least 5 different scatter plot graphs, all using the same two variables (Step 2) but different clustering choices (Step 3). Data process and data product both discussed, particularly for Method 5. Clusters must vary, and at least one method shouldn’t be “just stripes,” i.e. both variables should matter.

10 points: your narration of the progression of your thinking (data process story).

5 points: Instructor’s subjective take on the product story. Was it gripping, interesting, well done?

5 points: graph conventions, labels, etc.

5 points: conventions: correct punctuation, sentences, etc.
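A note on the “just stripes” item in the rubric: one common reason clusters come out as stripes is that one variable is measured on a much larger scale than the other (say, dollars versus years), so distances are dominated by it. One possible remedy, sketched below in Python, is to standardize both variables to z-scores before clustering. The data are made up, and the sketch assumes neither variable is constant.

```python
from statistics import mean, stdev

def standardize(pairs):
    """Rescale each variable to mean 0 and standard deviation 1 (z-scores)."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, sx = mean(xs), stdev(xs)  # assumes the x values are not all identical
    my, sy = mean(ys), stdev(ys)  # assumes the y values are not all identical
    return [((x - mx) / sx, (y - my) / sy) for x, y in pairs]

# Made-up (age, income) pairs: income is on a far larger scale than age.
raw = [(20, 20000), (30, 50000), (40, 80000), (50, 30000)]
scaled = standardize(raw)
```

After rescaling, a difference of one standard deviation counts the same in either variable, so both have a chance to shape the clusters.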

Data Note: Be careful about the “Person’s Weight” variable. It does not mean “how much this person weighs”; it means “how much weight to assign this person’s answers.” If you’re curious (not required), you can read about statistical weighting here: http://www.applied-survey-methods.com/weight.html
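To make the idea of a survey weight concrete, here is a small Python sketch comparing an ordinary average with a weighted one. The ages and weights are invented; in real survey data, a weight roughly says how many people in the population a respondent stands in for.

```python
def weighted_mean(values, weights):
    """Average in which each value counts in proportion to its weight."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total

# Made-up example: three respondents and the weight assigned to each.
ages = [25, 40, 70]
weights = [10, 30, 60]

unweighted = sum(ages) / len(ages)       # every respondent counts equally
weighted = weighted_mean(ages, weights)  # the 70-year-old counts the most
```

Here the unweighted average is 45.0 and the weighted average is 56.5, because the oldest respondent carries the largest weight.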

https://help.xlstat.com/s/article/agglomerative-hi…

https://help.xlstat.com/s/article/k-means-clustering-in-excel-tutorial?language=en_US

https://help.xlstat.com/s/article/scatter-plot-with-confidence-ellipses-in-excel?language=en_US

Requirements: around 3 pages   |   .doc file
