EDA: One Piece Character Data Exploration using Regression and Clustering Analysis

Carlos Abiera
3 min read · Aug 14, 2020

One Piece is one of the longest-running anime in Japan, and I'm one of those hooked on its poetic literature. In celebration of its 20th anniversary this year (2019), I explored the characters from this story using data from claystage.com.

The features I collected include character name, height, devil fruit profile, and bounty.

Data Exploration

Data Studio is one of my favourite visualisation tools. The simplicity of its interface and its sophisticated queries and filtering help me work fast. It also gives the user more freedom to design and lay out a dashboard or report like a pro.

Download the PDF file to view the entire report.

Jupyter Notebook is my other workspace, where I can apply simple to advanced statistical tools. This is what my dataset looks like after combining the different data sources.

The correlation matrix below shows that height has a strong positive correlation with bounty: taller characters tend to have higher bounties.

The features highlighted above show the transformed dataset, ready for the regression analysis later.
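For readers who want to reproduce a correlation matrix like the one discussed above, here is a minimal pandas sketch. The column names (height_ft, age, bounty) and all the values are made up for illustration; they are not the real dataset.

```python
import pandas as pd

# Toy rows with invented values, only to show the mechanics.
characters = pd.DataFrame({
    "height_ft": [5, 6, 7, 9, 12, 20],
    "age": [19, 25, 30, 41, 60, 72],
    "bounty": [10, 30, 20, 60, 80, 150],
})

# Pairwise Pearson correlation between all numeric columns.
corr = characters.corr(method="pearson")

# Correlation of every other feature with bounty, strongest first.
bounty_corr = corr["bounty"].drop("bounty").sort_values(ascending=False)
print(bounty_corr)
```

A value close to 1 in the bounty column indicates a strong positive linear relationship, which is how the height and bounty relationship would show up.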

Bounty and Devil fruit K-Means Clustering

  • Cluster 2 consists of Big Mom and Whitebeard.
  • The characters with an undetermined devil fruit are Laffitte and Tamago.
  • The failed SMILE user is Killer.
  • The majority of the characters are Paramecia devil fruit eaters.

Bounty and Age K-Means

  • Cluster 1 contains the pirates who belong to the new generation.
  • Cluster 2 consists of Big Mom and Whitebeard.
  • Cluster 3 contains the other Yonko and the candidate.
  • The oldest character is Brook.
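The clustering passes above can be sketched with scikit-learn roughly as follows. This is only an illustration under assumptions: the (age, bounty) pairs are invented, the choice of three clusters simply mirrors the cluster numbers mentioned above, and standardizing the features first is my suggestion rather than a confirmed step of the original notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Invented (age, bounty) pairs forming three loose groups; not the real data.
X = np.array([
    [19, 1.0], [21, 1.2], [20, 0.9],   # young, lower bounty
    [45, 3.0], [50, 3.3], [48, 2.8],   # middle group
    [70, 9.0], [72, 9.5],              # old, very high bounty
], dtype=float)

# Standardize so that neither feature dominates the Euclidean distance.
X_scaled = StandardScaler().fit_transform(X)

# k=3 is an assumption taken from the cluster numbering in the article.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

for point, label in zip(X, labels):
    print(point, "-> cluster", label)
```

The numeric cluster ids are arbitrary, so what matters is which characters end up grouped together, not whether a group is called 0, 1 or 2.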

Multilinear Regression

My goal is to find the features that best explain and predict the bounty of the pirates. Categorical features are transformed into numeric form before fitting.
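One common way to transform categorical features for regression is one-hot (dummy) encoding. I am showing it here as a sketch; the column name devil_fruit_type and the values are hypothetical, and the exact transformation used on the real data may differ.

```python
import pandas as pd

# Invented rows; the column names are placeholders for illustration.
df = pd.DataFrame({
    "height_ft": [5.5, 7.0, 9.5, 20.0],
    "devil_fruit_type": ["Paramecia", "Logia", "Zoan", "Paramecia"],
    "bounty": [100, 250, 300, 900],
})

# Turn the category into 0/1 indicator columns. drop_first=True drops one
# level (the alphabetically first) so the dummies are not perfectly
# collinear with the regression intercept.
encoded = pd.get_dummies(df, columns=["devil_fruit_type"], drop_first=True, dtype=int)

print(encoded.columns.tolist())
print(encoded)
```

The dropped level becomes the baseline, so each remaining dummy coefficient is read as the difference from that baseline.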

Using Backward elimination:

Final result: the best features are height in feet and how the devil fruit was acquired (consumed / re-consumed).

Visit me here to see the entire process

Originally published at https://www.linkedin.com.


Carlos Abiera

Carlos C. Abiera currently manages the operations of Montani Int. Inc. and leads the REV365 data team. He has keen interests in data and behavioral sciences.