How IΒ turned famous paintings into pie chart graphs
Every painting tells a story through color. But what does that story look like as data? I made a Python script to extract the 7 dominant colors from famous paintings using machine learning and visualized the results as pie charts.
Here's how it works, what I learned, and the full code so you can try it yourself.β
The Results

Klimt - The Kiss is 74% gold
β

Van Gogh's - Starry Night is overwhelmingly blue.
β

Delaunay's - Premier Disque is the most evenly distributed palette.
β

Matisse's Fauvist masterpiece shows why the critics were scandalized.
β

The Scream is warmer than you'd expect.
β

Kandinsky's color study is the most balanced. No single color dominates.
The Method: Why Most Color Extraction is Wrong
If you Google "extract colors from image Python," most tutorials will tell you to run K-means clustering on the RGB pixel values. This works, but it has a bias: RGB space is not perceptually uniform.
What does that mean? In RGB, the Euclidean distance between two colors doesn't correspond to how different those colors look to a human eye. Dark tones occupy a disproportionately large volume of RGB space, so K-means clusters tend to skew darker than the painting actually appears.
The fix: cluster in CIELAB color space instead
CIELAB was designed by the CIE specifically so that equal distances in the space correspond to equal perceived color differences. The L channel is perceptual lightness, A is green-to-red, and BΒ is blue-to-yellow. When you run K-means in CIELAB, the clusters represent perceptually meaningful groups of color.
Here's the pipeline:
Painting image (RGB pixels)
Β β Convert to CIELAB color space
Β β K-means clustering (7 clusters)
Β β Convert cluster centers back to RGB
Β β Map each center to nearest CSS named color
Β β Generate pie chart
Why CIELAB matters: a real example
When I first ran this on Munch's The Scream using standard RGB clustering, the warm orange sky got absorbed into nearby dark clusters, making the painting look almost entirely gray-brown.
Switching to CIELAB gave the warm tones their own clusters because CIELAB treats perceived color differences equally regardless of lightness.
Why CSS named colors?
Early versions of the script used hand-tuned color names based on HSV thresholds, things like "Dusty Amber" and "Light Azure." People rightfully pointed out that these names are meaningless. Who knows what "Dusty Amber" is?
Instead, I map each extracted color to the nearest of the 148 standard CSS named colors using Delta E distance in CIELAB. Everyone knows what "Steel Blue" or "Sienna" looks like. The naming becomes objective and reproducible, it's just "which CSS color is closest in perceptual space?"
When two segments map to the same CSS color, the script finds the next-closest unused CSS color instead, so every label in a chart is unique.
The Code
Full script below. Requires
pip install pillow numpy scikit-learn requests.Add paintings to the PAINTINGS list and run python analyze_paintings.py. It downloads each image, runs the analysis, and saves pie chart data you can plug into any charting tool.
python
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans
import requests
import io
N_COLORS = 7
# ββ CIELAB conversion ββββββββββββββββββββββββββββββββββββββββββ
def rgb_to_lab_batch(rgb):
"""Convert Nx3 RGB (0-255) array to CIELAB (D65 illuminant)."""
rgb_norm = rgb / 255.0
mask = rgb_norm > 0.04045
linear = np.where(mask, ((rgb_norm + 0.055) / 1.055) ** 2.4, rgb_norm / 12.92)
r, g, b = linear[:, 0], linear[:, 1], linear[:, 2]
x = (r * 0.4124564 + g * 0.3575761 + b * 0.1804375) / 0.95047
y = (r * 0.2126729 + g * 0.7151522 + b * 0.0721750)
z = (r * 0.0193339 + g * 0.1191920 + b * 0.9503041) / 1.08883
def f(t):
return np.where(t > 0.008856, np.cbrt(t), (903.3 * t + 16) / 116)
fx, fy, fz = f(x), f(y), f(z)
return np.column_stack([116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)])
def lab_to_rgb(lab):
"""Convert Nx3 CIELAB back to RGB (0-255)."""
L, a, b_lab = lab[:, 0], lab[:, 1], lab[:, 2]
fy = (L + 16) / 116
fx, fz = a / 500 + fy, fy - b_lab / 200
def f_inv(t):
t3 = t ** 3
return np.where(t3 > 0.008856, t3, (116 * t - 16) / 903.3)
x = f_inv(fx) * 0.95047
y = f_inv(fy)
z = f_inv(fz) * 1.08883
r = x * 3.2404542 + y * -1.5371385 + z * -0.4985314
g = x * -0.9692660 + y * 1.8760108 + z * 0.0415560
b = x * 0.0556434 + y * -0.2040259 + z * 1.0572252
def gamma(c):
return np.where(c > 0.0031308, 1.055 * np.power(np.maximum(c, 0), 1/2.4) - 0.055, 12.92 * c)
return np.clip(np.column_stack([gamma(r), gamma(g), gamma(b)]) * 255, 0, 255).astype(int)
# ββ CSS named colors (148 standard) βββββββββββββββββββββββββββ
CSS_COLORS = {
"AliceBlue": (240,248,255), "AntiqueWhite": (250,235,215), "Aquamarine": (127,255,212),
"Beige": (245,245,220), "Bisque": (255,228,196), "Black": (0,0,0),
"Blue": (0,0,255), "BlueViolet": (138,43,226), "Brown": (165,42,42),
"BurlyWood": (222,184,135), "CadetBlue": (95,158,160), "Chartreuse": (127,255,0),
"Chocolate": (210,105,30), "Coral": (255,127,80), "CornflowerBlue": (100,149,237),
"Cornsilk": (255,248,220), "Crimson": (220,20,60), "Cyan": (0,255,255),
"DarkBlue": (0,0,139), "DarkCyan": (0,139,139), "DarkGoldenrod": (184,134,11),
"DarkGray": (169,169,169), "DarkGreen": (0,100,0), "DarkKhaki": (189,183,107),
"DarkMagenta": (139,0,139), "DarkOliveGreen": (85,107,47), "DarkOrange": (255,140,0),
"DarkOrchid": (153,50,204), "DarkRed": (139,0,0), "DarkSalmon": (233,150,122),
"DarkSeaGreen": (143,175,143), "DarkSlateBlue": (72,61,139),
"DarkSlateGray": (47,79,79), "DarkTurquoise": (0,206,209), "DarkViolet": (148,0,211),
"DeepPink": (255,20,147), "DeepSkyBlue": (0,191,255), "DimGray": (105,105,105),
"DodgerBlue": (30,144,255), "Firebrick": (178,34,34), "ForestGreen": (34,139,34),
"Gainsboro": (220,220,220), "Gold": (255,215,0), "Goldenrod": (218,165,32),
"Gray": (128,128,128), "Green": (0,128,0), "HotPink": (255,105,180),
"IndianRed": (205,92,92), "Indigo": (75,0,130), "Ivory": (255,255,240),
"Khaki": (240,230,140), "Lavender": (230,230,250), "LightBlue": (173,216,230),
"LightCoral": (240,128,128), "LightGray": (211,211,211), "LightGreen": (144,238,144),
"LightPink": (255,182,193), "LightSalmon": (255,160,122),
"LightSeaGreen": (32,178,170), "LightSkyBlue": (135,206,250),
"LightSlateGray": (119,136,153), "LightSteelBlue": (176,196,222),
"Linen": (250,240,230), "Magenta": (255,0,255), "Maroon": (128,0,0),
"MediumAquamarine": (102,205,170), "MediumBlue": (0,0,205),
"MediumOrchid": (186,85,211), "MediumPurple": (147,112,219),
"MediumSeaGreen": (60,179,113), "MediumSlateBlue": (123,104,238),
"MediumTurquoise": (72,209,204), "MediumVioletRed": (199,21,133),
"MidnightBlue": (25,25,112), "Moccasin": (255,228,181),
"NavajoWhite": (255,222,173), "Navy": (0,0,128), "OldLace": (253,245,230),
"Olive": (128,128,0), "OliveDrab": (107,142,35), "Orange": (255,165,0),
"OrangeRed": (255,69,0), "Orchid": (218,112,214), "PaleGoldenrod": (238,232,170),
"PaleGreen": (152,251,152), "PaleTurquoise": (175,238,238),
"PaleVioletRed": (219,112,147), "PapayaWhip": (255,239,213),
"PeachPuff": (255,218,185), "Peru": (205,133,63), "Pink": (255,192,203),
"Plum": (221,160,221), "PowderBlue": (176,224,230), "Purple": (128,0,128),
"Red": (255,0,0), "RosyBrown": (188,143,143), "RoyalBlue": (65,105,225),
"SaddleBrown": (139,69,19), "Salmon": (250,128,114), "SandyBrown": (244,164,96),
"SeaGreen": (46,139,87), "Sienna": (160,82,45), "Silver": (192,192,192),
"SkyBlue": (135,206,235), "SlateBlue": (106,90,205), "SlateGray": (112,128,144),
"Snow": (255,250,250), "SteelBlue": (70,130,180), "Tan": (210,180,140),
"Teal": (0,128,128), "Thistle": (216,191,216), "Tomato": (255,99,71),
"Turquoise": (64,224,208), "Violet": (238,130,238), "Wheat": (245,222,179),
"White": (255,255,255), "WhiteSmoke": (245,245,245), "Yellow": (255,255,0),
"YellowGreen": (154,205,50),
}
# Pre-compute CSS color positions in LAB space
_css_names = list(CSS_COLORS.keys())
_css_labs = rgb_to_lab_batch(np.array([CSS_COLORS[n] for n in _css_names], dtype=float))
def nearest_css_name(rgb):
"""Find the closest CSS color name using Delta E in CIELAB."""
lab = rgb_to_lab_batch(np.array([rgb], dtype=float))[0]
distances = np.sqrt(np.sum((_css_labs - lab) ** 2, axis=1))
name = _css_names[np.argmin(distances)]
# "DarkSlateGray" -> "Dark Slate Gray"
return ''.join(f' {c}' if c.isupper() and i > 0 else c for i, c in enumerate(name)).strip()
# ββ Main analysis βββββββββββββββββββββββββββββββββββββββββββββ
def analyze_painting(image_path_or_url, n_colors=N_COLORS):
"""Extract dominant colors from a painting image.
Args:
image_path_or_url: Local file path or URL to the painting image.
n_colors: Number of color clusters (default 7).
Returns:
List of dicts with keys: name, hex, rgb, percentage
"""
if image_path_or_url.startswith("http"):
headers = {"User-Agent": "ColorAnalysis/1.0"}
response = requests.get(image_path_or_url, headers=headers, timeout=30)
response.raise_for_status()
img = Image.open(io.BytesIO(response.content)).convert("RGB")
else:
img = Image.open(image_path_or_url).convert("RGB")
# Resize for performance (800px preserves enough detail)
img.thumbnail((800, 800))
pixels_rgb = np.array(img).reshape(-1, 3).astype(float)
# Cluster in CIELAB for perceptual uniformity
pixels_lab = rgb_to_lab_batch(pixels_rgb)
kmeans = KMeans(n_clusters=n_colors, random_state=42, n_init=10)
kmeans.fit(pixels_lab)
labels, counts = np.unique(kmeans.labels_, return_counts=True)
total = counts.sum()
centers_rgb = lab_to_rgb(kmeans.cluster_centers_)
# Build results, sorted by dominance
colors = []
used_names = set()
for i in np.argsort(-counts):
rgb = tuple(centers_rgb[i])
pct = round(counts[i] / total * 100, 1)
# Find unique CSS color name
lab = rgb_to_lab_batch(np.array([rgb], dtype=float))[0]
distances = np.sqrt(np.sum((_css_labs - lab) ** 2, axis=1))
for idx in np.argsort(distances):
name = _css_names[idx]
spaced = ''.join(f' {c}' if c.isupper() and j > 0 else c for j, c in enumerate(name)).strip()
if spaced not in used_names:
used_names.add(spaced)
break
colors.append({
"name": spaced,
"hex": "#{:02x}{:02x}{:02x}".format(*rgb),
"rgb": rgb,
"percentage": pct,
})
return colors
# ββ Example usage βββββββββββββββββββββββββββββββββββββββββββββ
if __name__ == "__main__":
PAINTINGS = [
("The Kiss", "Gustav Klimt",
"https://upload.wikimedia.org/wikipedia/commons/4/40/The_Kiss_-_Gustav_Klimt_-_Google_Cultural_Institute.jpg"),
("The Scream", "Edvard Munch",
"https://upload.wikimedia.org/wikipedia/commons/c/c5/Edvard_Munch%2C_1893%2C_The_Scream%2C_oil%2C_tempera_and_pastel_on_cardboard%2C_91_x_73_cm%2C_National_Gallery_of_Norway.jpg"),
("Squares with Concentric Circles", "Wassily Kandinsky",
"https://upload.wikimedia.org/wikipedia/commons/9/98/Vassily_Kandinsky%2C_1913_-_Color_Study%2C_Squares_with_Concentric_Circles.jpg"),
]
for name, artist, url in PAINTINGS:
print(f"\n{'='*60}")
print(f"{name} β {artist}")
print(f"{'='*60}")
colors = analyze_painting(url)
for c in colors:
print(f" {c['percentage']:5.1f}% {c['hex']} {c['name']}")Running it
bash
pip install pillow numpy scikit-learn requests
python analyze_paintings.py
```
Output:
```
============================================================
The Kiss β Gustav Klimt
============================================================
28.3% #7d693b Dark Olive Green
24.3% #998141 Dark Khaki
21.3% #c2a84a Dark Goldenrod
10.6% #a99f82 Tan
8.0% #779352 Olive Drab
4.0% #7f6f9f Steel Blue
3.5% #3e362b Dark Slate Gray
β
Make a pie chart
To generate the actual pie charts, I plugged the data into Chartty, a pie chart maker, and exported them as JPEGs.
.png)