UNMASKING THE MASTERPIECE: How I Leveraged Semantic AI to Decode 5,000 Years of Art History

Originally published on Dev.to

AUTHOR: Rahul Atram
INSTITUTION: SGGS Institute of Engineering & Technology, Nanded
DATE: 19 April 2026

THE MOMENT OF DISCOVERY

It was a quiet Sunday evening in Nanded, the kind of evening where the air is still and the mind begins to wander. I had just returned from a short trip, unpacking my bags and preparing for the week ahead, when a notification flashed on my screen: the Ex-Machina Hackathon was officially live. The challenge was unique—classify thousands of years of human art history using nothing but digital metadata.

As a third-year Computer Science student, I have always believed that data is not just a collection of numbers; it is a story waiting to be told. When I looked at the spreadsheet of 5,000 messy museum records, however, it felt more like a riddle than a story. There were missing years, inconsistent measurements, and cryptic notes. But that is exactly where the journey began. What started as late-night curiosity turned into a high-performance machine learning pipeline that achieved a verified 94.10% accuracy. This is the story of that discovery.

DECODING THE PROBLEM: ART BEYOND THE IMAGE

The competition was a test of "Metadata Intelligence." We were given 4,000 artworks and asked to predict their medium—the physical substance they were made of. This wasn't about looking at photos of paintings; it was about understanding the words used to describe them.

We were dealing with seven beautiful, distinct categories:

  1. Acrylic: The modern, vibrant medium of the 20th century.
  2. Ink: The sharp, decisive lines of sketches and calligraphy.
  3. Oil on Canvas: The heavy, textured gold standard of the masters.
  4. Oil on Wood and Panel: The rigid, durable ancestors of modern painting.
  5. Print: The art of reproduction, etching, and woodblocks.
  6. Tempera: The ancient egg-yolk based paint used in historical icons.
  7. Watercolor: The translucent, flowing beauty of landscapes.

The challenge wasn't just to get a high score. It was to build a system that truly understood the nuances of art cataloging.

THE "AHA!" MOMENT: LISTENING TO THE DATA

Most people start a machine learning project by immediately writing code. I decided to start by reading. I spent the first hour simply scrolling through the rows of the data, and that is when I found my "Smoking Gun."

I noticed a column called 'Caption'. While other columns were riddled with gaps and fragments, the curators at these museums were remarkably consistent in their captions. They would write things like: "A watercolor landscape titled 'The River'..."

This was my breakthrough. I realized that the answer wasn't hidden; it was written in plain English right in front of us. The machine didn't need to guess; it just needed to learn how to read. This "Caption Signal" became the heart of my entire strategy.
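
To make the "Caption Signal" concrete, here is a minimal sketch of the kind of sanity check I mean. The file name and column names (`Caption`, `Medium`) are illustrative; the real competition files may differ:

```python
import pandas as pd

# Load the training metadata (file and column names are assumptions).
df = pd.read_csv("train.csv")

# How often is the target medium literally spelled out in the caption?
def medium_in_caption(row):
    return str(row["Medium"]).lower() in str(row["Caption"]).lower()

hit_rate = df.apply(medium_in_caption, axis=1).mean()
print(f"Medium named verbatim in the caption: {hit_rate:.1%}")
```

A check like this takes a minute to run, and a high hit rate is exactly the signal that tells you the text column deserves to be the backbone of the model.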

THE JOURNEY THROUGH THE EXPERIMENTS

In my quest for accuracy, I followed a path of increasing intelligence. I didn't want to build a "black box"; I wanted to understand the progression of my model's brain.

Step 1: The Keyword Baseline
I started with a technique called TF-IDF. Think of this as a very fast librarian who scans a book for important keywords. If the librarian sees the word "canvas," they guess "Oil." This simple approach got me to a solid 91% accuracy. It was a great start, but art is more than just keywords.
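
For readers who want to see what that librarian looks like in code, here is a hedged sketch of a TF-IDF baseline. It assumes the `df` from the caption check above, and the vectorizer settings are illustrative rather than my exact configuration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Keyword counting, weighted so rare-but-telling words stand out.
baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    LogisticRegression(max_iter=1000),
)

scores = cross_val_score(baseline, df["Caption"].fillna(""), df["Medium"], cv=5)
print(f"TF-IDF baseline accuracy: {scores.mean():.2%}")
```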

Step 2: The Margin Carver
I then upgraded to a Linear Support Vector Machine. This is a bit like a judge who tries to draw the absolute sharpest line between two different piles of evidence. It brought our accuracy up to 94.20%. We were getting closer to the truth.
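
Swapping in the judge is a one-line change on top of the baseline. A sketch, again with illustrative hyperparameters:

```python
from sklearn.svm import LinearSVC

# Same TF-IDF features; the linear SVM now carves the widest possible
# margin between the media classes in that keyword space.
svm = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    LinearSVC(C=1.0),
)

scores = cross_val_score(svm, df["Caption"].fillna(""), df["Medium"], cv=5)
print(f"Linear SVM accuracy: {scores.mean():.2%}")
```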

Step 3: The Champion Model — SBERT + CatBoost
Then came the real revolution. I introduced a "Transformer" model called Sentence-BERT. Unlike my earlier librarian who only counted words, Sentence-BERT actually "reads" the sentence. It understands context. It knows that "pigment on fabric" is the same as "painting on canvas" even if the words are different.

I combined this "reading brain" with CatBoost—a gradient boosting model that acted as the "historical memory." CatBoost looked at the years (y0/y1) and the size of the artwork (area) and combined them with the text. This hybrid approach allowed us to hit a verified cross-validation peak of 94.10%, with initial probe estimates reaching even higher.
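
Here is a minimal sketch of that hybrid, assuming the `sentence-transformers` and `catboost` packages and the column names mentioned above; the encoder checkpoint and hyperparameters are illustrative, not my exact setup:

```python
import numpy as np
from catboost import CatBoostClassifier
from sentence_transformers import SentenceTransformer
from sklearn.model_selection import train_test_split

# 1) The "reading brain": dense semantic vectors for every caption.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # checkpoint is an assumption
text_emb = encoder.encode(df["Caption"].fillna("").tolist())

# 2) The "historical memory": years and size as tabular features.
num_cols = ["y0", "y1", "area"]
tabular = df[num_cols].fillna(df[num_cols].median())

# 3) Concatenate both views and let gradient boosting weigh them.
X = np.hstack([text_emb, tabular.to_numpy()])
y = df["Medium"]

X_tr, X_va, y_tr, y_va = train_test_split(X, y, stratify=y, random_state=42)
model = CatBoostClassifier(iterations=500, verbose=False)
model.fit(X_tr, y_tr, eval_set=(X_va, y_va))
print(f"Hold-out accuracy: {model.score(X_va, y_va):.2%}")
```

The design point is that neither view is enough on its own: the embeddings capture what the caption says, while CatBoost's trees pick up the era and scale patterns that text alone misses.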

REALITY CHECK: HONESTY IN DATA SCIENCE

As a student, it is tempting to chase a 100% score. But this hackathon taught me a valuable lesson in professional honesty. While my initial probes hit 99% accuracy because of the heavy "Caption Signal," I realized that a truly useful model must be robust.

In my final version, I focused on proper feature integration and handling missing values. I realized that 94.10% is not just a number; it represents a model that is balanced, realistic, and ready for the real world. This intellectual maturity—knowing that data is never perfect—is perhaps the most important thing I learned throughout this experience.
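
One concrete pattern for that missing-value handling, sketched under the same column-name assumptions as before: impute with the median, but keep an explicit flag so the model can learn from the absence itself.

```python
# Median-impute the numeric columns, but record which values were missing;
# "this record has no known date" is itself a clue about the artwork.
for col in ["y0", "y1", "area"]:  # column names assumed, as above
    df[f"{col}_missing"] = df[col].isna().astype(int)
    df[col] = df[col].fillna(df[col].median())
```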

THREE LIFE-CHANGING LESSONS

  1. Observation is more powerful than Algorithms: The "Caption Signal" was discovered because I spent time looking at the raw data, not just the code. Always look at your data first.
  2. Baselines are the ground you stand on: Never start with a complex neural network. Start small to understand the "floor" of your performance.
  3. Art and Science are not enemies: Using AI to understand human creativity was a beautiful experience. Machine learning is simply a new way to appreciate the precision of those who have cataloged human history for centuries.

CONCLUSION: THE FUTURE OF THE MUSEUM

What I built in the Ex-Machina hackathon was more than a classifier. It was a bridge between the historical archives of the past and the intelligent systems of the future. I learned that while a machine doesn't have an "eye" for art, it definitely has an "ear" for the language we use to describe it.

To my fellow students at SGGS and beyond: don't be afraid of the complexity. AI is just a tool, and your curiosity is the power that makes it work. Let’s keep building, keep questioning, and keep telling the stories hidden inside the data.

My journey continues at: https://www.kaggle.com/rahulatram
Let's build the future together.
