Data is the New Lego

When I was a child, I used to love playing with Lego; my brothers and I built spaceships and trucks and houses and animals. As time went on, our creations became more ambitious, functional, and lifelike.

We could each have insisted our Lego was our own, but by pooling resources, we collectively went further. Family and friends gave us Lego, including unusual and hard-to-find bricks, which enabled us to make more accurate models. We were growing up too, and as our play became more sophisticated, we learned how to build better models.

I’m not young anymore, but I still remember playing with Lego as I go to work each morning and play with data to build models. Using data to solve real-world problems, like style, fit, and size recommendations, is surprisingly like my childhood Lego memories.

To build something useful you need lots of data, data diversity, and the knowledge to build the right models in the right way.

Not enough data means giving bad advice

If you don’t have enough Lego bricks, the things you build aren’t realistic; the model is crude, the colours don’t match, and there are gaps. It’s the same with machine learning and computer models; if you don’t have enough data, your models are crude, and they make both quantitative and qualitative errors. The history of computer modelling is rife with examples of people making bad decisions using models made with incomplete data.

In dealing with style, fit, and size recommendations, not enough data means giving bad advice because your models are too crude to accurately represent people and garments. This is where pooling data wins. By pooling our Lego, my brothers and I could build what we wanted. In fashion, by pooling data from many retailers, you can build better models because you have a more complete picture of consumers’ behaviour and of the unique style characteristics, size, and shape of garments.

Different markets need different data

To build a good-quality Lego model you need a diversity of pieces – models built with just the standard 2x4 bricks are crude and inaccurate. This is where getting Lego from friends and family was so useful – we got more diverse bricks that let us build more accurate models. In fashion, you need a diversity of data on people and garments too. Simply extrapolating from the average size to plus sizes is like using 2x4 bricks for everything; one size does not fit all, and you end up with something that isn’t accurate for users who aren’t ‘average’.

Simply assuming UK consumers and apparel are the same as US or German consumers and apparel is like using the same few Lego bricks for different models; different markets need different data. Simply believing a £20,000 dress fits the same as a £100 dress is like building Lego models when the special pieces you need are missing; it’s the kind of thing you do when you don’t have the data you need. In fact, having data on both £100 and £20,000 dresses lets you build richer models that make better recommendations for all dresses. The key to good modelling is having data on a diverse set of consumers and garments.

Building something better

Young children make crude Lego models where the colours don’t match and the shapes are wrong; older children build working models with careful colour schemes. A similar thing happens with data and algorithms. As you get to know and manipulate your data, your algorithms, and their interactions, you come to understand their limitations and you strive to build something better. As time goes by, increasing volumes of data point out the flaws in your work and you fix them – your models become better and better. In other words, the learning curve applies to both building Lego and computer modelling.

It might be a brutal childhood truth, but the children with the most Lego, the best pieces, and the time to play produce the best models. The same truth applies to any AI-based machine learning or computer modelling project. The projects with the biggest data volumes, the most diverse data, and the best teams to use that data will produce the most accurate models.

The benefit of all this is helping people find clothes they’ll love, that suit their personal style preferences, and that will fit and flatter them. Lego models only make a few people happy, but style, fit, and size recommendations can make millions of people happier by helping them connect more easily with the clothes and shoes that better express who they are and how they feel.

Data and models and collaboration

Sometimes late at night, when it’s quiet and there’s no one around to judge, I put together Lego models. It’s a consoling and comforting reminder of my childhood, like eating ice cream, playing chase, and England losing in the World Cup. Lego has taught me a lot about data and models and collaboration. But there’s one big difference between building Lego models with my brothers and building computer models with my colleagues: I don’t fight with my colleagues quite so often.

By using a rich data collection from thousands of brands, it’s possible not only to improve the customer shopping experience but also to provide accurate size recommendations. A larger collection of Lego increases the scope of projects that can be built, just as a vast data collection increases the range of customers who can be given accurate style, fit, and size recommendations.

About Mike Woodward
