Nelson's Weblog: tech / machine-learning

Machine Learning

Machine learning is becoming a mainstream technology any journeyman software engineer can apply. We expect engineers to know how to take an average and standard deviation of data. Perhaps it’s now reasonable to expect a non-expert to be able to train a learning model to predict data, or apply PCA or k-means clustering to better understand data.
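To make that concrete, here is a minimal sketch of what "apply PCA or k-means" can look like with scikit-learn. The data is made up for illustration (two synthetic blobs in five dimensions), and the cluster count and component count are assumptions I picked to suit that toy data, not recommendations.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Synthetic data (an assumption for illustration): two well-separated
# blobs of 100 points each in 5 dimensions.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(100, 5))
blob_b = rng.normal(loc=5.0, scale=0.5, size=(100, 5))
data = np.vstack([blob_a, blob_b])

# The basics we already expect of engineers: mean and standard deviation.
print("mean per column:", data.mean(axis=0))
print("std per column: ", data.std(axis=0))

# k-means: ask for two clusters and get one label per row.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(data)
print("cluster sizes:", np.bincount(labels))

# PCA: project the 5-dimensional data down to 2 dimensions.
pca = PCA(n_components=2)
projected = pca.fit_transform(data)
print("variance explained per component:", pca.explained_variance_ratio_)
```

On real data the hard part is choosing the number of clusters and deciding whether the projection means anything; the library calls themselves are short.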

The key change that’s enabling high-end machine learning like Siri or self-driving cars is the availability of very large computing clusters. Machine learning works better the more data you have, so being able to easily harness 10,000 CPUs to process a petabyte of data really makes a difference. For us civilians with fewer resources, libraries like scikit-learn and cloud services make it possible to, say, train up a neural network without knowing much about the details of backpropagation.
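As a rough illustration of training a neural network without touching backpropagation, here is a sketch using scikit-learn's MLPClassifier on the small handwritten-digits dataset that ships with the library. The dataset, the single 32-unit hidden layer, and the iteration limit are all my assumptions for the example, not tuned choices.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# 8x8 grayscale digit images bundled with scikit-learn (no download needed).
digits = load_digits()
X = digits.data / 16.0  # pixel values range from 0 to 16; scale to 0..1
y = digits.target

# Hold out a quarter of the data to check the model on unseen examples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# One hidden layer of 32 units; the library handles backpropagation.
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
net.fit(X_train, y_train)

test_accuracy = net.score(X_test, y_test)
print("held-out accuracy:", test_accuracy)
```

The whole thing is a constructor call and a fit call. That convenience is exactly why it is easy to use without understanding what the model is doing.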

The danger of inexpert machine learning is misapplication. The algorithms are complex to tune and apply well. A particular worry is overfitting, where it looks like your system is predicting the data well but has really learned the training data too precisely and it won’t generalize well. Being able to measure and improve machine learning systems is an art that I suspect can only be learned with lots of practice.
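One common way to spot overfitting is to hold out data the model never sees during training and compare scores. The sketch below is a toy setup I made up for illustration: a noisy sine curve fitted by two decision trees, one with unlimited depth and one capped at depth 3. The depth cap and the noise level are assumptions, not recommendations.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption): a sine curve plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 6, size=(200, 1))
y = np.sin(x).ravel() + rng.normal(scale=0.3, size=200)

# Keep half the data aside; the models never see it while training.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.5, random_state=0
)

# An unconstrained tree can memorize every training point, noise included.
deep = DecisionTreeRegressor(random_state=0).fit(x_train, y_train)
# A depth-limited tree is forced to learn a coarser shape.
shallow = DecisionTreeRegressor(max_depth=3, random_state=0).fit(x_train, y_train)

deep_train = deep.score(x_train, y_train)  # R^2 on training data
deep_test = deep.score(x_test, y_test)     # R^2 on held-out data
shallow_train = shallow.score(x_train, y_train)
shallow_test = shallow.score(x_test, y_test)

print(f"deep tree:    train R^2 = {deep_train:.3f}, test R^2 = {deep_test:.3f}")
print(f"shallow tree: train R^2 = {shallow_train:.3f}, test R^2 = {shallow_test:.3f}")
```

The warning sign is the gap: a near-perfect training score paired with a noticeably worse held-out score means the model has learned the training data rather than the underlying pattern.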

I just finished an online machine learning course that was my first formal introduction. It was pretty good and worth my time; you can see my detailed blog posts if you want to know a lot more about the class. Now I’m working on applying what I’ve learned to real data, mostly using IPython and scikit-learn. It’s challenging to get good results, but it’s also fun and productive.

tech
2015-09-24 21:02 Z


Nelson Minar nelson@monkey.org
Blog licensed under a Creative Commons License