Machine learning is becoming a mainstream technology any journeyman software engineer can apply. We expect engineers to know how to take an average and standard deviation of data. Perhaps it’s now reasonable to expect a non-expert to be able to train a learning model to predict data, or apply PCA or k-means clustering to better understand data.

The key change that’s enabling high end machine learning like Siri or self driving cars is the availability of very large computing clusters. Machine learning works better the more data you have, so being able to easily harness 10,000 CPUs to process a petabyte of data really makes a difference. For us civilians with fewer resources, libraries like scikit-learn and cloud services make it possible for us to, say, train up a neural network without knowing much about the details of backpropagation.

The danger of inexpert machine learning is misapplication. The algorithms are complex to tune and apply well. A particular worry is overfitting, where it looks like your system is predicting the data well but has really learned the training data too precisely and it won’t generalize well. Being able to measure and improve machine learning systems is an art that I suspect can only be learned with lots of practice.

I just finished an online machine learning course that was my first formal introduction. It was pretty good and worth my time, you can see my detailed blog posts if you want to know a lot more about the class. Now I’m working on applying what I’ve learned to real data, mostly using IPython and scikit-learn. It’s challenging to get good results, but it’s also fun and productive.

  2015-09-24 21:02 Z

The addition of ad blocking capability to iOS has brought on a lot of hand-wringing about whether it’s ethical to block ads in your web browser. Of course it is! Blocking ads is self preservation.

Ad networks act unethically. They inject huge amounts of garbage making pages load slowly and computers run poorly. They use aggressive display tricks to get between you and the content. Sometimes negligent ad networks serve outright malware. They violate your privacy without informed consent and have rejected a modest opt-out technology. Ad systems are so byzantine that content providers pay third parties to tell them what crap they’re embedding in their own websites.

Advertising itself can be unethical. Ads are mind viruses, tricking your brain into wanting a product or service you would not otherwise desire. Ads are often designed to work subconsciously, sometimes subliminally. Filtering ads out is one way to preserve clarity of thought.

I feel bad for publishers whose only revenue is ads. But they and the ad networks brought it on themselves by escalating ad serving with no thought for consumers. The solution is for the ad industry to rein itself way in, to set some industry standards limiting technologies and display techniques. Perhaps blockers should permit ethical ads, although that leads to conflicts of interest. Right now Internet advertisers are predators and we are the prey. We must do whatever we can to defend ourselves.

  2015-09-21 02:24 Z

The Ubiquiti NanoStation loco 5M is good hardware. It’s speciality gear for setting up long distance wireless network links. All of Ubiquiti’s networking gear is worth knowing about if you’re a prosumer-type networking person. I will probably buy their wifi access points next time I need one.

I’m using two NanoStations as a wireless ethernet bridge. My Internet up in Grass Valley terminates 200’ from my house. I couldn’t run a cable but a hacky wireless thing I set up was sort of working. So I asked on Metafilter on how to do a wireless solution right and got a clear consensus on using Ubiquiti equipment. $150 later and it works great! Kind of overkill; the firmware can do a lot more than just bridging and the radios are good for 5+ miles. But it’s reliable and good.

The key thing about Ubiquiti gear is the high quality radios and antennas. It just seems much more reliable than most consumer WiFi gear. Their airOS firmware is good too, it’s a bit complicated to set up but very capable and flexible. And in addition to normal 802.11n or 802.11ac they also have an optional proprietary TDMA protocol called airMax that’s designed for serving several long haul links from a single basestation. They’re mostly marketing to business customers but the equipment is sold retail and well documented for ordinary nerds to figure out.

I still wish I just had a simple wire but I’ve now made my peace with wireless networking. It works well with good gear in a noncongested environment. I wrote up some technical notes on modern wifi so I understood the details better. Starting with 802.11n and MIMO there was a significant improvement in wireless networking protocols, it’s really pretty amazing technology.

  2015-09-19 16:54 Z

The CyberPower CP350SLG is a good small uninterruptible power supply. Its only rated for 250W and it only has a few minutes of battery life. Not suitable for a big computer. But it’s perfect for backup power for network gear, like a router or a modem or the like. And it’s pretty small, just 7x4x3 inches. I made a mistake and bought APC’s small UPS first and the damn thing is ungrounded, which is ridiculous and dumb. I’ve had better luck with CyberPower UPSes anyway and this small one is exactly what I needed.

I’m a big fan of small UPSes. I don’t need something to carry me through a 30 minute power outage, I just want some backup that will keep my equipment running if the power drops for a couple of seconds. Because PG&E, you know? It’s a shame there’s no DC power standard, I bet you could make a DC-only UPS 1/4th the size with a lithium battery. But instead it’s all lead-acid batteries and producing 110V AC just to be transformed back to DC by all the equipment its powering. (That APC UPS does have powered USB ports, a small step towards DC UPS.)

Some day I should look into whole-house UPS units. A quick look suggests it’s about $2500 for 2.7kW, plus installation. This discussion suggests $10k is more realistic if you really mean a whole house.

  2015-09-19 15:55 Z

Jupyter Notebooks (née IPython Notebooks) feel like an important technology to me. It’s a way to interactively build up a computer program, then save the output and share it with other people. You can see a sample notebook I made here or check out this gallery of fancy notebooks. It’s particularly popular with data scientists. If you’re an old Python fogey like me it’s kind of a new thing and it’s exciting and worth learning about. I’m focussing on Python here, but Jupyter is now language-agnostic and supports lots of languages like R, Go, Java, even C++.

The notebook is basically a REPL hosted in a web browser. You type a snippet of code into the web page, run it, and it shows you the output and saves it to the notebook. Because the output is in the browser it can display HTML and images; see cells 6 and 7 in my sample notebook. There’s excellent matplotlib support for quick data visualization and lots of fancier plugins for D3, Leaflet, etc if you need.

Notebooks are made to be shareable. My sample is basically a static snapshot of a program’s output, there’s no Python running when you view it on GitHub. But you can also download that file, run the Python code on your own computer, and modify it however you want. That makes it an incredibly useful pedagogical tool. There are complex notebooks you can download that are effectively whole college courses in computing topics. And you can keep your own notebooks around as documentation of the work you’ve done. It’s a very powerful tool.

Behind the scenes what’s going on is the browser window acts like an attachable debugger. There’s a headless IPython server process running somewhere and the browser connects to it. It’s easy to run IPython yourself on your own machine or there are other options including cloud hosted live notebooks. Most of the display magic works by having objects define their own _repr_html_ methods, that’s what lets Pandas show that nice HTML table in cell 6.

Installing and starting IPython is pretty simple, just install it with pip and run ipython notebook. You’ll also want %matplotlib inline for inline plots. Notebooks seem particularly popular with data scientists; if you want a full Python data environment with Pandas, scikit-learn, etc then the Anaconda distribution is an easy way to get going.

  2015-09-18 14:45 Z