Faster Python Libraries for ML

There are several options for speeding up your machine learning prototyping. The most obvious is to use GPGPUs, but cost is an issue: an appropriate Nvidia card will set you back between $1K and $2K, and you'll probably need to upgrade your power supply and cooling as well. If your department (like most) is under budget constraints, or you're simply doing this as a learning experience or for the fun of it, there is a middle ground that speeds up the processing and saves a lot of money at the same time.

Given a few basic assumptions/prerequisites about your development platform:

Your computer is Intel-based (preferably with multiple cores).

Your ML code is in Python and relies heavily on the Python math libraries.

You're willing to swap the standard Python libraries for equivalent versions targeted at a proprietary CPU.

Okay, I admit this won't work for everybody, but I'll bet it's a fit for a very large percentage of you out there working with ML. And, of course, the speedup will not be anything like adding 1,000 GPGPU cores, but it's likely to speed things up by roughly a factor of three; in situations where your matrices fit the sweet spot for your CPU (L1/L2 cache, number of cores, etc.), the speedup factor can even approach double digits.
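
If you're curious where that sweet spot falls on your own machine, a quick (admittedly unscientific) timing sketch is enough to compare the stock libraries against the Intel build. The matrix sizes below are arbitrary placeholders; adjust them to your own cache and core budget.

    # Minimal timing sketch: time a dense matrix multiply at a few sizes,
    # run it under both the stock and the Intel-optimized NumPy, and compare.
    import time
    import numpy as np

    for n in (256, 1024, 4096):
        a = np.random.rand(n, n)
        b = np.random.rand(n, n)
        start = time.time()
        np.dot(a, b)
        elapsed = time.time() - start
        print("%dx%d matmul: %.4f s" % (n, n, elapsed))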

So here's the deal.

First of all, this only works on Intel hardware, because Intel wrote the libraries with the express purpose of showing off the tricks and shortcuts available at the hardware level. Open-source software generally does not take advantage of proprietary, nonstandard instructions and manipulations. Since Intel's own engineers wrote these libraries, we can expect that they squeezed out all of the performance they could. And price is not an issue: the suite is free, although you will have to register. You can learn more about it and download it from here. There are distributions for Apple, Linux, and Windows, and the library suites come in two flavors: Python 2.7 and Python 3.6. The downloads are a little over 300 MB and install quite easily from the console with the provided instructions. That's it!
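
Once installed, it's worth a quick sanity check that the optimized libraries are actually the ones in use. One way (a sketch; the exact output format varies by NumPy version) is to inspect what your NumPy build is linked against:

    # Confirm which BLAS/LAPACK your NumPy build is linked against.
    # On the Intel build, the listings should mention MKL.
    import sys
    import numpy as np

    print(sys.version)   # the Intel build typically identifies itself here
    np.show_config()     # look for "mkl" in the libraries/include entries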

If you'd rather have someone explain in detail what these libraries are and how they work, Intel has made a 30-minute video that is clear and informative (not a sales pitch).

Also (and this is explained in the video), your performance gains can follow you if you're using Anaconda in the cloud. Since your application will likely run on cloud servers with many more cores and much more RAM than your development workstation, you stand to gain even more speed. You should be able to move your code over effortlessly and potentially speed it up by as much as 200 times, because the libraries automatically handle multi-core, many-core, and cluster contexts.
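
If you want to see, or pin down, how many cores the math libraries are using in a given environment, the distribution includes the mkl-service package (at least in the installs I'm aware of; treat its presence as an assumption), which exposes MKL's thread settings from Python:

    # Sketch of querying and controlling MKL threading via mkl-service,
    # assuming the package is present in your install.
    import mkl

    print(mkl.get_max_threads())  # threads MKL will use by default
    mkl.set_num_threads(4)        # pin a fixed count for reproducible timings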

A further note on RAM: Since the improvements in computation are centered on the CPU, you will only see them if your data fits in RAM. If you can toss another 16, 32, or 64 GB of RAM into your box, you will be able to tackle some moderately large (by today's standards) training sets.
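
The arithmetic for that check is simple: a dense float64 array costs eight bytes per element, so you can estimate the footprint before buying the RAM. The shapes below are made-up examples:

    # Back-of-the-envelope RAM estimate: rows * cols * 8 bytes for float64.
    rows, cols = 10000000, 500          # e.g., 10M samples, 500 features
    gib = rows * cols * 8 / 2.0 ** 30
    print("%.1f GiB" % gib)             # ~37.3 GiB, so plan on a 64 GB box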

Today, most machine learning work is done in a tight prototype/test loop. You experiment with data structures on small subsets of the ultimate data, then increase the dataset size as improvements to the prototype are tested. A small but significant computational speedup like this will let you progress farther along the development spiral before having to go to the cloud.
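
In code, that loop often starts with nothing fancier than a random subsample. Here's a minimal sketch, where X and y are stand-ins for your own features and labels:

    # Prototype on a small random subset first, then grow it.
    # X and y below are placeholders for your real data.
    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.rand(1000000, 20)                 # stand-in feature matrix
    y = rng.randint(0, 2, size=1000000)       # stand-in labels

    idx = rng.choice(len(X), size=50000, replace=False)
    X_small, y_small = X[idx], y[idx]         # iterate on 5% of the data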

Who knows? Maybe this small but significant speedup will help you make the case to your boss (or your spouse?) that you absolutely need a couple of Nvidia Quadro GP 16 GB cards. Could happen.