November 23, 2020

Learning ML: an ecosystem

by Sam Sandqvist

Just starting with machine learning? You’ll need to think about the ecosystem in which you will both learn and practise what you have learned.

Blog 18-1

The most common implementation language for learning and applying artificial intelligence, and machine learning in particular, is Python. Although other implementation languages also provide robust environments for ML, it is clear that Python is the favourite of the moment, so we will assume that we’ll use it.

Some of the more important components are the following:

Python 3

Of course, we have to have Python. The latest version at the time of writing is 3.8.5, available at python.org. Note that Microsoft Windows does not include Python, so you have to download and install it. Apple macOS does include it, but the bundled version is not current; please install the latest version alongside it (both can, and should, coexist on the Mac).

Jupyter Notebook

Jupyter Notebook provides an interactive computational environment for developing Python-based data science applications.

Extensive set of packages

Python has an extensive and powerful set of ready-to-use packages covering many domains. Among them are numpy, scipy, pandas and scikit-learn, which are the staples of machine learning and data science.

These packages are used in the ML implementations cited in the previous article.
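A quick way to see whether a machine already has these components is to ask the interpreter itself. The sketch below uses only the standard library; the function name `check_environment` and the 3.8 minimum are assumptions made for illustration, and `sklearn` is the import name under which scikit-learn installs.

```python
import importlib.util
import sys

# Packages named in the text, mapped to their import names.
PACKAGES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
}


def check_environment(packages=PACKAGES, minimum=(3, 8)):
    """Return (python_ok, {package: installed?}) for the current interpreter."""
    python_ok = sys.version_info >= minimum
    installed = {
        name: importlib.util.find_spec(module) is not None
        for name, module in packages.items()
    }
    return python_ok, installed


if __name__ == "__main__":
    python_ok, installed = check_environment()
    print(f"Python {sys.version.split()[0]} (recent enough: {python_ok})")
    for name, present in installed.items():
        print(f"{name:>12}: {'installed' if present else 'missing'}")
```

Anything reported as missing can usually be added with `pip install` followed by the package name.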

One of the interesting aspects of ML and AI is that you should think carefully about where (and how) to use them. The following diagram shows some of my thinking on this.

Blog 18-2

Finally, how should we go about learning to code AI and ML?

Practical steps in building an ML application

1. Use your needs to define metric-based goals

Do you need high accuracy, or is lower accuracy acceptable?

2. Build an end-to-end system

  • get up and running as soon as possible
  • build simplest viable system first
  • baseline: copy from industry best-of-breed, or similar (check GitHub!)
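As a rough illustration of "simplest viable system first", here is a minimal end-to-end sketch: split the data, train, and measure. The model is a majority-class baseline that ignores its inputs entirely. The class and function names and the toy data are invented for this example, and the code uses plain Python rather than any particular ML library.

```python
from collections import Counter


class MajorityBaseline:
    """Predicts the most common training label, ignoring the features."""

    def fit(self, features, labels):
        self.label_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, features):
        return [self.label_ for _ in features]


def run_pipeline(features, labels, model, train_fraction=0.75):
    """Split the data, train the model, return accuracy on the held-out part."""
    cut = int(len(features) * train_fraction)
    model.fit(features[:cut], labels[:cut])
    predictions = model.predict(features[cut:])
    held_out = labels[cut:]
    correct = sum(p == y for p, y in zip(predictions, held_out))
    return correct / len(held_out)


# Toy data, invented for illustration only.
features = [[0.1], [0.4], [0.35], [0.8], [0.9], [0.2], [0.7], [0.3]]
labels = ["no", "no", "no", "yes", "yes", "no", "yes", "no"]

baseline_accuracy = run_pipeline(features, labels, MajorityBaseline())
print(f"Baseline accuracy: {baseline_accuracy:.2f}")  # prints "Baseline accuracy: 0.50"
```

Because `run_pipeline` only needs `fit` and `predict`, a real model can later be dropped in without touching the rest, and the baseline number gives it something to beat.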

3. Data-driven refinement

Metrics are important. You cannot manage what you don’t measure.

  • Accuracy? (% of examples correct)
  • Coverage? (% of examples processed)
  • Precision? (% of detections right)
  • Recall? (% of objects detected)
  • Amount of error? (for regression-like problems)
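The metrics above can be computed in a few lines each. The sketch below is plain Python with invented example data. It assumes a binary task where `1` marks a detection and `None` marks an example the system declined to process, and it uses mean absolute error as one possible measure of "amount of error".

```python
def accuracy(y_true, y_pred):
    """Fraction of examples predicted correctly (declined ones count as wrong)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def coverage(y_pred):
    """Fraction of examples the system processed (None means it declined)."""
    return sum(p is not None for p in y_pred) / len(y_pred)


def precision(y_true, y_pred, positive=1):
    """Fraction of reported detections that are right."""
    detected = [t for t, p in zip(y_true, y_pred) if p == positive]
    return sum(t == positive for t in detected) / len(detected) if detected else 0.0


def recall(y_true, y_pred, positive=1):
    """Fraction of real objects that were detected."""
    actual = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in actual) / len(actual) if actual else 0.0


def mean_absolute_error(y_true, y_pred):
    """Average size of the error, for regression-like problems."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented labels and predictions for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, None]

print(accuracy(y_true, y_pred))   # 0.625
print(coverage(y_pred))           # 0.875
print(precision(y_true, y_pred))  # 0.75
print(recall(y_true, y_pred))     # 0.75
print(mean_absolute_error([2.0, 3.5, 5.0], [2.5, 3.0, 7.0]))  # 1.0
```

In practice scikit-learn ships tested versions of most of these, but writing them once by hand makes it clear what each number means.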

The architecture of the neural network is very important.

1. Deep or not?

  • Lots of noise, little structure -> not deep
  • Little noise, complex structure -> deep

2. Architecture family:

  • no structure -> fully connected
  • spatial structure -> convolutional
  • sequential structure -> recurrent
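These two rules of thumb are simple enough to write down as a lookup. The helper below is only a sketch of the lists above; its name and arguments are hypothetical, and it deliberately refuses to guess in the mixed cases that the rules do not cover.

```python
# Architecture family by the kind of structure in the data, as listed above.
FAMILIES = {
    "none": "fully connected",
    "spatial": "convolutional",
    "sequential": "recurrent",
}


def suggest_architecture(noisy, complex_structure, structure="none"):
    """Return (depth, family) following the two rules of thumb."""
    if structure not in FAMILIES:
        raise ValueError(f"unknown structure: {structure!r}")
    if noisy and not complex_structure:
        depth = "not deep"
    elif complex_structure and not noisy:
        depth = "deep"
    else:
        # The rules give no answer for mixed cases, so neither does this helper.
        depth = "undecided"
    return depth, FAMILIES[structure]


print(suggest_architecture(noisy=False, complex_structure=True, structure="spatial"))
# ('deep', 'convolutional')
```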

In general, let the data drive how you adapt the system.

  • Choose what to do based on data
  • Don’t believe hype
  • Measure training and testing error: over- vs. underfitting
    • inspect data for defects
    • tune learning rate and other optimisation settings
    • make model bigger
    • there could be software bugs (but don’t believe it)
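Comparing training and testing error can be turned into a small decision routine. The sketch below is one possible reading of the list above, under two assumptions: the thresholds `target_error` and `max_gap` are arbitrary illustrative values, and the remedies for overfitting are common practice rather than something taken from this article.

```python
# Steps for high training error, taken from the list in the text.
UNDERFIT_STEPS = [
    "inspect data for defects",
    "tune learning rate and other optimisation settings",
    "make model bigger",
    "check for software bugs",
]
# Not from the text: common remedies when test error is far above training error.
OVERFIT_STEPS = ["collect more data", "add regularisation"]


def diagnose(train_error, test_error, target_error=0.05, max_gap=0.05):
    """Classify a run as underfitting, overfitting or ok.

    target_error and max_gap are illustrative thresholds, not recommendations.
    """
    if train_error > target_error:
        return "underfitting", UNDERFIT_STEPS
    if test_error - train_error > max_gap:
        return "overfitting", OVERFIT_STEPS
    return "ok", []


for train, test in [(0.20, 0.22), (0.02, 0.15), (0.03, 0.05)]:
    label, steps = diagnose(train, test)
    print(f"train={train:.2f} test={test:.2f} -> {label}: {', '.join(steps) or 'nothing to do'}")
```

The point is less the thresholds than the habit: measure both errors after every change, and let the numbers decide what to try next.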

Your best bet? Read tutorials and implement simple applications. And read code (github.com is your friend!). Any questions? Ask and browse stackoverflow.com.

Sam Sandqvist

Dr Sam Sandqvist is our in-house Artificial Intelligence Guru. He holds a Dr. Sc. in Artificial Intelligence and is a published author. He specialises in AI Theory, AI Models and Simulations. He also has industry experience in FinServ, Sales and Marketing.