A weekend with Google AutoML Beta
Some time ago I built a site called slackernews.io to take the top-rated submissions from HackerNews and categorize them into newspaper sections. In this way I could focus on the ones I care to read while blithely ignoring the sections I don’t (cough election news cough).
This categorization was initially built using a machine learning classification model on IBM Watson. Unfortunately, IBM decided to make a breaking change to their API that would require me to rebuild the model and call it with brand-new endpoints (I thought the lesson from the success of Microsoft was to maintain backwards compatibility come hell or high water?).
Instead, I decided to spend the weekend hacking away on the newly announced (Jul 14, 2018) Google AutoML Natural Language Platform.
Key Learnings
- The price is right: FREE!
- AutoML has the feature-set I need.
- For my use case, AutoML didn’t come close to IBM Watson on accuracy, despite using 15x the number of training examples. Frankly, I found this deeply disappointing.
- AutoML is still in Beta, so it lacks a client library in C#.
- To confuse matters, there is a client library in C# for a similar platform called Google Cloud Natural Language API, which only categorizes into pre-defined categories rather than custom categories.
- Google Cloud does not make it easy to authenticate without using a client library (in fact, I wasn’t able to obtain an OAuth token for use with the REST API).
- Remote desktop from Mac to Windows Server running C# to invoke IronPython is a world of pain. Don’t do it :P
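For anyone wanting to try the REST route without a client library, here is a minimal sketch of what a prediction call might look like. It is not the code I ran. The endpoint shape and payload follow my understanding of the v1beta1 AutoML docs and may have changed, the `build_predict_request` helper and all IDs are hypothetical, and the bearer token is assumed to come from somewhere else (for example `gcloud auth application-default print-access-token`), which is exactly the step I got stuck on.

```python
import json
import urllib.request

# Assumption: v1beta1 predict endpoint as described in the AutoML beta docs.
AUTOML_URL = (
    "https://automl.googleapis.com/v1beta1/"
    "projects/{project}/locations/{location}/models/{model}:predict"
)


def build_predict_request(project, model, text, token, location="us-central1"):
    """Build (but do not send) a POST request asking a custom model to classify text.

    Hypothetical helper: project, model and token are placeholders you supply.
    """
    url = AUTOML_URL.format(project=project, location=location, model=model)
    # Assumed payload shape: a single plain-text snippet to classify.
    body = {"payload": {"textSnippet": {"content": text, "mime_type": "text/plain"}}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Usage (needs a real token and model ID, so it is left commented out):
# req = build_predict_request("my-project", "TCN123", "Show HN: My weekend hack", token)
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Building the request separately from sending it makes it easy to inspect the URL and headers before spending any time fighting with credentials.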
What’s next?
- AutoML integration is on hold until they release a client library in C#.
- Time to investigate AWS SageMaker!