A week after the event, finally writing down some observations on my experience at the RapidMiner Wisdom 2016 conference in NYC. An incredible team of data science champions came together for two days to share notes, learn about platform updates, and hear from industry luminaries on the state of Big Data and Predictive Analytics.
I sponged it all up and gained some valuable insight into the real future of this platform, and the market.
First, some background…
The Citizen Data Scientist
RapidMiner is really targeting the “Citizen Data Scientist” with their platform. The theory is powerful: easy-to-use tools, with a full-featured backplane for the pro analyst who wants to dig deep, will broaden the scope of what we can all do with Data Science. Analytics is such a big part of the Information Revolution, and the whole premise of Big Data is not just about large information sets – it’s also about making that information and powerful tools available to the front office.
Now, I don’t think this means that “everyone’s a data scientist” (anyone remember a similar PM-focused campaign for Microsoft Project?) but it does mean that data-minded individuals are the vanguard of the Data Economy, regardless of their degree or what department they work for.
It’s a natural evolution of information science to take it from the mainframe to the warehouse to the shore of the Data Lake, and then invite everyone down to a beach party. RapidMiner’s play in this market is to make real, actionable insights available to the broadest audience possible. But you’ll need at least some light education in statistics and quant operations to make the most of this platform.
Fortunately, that’s where the RapidMiner Community comes in. Like many explorers, I first encountered RapidMiner through the Community Edition download on their website. With a quick install, slick interface, and easy tutorials, I was able to get up and running with Decision Trees for feature selection in about 15 minutes. Simply amazing to me, a shell-and-script guy!
Predictive Analytics in a drag-and-drop format beats my old coder approach, hands down. The RapidMiner Community has about 250,000 users, many of them truly helpful analysts who want to give and learn, which means I can get my questions answered quickly.
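For fellow shell-and-script folks, here is a rough idea of what a tree is doing when it helps with feature selection. This is a minimal Python sketch, not RapidMiner's implementation: it ranks columns by information gain, one common split criterion for decision trees. The toy customer records and the column names (`plan`, `region`, `churned`) are invented purely for illustration.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, feature, target):
    """Entropy of the target minus its weighted entropy after splitting on `feature`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def rank_features(rows, features, target):
    """Return (feature, gain) pairs, strongest splitter first."""
    gains = [(f, information_gain(rows, f, target)) for f in features]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: six customers, two candidate features, one label.
ROWS = [
    {"plan": "basic", "region": "east", "churned": "yes"},
    {"plan": "basic", "region": "west", "churned": "yes"},
    {"plan": "basic", "region": "east", "churned": "yes"},
    {"plan": "pro", "region": "west", "churned": "no"},
    {"plan": "pro", "region": "east", "churned": "no"},
    {"plan": "pro", "region": "west", "churned": "no"},
]

ranking = rank_features(ROWS, ["plan", "region"], "churned")
for name, gain in ranking:
    print(f"{name}: {gain:.3f} bits")
```

In this made-up data, `plan` separates the churners perfectly while `region` barely helps, so `plan` lands at the top of the ranking. A real tree repeats this scoring at every split, but the idea of keeping the features that earn their place is the same.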
An interesting feature within RM is “The Wisdom of Crowds”, which allows anonymised data about your models to be shared in aggregate with the RapidMiner cloud software. The engine analyses your flow and responds in realtime with suggestions about which operators and models to try out, based on what others are using to solve similar problems.
This Analytics Recommendation engine is like the iTunes Genius with concrete benefit to users who need to crank out a model, fast. Talk about accelerating your work with the help of others. Tobias Malbrecht does a nice writeup of this here.
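To make the idea concrete, here is a toy version of crowd-driven recommendation. RapidMiner's actual engine is not described in enough detail here to reproduce, so treat this as a hypothetical sketch: it simply counts which operators show up in other people's flows that overlap with yours. The shared flows and the `recommend` function are invented for illustration.

```python
from collections import Counter

# Hypothetical, anonymised flows: each is the set of operators one user chained together.
SHARED_FLOWS = [
    {"Read CSV", "Replace Missing Values", "Decision Tree", "Cross Validation"},
    {"Read CSV", "Normalize", "k-NN", "Cross Validation"},
    {"Read CSV", "Replace Missing Values", "Decision Tree", "Apply Model"},
    {"Read Database", "Normalize", "k-NN", "Apply Model"},
]


def recommend(current_flow, shared_flows, top_n=3):
    """Suggest operators used in flows that resemble the current one.

    Each shared flow votes for the operators you are missing, and its vote
    is weighted by how many operators it has in common with your flow.
    """
    current = set(current_flow)
    votes = Counter()
    for flow in shared_flows:
        overlap = len(current & flow)
        if overlap == 0:
            continue  # nothing in common, so this flow has no say
        for operator in flow - current:
            votes[operator] += overlap
    # Highest vote first; ties broken alphabetically so the order is stable.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [operator for operator, _ in ranked[:top_n]]


suggestions = recommend({"Read CSV", "Replace Missing Values"}, SHARED_FLOWS)
print(suggestions)
```

A production recommender would weigh far more context (data types, problem class, model performance), but even this crude overlap count shows why more shared flows should mean better suggestions.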
Steve Farr, Global Community Director at RM, detailed his strategy to increase participation by a full order of magnitude. That’s right – over the next two years his goal is to get to over 3 MILLION users of the software, all sharing information, all improving the platform’s capabilities and learning features while they generate predictions at high velocity.
Good thing, too… one of the few dings Gartner gave the platform last year concerned training and support. Steve’s bold gambit should pay off nicely for the Crowd, and every user of the platform will benefit – we’ll find out how Gartner views this play later this quarter when their 2016 quadrant comes out. I’ll bet we see it continue to close the gap with the lead dogs in the industry. This platform’s rising in the charts with a bullet…
Very Compelling Sessions
The highlight of this conference was the presentations from a broad cross-section of the Analytics space. Keynote speaker David Weisman from UMass Boston outlined, in a very clever presentation, the inherent dangers of correlative analysis, complete with that “Nobel Prize Winners Eat Chocolate” graphic.
This is up on my office wall as a constant reminder to be diligent in assessment. Also keynoting was Vijay Kotu from Yahoo, who walked us through the danger of cognitive biases and gave an insider’s view into how Yahoo uses analytics to derive insights into user search behaviour.
(An Open Data bonus: Yahoo recently released 20 million anonymised user search records for analysis. Check it out here.)
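The chocolate-and-Nobel trap from the correlation keynote is easy to reproduce with synthetic numbers. The sketch below assumes a hidden third factor (labelled `wealth` here, which is an illustrative assumption and not a claim from the talk) that drives both variables. Neither causes the other, yet they correlate strongly until the hidden factor is removed.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(2016)  # fixed seed so the run is repeatable

# Synthetic "countries": wealth drives both chocolate consumption and Nobel counts.
wealth = [rng.gauss(0, 1) for _ in range(2000)]
chocolate = [w + rng.gauss(0, 0.5) for w in wealth]
nobels = [w + rng.gauss(0, 0.5) for w in wealth]

raw_r = pearson(chocolate, nobels)

# Strip out the wealth component. We built the data, so we know its weight is 1;
# with real data you would estimate that weight with a regression instead.
controlled_r = pearson(
    [c - w for c, w in zip(chocolate, wealth)],
    [n - w for n, w in zip(nobels, wealth)],
)

print(f"raw correlation:        {raw_r:.2f}")
print(f"after removing wealth:  {controlled_r:.2f}")
```

The raw correlation looks impressive, and it collapses toward zero once the confounder is accounted for. Worth remembering before any model output goes up on the office wall.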
We had a real treat in listening to Lauren Ellenburg from the Southwest Kidney Institute talk about their speedy use of the RM platform to devise custom dashboards for doctors. Very compelling integration between RapidMiner and Tableau to make the predictions and mining both accurate and beautiful. RM also announced a tight integration module between these two industry-leading solutions in a product overview led by Chief Product Officer Lars Bauerle.
Jeff Hunter from PWC walked us through their transaction monitoring practice for anti-money laundering (AML). These folks probably have the most buttoned-down approach in the industry. And Boris Scharinger did a deep dive on how Siemens’ Audit department is using analytics as a consultative service to help departments implement governance.
Did I say “consultative audit?” Yes, it’s really true, and their visual, realtime process modeling approach was really something to behold.
Vamsi Chemitiganti from Hortonworks walked us through the power of the only true Open-Source distro of Hadoop, linked up with RapidMiner. I was able to spend some time with RapidMiner’s Big Data architect Zoltán Prekopcsák, who developed the Radoop connection scheme (RapidMiner and Hadoop).
I came away eager to fill my Big Data Lake with a pile of unstructured data and point RapidMiner 7 against it for analysis. The power of this connector cannot be overstated: you drag and drop an operator in the visual workspace, and the system will generate EVERY STATEMENT required to interface with your Hadoop cluster.
If you have experience with “rah rah Hadoop” followed by mind-numbing integration difficulty with current distros (as I do), you’ll know exactly what this means for the future of Big Data Analytics.
Alternative Data Analytics was at the forefront with discussions from Elian Carsenat of NamSor, who showed us how you can take simple name data and derive gender, ethnicity, and locale with amazing accuracy. This capability is embedded as an add-on operator within RM, using his firm’s service. On the Social side, Mike Waldron showed off the connector that pulls Aylien’s feed and post data from the likes of Facebook and Twitter. This data is the future of consumer predictive behavioural analysis, and it’s a real treat to have these capabilities embedded right within the RM platform for us.
Now to the stars of the show…and likely the youngest in attendance. Dr. Anasse Bari of NYU brought along his students to present their groundbreaking work in Open Data Analytics. Their first session demonstrated a complete map of the NYC bike route system, using predictive analytics to forecast the hot spots at given times of the day.
Their application of this research will potentially help ease congestion in bike lanes, and deliver additional city services where needed. The afternoon session featured a complete proposal for a natural language hyper-processor, using predictive analytics in realtime to tune search capabilities. Watching these students present their ideas, I was reminded of the initial research paper crafted by Larry Page and Sergey Brin that outlined a little idea called Google. Quite impressive.
Yes, I spoke too…
I was truly honoured when the RapidMiner team asked me to share the Big+Open Data Analytics strategy for my new venture, rel8ed.to. Our Canadian business (launching next month) is focused on the use of public information as an alternative data source for predictive modeling in the Financial Services and Insurance sectors.
What we know is that machine learning strategies require a LOT of clean data to pick the right model features. Our discussion centred on the research, data cleansing, and feature selection approach we use to create market predictions for Small and Medium Business. Good (and thankfully time-limited!) open discussion, a nice way to end the day of presentations.
A few laughs from the crowd as I tried in vain to get my “Hackintosh” (Ubuntu Linux on a MacBook Pro) to interface with the A/V System to conduct the demo.
Finally connected, we finished up the session with a briefing on the methods we are using in our emerging Business Accelerator Model research. Our Open Data-driven predictions are finding the bottom 10% of businesses that will likely fail in 2017, and the top 20% that are almost sure to grow and extend. We want more, though – who wouldn’t? More modeling to do…
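Picking a "bottom 10%" and "top 20%" out of a scored population is mostly a sorting exercise once a model has produced its predictions. The sketch below is a generic illustration, not the Business Accelerator Model itself: the scores, the business names, and the `split_by_score` helper are all hypothetical.

```python
def split_by_score(scores, bottom_share=0.10, top_share=0.20):
    """Return (likely_to_fail, likely_to_grow) names from a {name: score} mapping.

    Higher scores are assumed to mean a healthier outlook.
    """
    ranked = sorted(scores, key=scores.get)  # lowest predicted score first
    n_bottom = round(len(ranked) * bottom_share)
    n_top = round(len(ranked) * top_share)
    likely_to_fail = ranked[:n_bottom]
    likely_to_grow = ranked[len(ranked) - n_top:]
    return likely_to_fail, likely_to_grow


# Hypothetical model output: 20 businesses with evenly spread scores.
SCORES = {f"business_{i:02d}": i / 20 for i in range(20)}

at_risk, growers = split_by_score(SCORES)
print("bottom 10%:", at_risk)
print("top 20%:   ", growers)
```

The hard part, of course, is producing scores worth ranking in the first place, which is where the clean data and feature selection come in.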
A shout-out to Jeff Frick and David Vellante from SiliconANGLE, who filmed much of the proceedings, including an interview in which I outline the rel8ed.to value proposition for Clean Open Data as part of your analytics strategy.
Of course, you cannot have any discussion about RapidMiner without mentioning our dear friend Ingo Mierswa, founder of the company. This guy is brilliant yet approachable, and a real card in front of a crowd. If you haven’t checked out his discussion on predicting Alien sightings in America (as well as a more-serious chat on the analytics market), here you go!
Also make a point to view the very-cheeky “Five Minutes with Ingo” series on YouTube – you’ll get a great idea of how fun this conference was.
Partnering For Success
One of the great aspects of engaging with RapidMiner is their extensive partner network. As expected for a company that started in Germany, there were plenty of firms from Europe at the conference, each with a unique take on how to accelerate the Predictive Analytics space. These folks are at the top of their game, and at least a few of them are led by RM alumni who have now set off on their own to build customisations of the platform and provide remote consulting help.
Check out Sebastian Land’s Old World Computing firm. He was an early developer on the platform who now mixes custom operator development with a solid remote consulting practice. I was also quite impressed with the North American firms who have chosen to make the platform part of their services strategy.
In particular, Slalom Consulting’s Abe Winograd was stellar in discussion – this shop has a strong RapidMiner practice across North America. Also of note is Rishi Bhatnagar’s Syntelli Solutions, with a growing army of certified analysts on the eastern seaboard.
Look these firms up if you’re seeking consulting help with advanced analytics. I’ll likely be using them to jump-start our own services up here in Canada.
To sum up…
Great times at this year’s conference, backing up a very serious message: this platform is on the rise, and bears a close look from anyone in the data science business. You should give it a shot with a simple Community download. I’ll bet you find what I did – solid capability for agile analytics, a great user base, and a lot of fun to work with. Great for that Citizen Data Scientist in all of us.
Footnote: On the way back to Canada from NYC after the conference, I was seated next to an up-and-coming quant analyst from a global bank. We got to talking, I tell him about the conference, and before you know it out comes my laptop for a demo. He grabs the machine, takes the tutorial, and by the end of the hour-long flight he’s teaching me about XB-Tree theory using my own version of RapidMiner. Always something new to learn, and that kid’s one to watch. Here’s hoping the Buffalo market can keep him occupied and engaged, or we’ll be pulling him up Toronto-way.
This post was written by Bob Lytle