![]() ![]() Given that, it’s not surprising that there’s no mention of information geometry, which is apparently a bleeding edge topic in data mining and knowledge discovery. Nor is there any mention of information, at least as the term was used in the professional sense of information theory, or of the many measures associated with such famous names in the field as Claude Shannon or Andrey Kolmogorov. ![]() I couldn’t find any reference in Minitab’s extensive Help files to topics that are often associated with pure data mining, like neural nets, fuzzy sets, entropy, decision trees, pattern recognition or the Küllback-Leibler Divergence. The further we get from ordinary statistical tasks like hypothesis testing and analysis of variance (ANOVA) towards machine learning and other examples of “soft computing,” the more the balance shifts back to SSDM. …………Minitab’s data mining capabilities differ from SQL Server’s mainly by the fact that it implements algorithms of lower sophistication, but with a wider array of really useful variations and enhancements. As we shall see, the same is true in comparisons of Minitab’s visualizations to SQL Server Reporting Services (SSRS), where the main dividing line is between out-of-the-box functionality vs. In cases when SQL Server and Minitab compete head-to-head, SSDM wins hands down in both performance and usability. Minitab offers a wider and more useful selection of these alternative algorithms than WEKA, an open source tool profiled in the first couple of articles. That happens quite often, given that there are literally so many thousands of algorithms that no single company can ever implement them all. So far in this haphazard examination of Microsoft’s competitors, the general rule of thumb seems to be that SSDM is to be preferred, particularly on large datasets, except when it doesn’t offer a particular algorithm out-of-the-box. ![]() In this installment, I’ll cover some Minitab’s implementations of more advanced algorithms that we’d normally use SSDM for, but which are sometimes simple enough to still be implemented in T-SQL. For example, as I mentioned last time around, Minitab implements many statistical functions, tests and workflows that are not available in SSAS or SSDM, but which can be coded in T-SQL whether or not it is profitable to do so varies by the simplicity of each particular stat and the skill level of the coder in translating the math formulas into T-SQL (something I’m hell-bent on acquiring). Most of the use cases for Minitab (and possibly its competitors, most which I have yet to try) come under rubric of statistics, which falls in the cracks between T-SQL aggregates and the “Big Data”-sized number-crunching power of SQL Server Analysis Services (SSAS) and SQL Server Data Mining (SSDM). I’m only concerned here with assessing how well a particular data mining tool might fit into a SQL Server user’s toolbox, not with their usefulness in other scenarios that is why I made comparisons solely on the ability of various SQL Server components to compete with Minitab’s functionality, whenever the two overlapped. ![]() …………Professional statistical software like Minitab can fill some important gaps in SQL Server’s functionality, as I addressed in the last post of this occasional series of pseudo-reviews. ![]()
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |