J**W
Excellent Book, For What It Is
I'm a Python software developer with an interest in applied statistics. This is an excellent book on data analysis, but for review purposes, it's worth pointing out first what this book is not.
It is not a comprehensive survey of the open source tools that are available, and it does not contain many examples of working code to implement the techniques the author talks about, though there are some. For this reason, I'd strike the "with Open Source Tools" from the title when evaluating whether you want to purchase the book.
The author greatly favors mathematical notation over code examples in describing the data analysis techniques he presents. While this is not a bad thing per se, you'll have to struggle to comprehend the content if you're a programmer without an academic familiarity with math, or if you've been away from mathematics for a long time.
As other reviewers have pointed out, the organization of the content is somewhat disjointed. Going from chapter to chapter, there is little in the way of causality, and the early chapters are pretty math-heavy. The reader is advised to consult the appendices at the back of the book to refresh themselves on the basics, if required.
Wait! I didn't say you shouldn't buy it.
Despite a few shortcomings, this book does offer a good introduction to and overview of several basic techniques. It's an excellent survey of the current data analysis landscape for anyone who's not familiar with it. If a topic seems irrelevant to you, it's pretty easy to skip that chapter and move forward.
On top of that, the author's writing style and ways of explaining relatively esoteric concepts are generally very good. As with many good books, you get the sense the author is a co-worker, trying to explain something to you in terms you can understand. It's very example-based, even if those examples don't always involve code.
All in all, to get the most out of this book, the best approach is careful and methodical study. The author covers many topics quickly, and none of them in depth, so if one chapter interests you, I'd plan on consulting other resources on that particular topic. Luckily, the author does offer several "Further Reading" recommendations for each topic.
Most books containing information on these techniques are far harder to read, and they generally cost at least twice as much. Highly recommended. Thanks for this one, Philipp.
H**H
a nice balance between theory and practice
Data Analysis with Open Source Tools does a great job covering a lot of topics in a way that balances theoretical explanation and practical demonstration. In keeping true to its title, a wealth of tools (and data sources) are identified and explored.
Because the book offers a balance between explanation and demonstration, it can be read in two different ways. First, you can read the chapters without getting involved with the code to get a better understanding of the whys and hows of the different analysis techniques. On the other hand, if you are more of a brass tacks person, you can focus on the code, run the examples, and just skim the explanations.
For those who are exploring the world of data analysis, this book is a great complement to Segaran's Programming Collective Intelligence: Building Smart Web 2.0 Applications and Russell's Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites. Where the books overlap, the explanations and examples differ, which helps enormously when trying to master the concepts and techniques. However, each book contains topics not in the others. Collectively they offer a rather powerful set of tools.
Having read the other books prior to this one, I really appreciated the time spent on the mathematics behind each technique. The others get your hands dirty very quickly - and I appreciated that greatly when first exploring data mining - but I found myself wanting a deeper understanding, which this book so nicely provides. As Janert mentions in the first chapter, the succinct notation of mathematics is much clearer than having to try to extract the essence of twenty lines of source code. Without a doubt, though, Data Analysis is dense, and that might turn a few people off.
All said and done, I'm glad I took the time to read the book and will definitely keep it nearby.
A**.
It has its flaws, but in general a great overview
I've read some of the other reviews, and I do agree with most of the criticisms. There are quite a few errors in the formulas and in the text, and it would've been really nice if the source code and data files had been provided on a CD or made available on a website.
That being said, the book addresses a lot of different topics, ranging from introductory, freshman-level statistics to more advanced data mining and machine learning techniques, and passing through notions of design. It doesn't go into depth on each of them, but it offers a fairly good overview, plus references in case you're interested. Furthermore, the author gives some useful hints on how to think outside the box and how to apply these techniques to business.
Being a physics grad student, I found many of the topics pretty basic, but even so, I learned a lot. Overall, a great introduction; I really hope the flaws are corrected in a future second edition.