Perspectives on Data Science for Software Engineering is a collection of short research papers on using the tools of data science to research software engineering. It isn’t an introduction to data science concepts for software engineers, which is what I expected when I first picked it up. That mismatch made me put it down on my first attempt, but when I came back to it I found myself interested not in the data science aspect but in the software engineering research aspect.
While none of the individual papers was something I read and immediately knew how to apply in my own practice, the overall package left me feeling positive about progress in software engineering. Outside of language design, it sometimes feels like most of what we’ve learned about software engineering going as far back as the ’70s and ’80s hasn’t been applied in practice. I think part of the reason is that the research is disconnected from the way software is built in the wild. The research is either hyper-specific (e.g., focusing on a particular kind of software in a single language) or defines problems but not solutions (e.g., the work on code quality metrics). The research isn’t wrong, but it’s missing a step about how to apply the work to what you’re doing.
The only piece in here that felt immediately connected to what I was doing was the one on bug clustering. It showed that the more bugs a file has had, the more likely it is to have bugs in future iterations. This may lend some credence to the idea of rewriting a piece of code that has quality problems, effectively wiping the slate clean and starting over.
Overall, the book was intellectually stimulating but has no real practical use for what I do or, I suspect, for the average software developer. If your role straddles the practical and academic worlds, it may have more value to you.