If you have been using Office 365 or SharePoint as a data management tool or business application platform, you most likely have come across checkbox columns (aka multi-choice columns). Creating a list with multi-choice columns is effortless. As is the data entry.
For instance, you have a custom list to keep track of office locations in various companies. In order to expedite data entry, you build a Choice column and select Checkboxes to accept multiple selections. There you have a multi-choice column!
The fun really begins when you start analyzing the data and generating visualizations from the collected data. The challenges lie within how the mulit-choice data is stored. Similar to HTML, all selected choices are stored as a single field with delimiters. In this case, SharePoint uses “;#” delimiters. This requires parsing and data massage.
In this example, you have a SharePoint list called Company Offices. Each company may have 1 or more global offices. You have been asked to crank out a report on the number of offices in each country. The underlying data in SharePoint looks like this:
There is not an easy mean to perform aggregation within SharePoint. Solutions like SQL Server Reporting Services (SSRS) and SQL Server Integration Services (SSIS) are potential candidates and require a lengthy setup. What if the token requirement is “We needed this yesterday for an executive report”? It will take a little longer to spin up the above infrastructure!
If you excel in Excel, you can pull in the list data and combine formulas and pivot tables to achieve the above objectives. Such solution, however, is not particularly reusable.
Python comes into the solution beautifully. Python is a general purpose scripting language and can be set up on virtually all operating systems within minutes. (Basic Python is beyond the scope of this post. Google many of the excellent primer on the interweb). In addition to core Python libraries, we are importing two libraries for to extend functionalities. Pandas is a Python library providing high-performance, easy-to-use data structures and data analysis tools. Seaborn is a Python visualization library providing a high-level interface for drawing attractive statistical graphics.
The source code is published on Github (https://github.com/klopmp/sharepoint-data-analysis). Let’s give this a quick run-through.
Also published on Medium.