Agencies in the U.S. federal government publish an average of 3,700 proposed rules yearly, according to the U.S. Government Accountability Office. With each proposed rule, agencies generally provide an opportunity for stakeholders and members of the public to submit comments before the rule is finalized. In some instances, thousands of comments are submitted, yet there is no consistent government-wide process for intake, analysis, and reporting of findings. A similar dynamic exists in the private sector, where organizations regularly solicit and analyze feedback and comments from customers to improve products and services.
To harness the power of feedback effectively, Microsoft Federal’s customer success team developed a robust solution for Comment Analytics. The solution extracts and analyzes comments in aggregate to surface and report key insights. This blog post details the approach the team followed and links to the code, shared in our repository.
Discover themes, sentiment and suggestions
The solution we’ve developed provides users with insights by understanding stakeholder perspectives across a series of elements. It can be leveraged for any scenario where you need to extract insights from multiple documents related to a particular topic (e.g., loan documentation, contract documents, project proposals, and public rulemaking) and then run further analytics. Key areas of insight that can be analyzed include:
- Common themes: Identifying recurring issues that stakeholders frequently mention. Highlighting the most frequently discussed areas across all analyzed documents and comments.
- Overall sentiment: Gauging the overall tone—positive, neutral, or negative—to assess stakeholder satisfaction.
- Specific likes and dislikes: Understanding what stakeholders appreciate and what they find frustrating. Pinpointing specific pain points that need to be addressed promptly.
- Stakeholder suggestions: Collecting actionable ideas from stakeholders for potential improvements.
Our solution supports various file types to ensure broad applicability:
- Text
- CSV: Each line is treated as a separate comment
The solution can also be easily extended to support other file types like Word, PowerPoint, and more.
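As a rough illustration of the file handling described above, the sketch below turns a text or CSV file into a list of comments. The function names `parse_comments` and `load_comments` are hypothetical and are not taken from the repository; the only behavior drawn from this post is that a text file is one comment and each CSV line is a separate comment.

```python
from pathlib import Path


def parse_comments(text: str, suffix: str) -> list[str]:
    """Split raw file content into comments (hypothetical helper).

    Assumption: a .txt file holds a single comment, and every
    non-empty line of a .csv file is one comment.
    """
    suffix = suffix.lower()
    if suffix == ".csv":
        return [line.strip() for line in text.splitlines() if line.strip()]
    if suffix == ".txt":
        stripped = text.strip()
        return [stripped] if stripped else []
    raise ValueError(f"Unsupported file type: {suffix}")


def load_comments(path: str) -> list[str]:
    """Read a file from disk and return its comments."""
    file_path = Path(path)
    return parse_comments(file_path.read_text(encoding="utf-8"), file_path.suffix)
```

Supporting another file type would mean adding one more branch that converts that format to plain text before returning it.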
How it works
1: Extract Insights from Individual Comments
We start by extracting insights from each individual comment. For CSV files, each line is treated as a separate comment. If a comment is larger than a specified size, we chunk it to manage the data efficiently. This step generates a JSON file with all the extracted insights, including:
- Summary: A summary of the overall comment.
- Main Themes: Identification of themes with brief summaries for each. Predefined theme categories can be specified if the focus is on specific themes.
- Aspect-Based Sentiment: Sentiment score for each theme.
- Suggestions: Suggestions or remediations mentioned in the feedback.
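To make the shape of a per-comment result concrete, here is a minimal sketch of how such a JSON record might be parsed into typed objects. The field names (`summary`, `themes`, `name`, `sentiment`, `suggestions`) and the numeric sentiment scale are assumptions for illustration; the actual schema used in the repository may differ.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Theme:
    name: str
    summary: str
    sentiment: float  # assumed scale: -1.0 (negative) to 1.0 (positive)


@dataclass
class CommentInsight:
    summary: str
    themes: list[Theme] = field(default_factory=list)
    suggestions: list[str] = field(default_factory=list)


def parse_insight(raw_json: str) -> CommentInsight:
    """Parse one comment's insight JSON into a CommentInsight."""
    data = json.loads(raw_json)
    themes = [
        Theme(t["name"], t["summary"], float(t["sentiment"]))
        for t in data.get("themes", [])
    ]
    return CommentInsight(
        summary=data["summary"],
        themes=themes,
        suggestions=list(data.get("suggestions", [])),
    )
```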
This step is critical to the solution, and the foundation for the three steps that follow. Extracting all relevant insights from each comment can be time-consuming if there are many comments. Leveraging the Batch API can expedite this process. Once the individual comments are processed, these insights can be utilized multiple times to generate actionable analytics and meaningful reports.
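The chunking of oversized comments mentioned in this step could look something like the sketch below. It splits on whitespace by character count; the function name `chunk_comment`, the default limit of 4000 characters, and the character-based (rather than token-based) measure are all assumptions, not the repository's actual approach.

```python
def chunk_comment(text: str, max_chars: int = 4000) -> list[str]:
    """Split text into chunks of at most max_chars, breaking on whitespace."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: list[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is hard-split.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The word does not fit; close the current chunk and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be sent through the same insight extraction as a short comment. Note that this sketch collapses runs of whitespace, so it suits prose comments better than layout-sensitive text.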
2: Merge the Individual Comments’ JSON Files
Next, we merge the summaries, themes, and suggestions from each comment’s JSON file into three separate files:
- Merged Summary
- Merged Themes
- Merged Suggestions
This allows us to extract insights from each segment separately, such as identifying popular themes or suggestions.
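A minimal sketch of this merge step follows, assuming each per-comment record is a dictionary with `summary`, `themes`, and `suggestions` keys. Those key names, the function names, and the `merged_*.json` output file names are hypothetical.

```python
import json
from pathlib import Path


def merge_insights(records: list[dict]) -> dict[str, list]:
    """Collect summaries, themes, and suggestions into three lists."""
    merged: dict[str, list] = {"summaries": [], "themes": [], "suggestions": []}
    for record in records:
        summary = record.get("summary")
        if summary:
            merged["summaries"].append(summary)
        merged["themes"].extend(record.get("themes", []))
        merged["suggestions"].extend(record.get("suggestions", []))
    return merged


def write_merged(merged: dict[str, list], out_dir: str) -> None:
    """Write each merged list to its own JSON file (assumed file names)."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    for name, items in merged.items():
        (out_path / f"merged_{name}.json").write_text(
            json.dumps(items, indent=2), encoding="utf-8"
        )
```

Keeping the three segments in separate files means each one can be re-analyzed on its own without reprocessing the original comments.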
3: Generate Aggregated Insights from Merged Files
We then generate final aggregated outputs:
- Aggregated Summary: A comprehensive summary of all comments.
- Aggregated Themes: Consolidation of themes into the top 25 most frequently occurring themes. We also explore other options, such as categorizing themes for easier consumption and reporting an occurrence count for each theme. Note that the occurrence count may not be effective if there are too many themes.
- Aggregated Suggestions: A consolidated list of suggestions for improvement.
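For the occurrence-count option mentioned under Aggregated Themes, a simple frequency count is one possible starting point. The sketch below is a plain string-matching stand-in, not the consolidation the solution itself performs: it only groups theme names that are identical after trimming and lowercasing. The function name `top_themes` is hypothetical.

```python
from collections import Counter


def top_themes(theme_names: list[str], limit: int = 25) -> list[tuple[str, int]]:
    """Return the most frequent theme names with their counts.

    Names are normalized by trimming whitespace and lowercasing, so
    differently worded themes with the same meaning are NOT merged.
    """
    counts = Counter(
        name.strip().lower() for name in theme_names if name.strip()
    )
    return counts.most_common(limit)
```

This also shows why the count becomes less useful as themes multiply: when most themes are worded uniquely, nearly every count is 1 and the ranking carries little signal.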
4: Generate Final Consolidated Report (Executive Summary)
Finally, we combine all aggregated outputs into a Consolidated Report, which includes:
- Summaries
- Themes
- Suggestions
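As an illustration only, the three aggregated outputs could be assembled into a plain-text report along these lines. The function name `build_report`, its parameters, and the layout are assumptions; the actual report format is defined in the repository.

```python
def build_report(
    summary: str,
    themes: list[tuple[str, int]],
    suggestions: list[str],
) -> str:
    """Combine aggregated outputs into one plain-text report."""
    lines = ["Summaries", summary, "", "Themes"]
    lines += [f"- {name} ({count})" for name, count in themes]
    lines += ["", "Suggestions"]
    lines += [f"- {suggestion}" for suggestion in suggestions]
    return "\n".join(lines)
```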
Upcoming Updates
We plan to add the following updates to enhance the solution further:
- Use JSON mode to generate the JSON files, and structured outputs to generate text that conforms to a predefined schema.
- Utilize the Batch API to generate individual comment JSON files efficiently.
- Leverage Managed Identity to connect to various Azure Services and use Key Vault to store secrets securely.
- Incorporate Azure Document Intelligence to extract text and sections from PDF files. Alternatively, we can use PyMuPDF to extract text from PDF files, as it is also adding support for chunking for LLM use cases.
Get Started
You can leverage the Comment Analytics solution and extend it to meet your requirements to make informed, data-driven decisions that align with stakeholder expectations and drive operational excellence.
You can get started with the solution today by accessing the code repository on GitHub: CommentAnalytics/README.md in the smallangi/AllAboutUnstructuredData repository (main branch).