Executive Summary
When integrating SharePoint content into downstream systems such as Azure AI Search, RAG pipelines, or Copilot extensions, preserving document-level access control is critical—especially for highly sensitive content.
This post describes a security-first architecture that propagates SharePoint document permissions into downstream systems so that authorization is enforced at query time. The approach:
- Uses Microsoft Graph’s Sites.Selected permission for least-privilege access to SharePoint
- Materializes document-level permissions into search index fields at ingestion time
- Relies on Microsoft Entra ID object IDs (GUIDs) for stable, query-time filtering
Introduction: The Problem
In our last project, the customer needed to generate tailored briefings for multiple user groups. Creating those briefings required preprocessing a large volume of documents stored in SharePoint folders—many containing highly sensitive data. That meant we had to carry SharePoint’s document-level permissions all the way through downstream search and retrieval systems so only authorized users could see the right content.
SharePoint provides rich, hierarchical, and inheritable permissions—but downstream systems like Blob Storage, search indexes, and LLM retrieval layers do not natively understand SharePoint ACLs.
Without an explicit permission-mapping strategy, organizations risk:
- Overexposing sensitive documents to unauthorized users
- Violating Zero Trust principles by granting broad access
- Failing internal security or compliance reviews
A common anti-pattern is granting applications Sites.Read.All, which unintentionally exposes all sites in a tenant. We needed a pattern that preserves document-level authorization information when we ingest content into downstream systems.
The Journey: Our Approach and Solution
We built a security-first pipeline that reads documents and permissions from SharePoint, normalizes identities to Microsoft Entra ID object IDs, and stores both content and ACL metadata in a search index. The ingestion app uses the Microsoft Graph Sites.Selected permission; everything downstream operates on the materialized permission data.
Design Goals
| Goal | Description |
|---|---|
| ✅ Least-privilege access | Explicit allow-listing per site—no tenant-wide permissions |
| ✅ Document-level ACLs | Materialize permissions so downstream systems can filter safely |
| ✅ Deterministic identities | Use GUIDs instead of emails or UPNs |
| ✅ Broad compatibility | Work with Copilot extensions, RAG retrievers, and search indexes |
| ✅ Highly sensitive content | Safe for regulated documents and compliance scenarios |
Architecture Overview
The solution comprises five key components working together:
| Component | Role |
|---|---|
| SharePoint Online | Hosts documents and source permissions |
| Microsoft Graph API | Supplies document metadata and ACLs |
| Sites.Selected permission | Ensures zero default access; each site is explicitly granted |
| Ingestion pipeline | Reads documents and permissions, resolves effective ACLs, and normalizes identities |
| Search index with security trimming | Stores allowedUsers and allowedGroups for query-time filters |
End-to-End Permission Flow

The Destination: Outcomes and Learnings
After piloting this pattern in production workloads, here’s what held up.
Permission Scoping with Sites.Selected
As described above, the ingestion application—the component that reads documents and their metadata from SharePoint and writes them into the search index—is registered with the Sites.Selected application permission. This means:
| Characteristic | Benefit |
|---|---|
| Zero access by default | The app cannot read any site until explicitly granted |
| Explicit site grants | SharePoint admins must allow each site individually |
| Enforced by SharePoint | Access control is platform-enforced, not app logic |
| Clear audit trail | Every grant is traceable and revocable |
This sharply limits blast radius if the app is ever compromised and keeps the pattern aligned with Zero Trust principles.
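As an illustration of what an explicit site grant looks like, the sketch below builds the request an administrator would send to the Microsoft Graph `POST /sites/{site-id}/permissions` endpoint. The helper name, the site ID, and the app ID are placeholders introduced for this example, and restricting the role to `read` or `write` is a choice of this sketch rather than something the pipeline requires.

```python
def build_site_grant_request(site_id: str, app_id: str, app_name: str, role: str = "read") -> dict:
    """Describe the Graph call that grants one app access to one site.

    Hypothetical helper: it only builds the request, it does not send it.
    """
    if role not in {"read", "write"}:
        raise ValueError(f"unsupported role for this sketch: {role}")
    return {
        "method": "POST",
        "url": f"https://graph.microsoft.com/v1.0/sites/{site_id}/permissions",
        "body": {
            "roles": [role],
            "grantedToIdentities": [
                {"application": {"id": app_id, "displayName": app_name}}
            ],
        },
    }


# Placeholder identifiers, not real tenant values.
request = build_site_grant_request(
    site_id="contoso.sharepoint.com,<site-collection-id>,<web-id>",
    app_id="00000000-0000-0000-0000-000000000001",
    app_name="ingestion-app",
)
```

An ingestion app that only reads documents and permissions should normally need nothing beyond the `read` role.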
Extract and Normalize Permissions via Microsoft Graph
For each document, effective permissions are retrieved using Microsoft Graph. The key challenge is that SharePoint permissions are hierarchical and inheritable, whereas downstream systems are ACL-agnostic. To bridge this gap, permissions must be resolved at ingestion time and stored explicitly.
GET /sites/{site-id}/drive/items/{item-id}/permissions
From the response, we extract the assignments and normalize the identities they reference:
- User assignments: individual users with access
- Group assignments: security groups and Microsoft 365 groups
- Identity normalization: every identity is stored as its Microsoft Entra ID object ID (GUID)
# graph_client is assumed to be a thin wrapper around the permissions call shown above
permissions = await graph_client.get_permissions(drive_id, item_id)
allowed_users = []
allowed_groups = []
for entry in permissions:
    grant = entry.get("grantedToV2") or {}  # tolerate a missing or null identity set
    user = grant.get("user")
    group = grant.get("group")
    if user and user.get("id"):
        allowed_users.append(user["id"])  # Microsoft Entra ID object ID
    if group and group.get("id"):
        allowed_groups.append(group["id"])  # Group object ID
Key decisions:
| Decision | Rationale |
|---|---|
| Use GUIDs, not emails | Object IDs remain stable across renames and domain changes |
| Preserve group IDs | Enables offline expansion when search can’t expand groups natively |
| Resolve inheritance once | Index holds the effective ACL for each document |
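Permission entries created by sharing links carry their recipients in `grantedToIdentitiesV2` rather than `grantedToV2`, and some entries reference SharePoint site groups instead of Entra ID principals. The following is a minimal sketch, not the pipeline's actual code, of a collector that reads both identity fields and de-duplicates the result. The function name and the sample payload are invented for illustration, and site groups are deliberately skipped because they would need a separate expansion step.

```python
def collect_principals(permissions: list[dict]) -> tuple[list[str], list[str]]:
    """Return (user_ids, group_ids) found across all permission entries."""
    users: set[str] = set()
    groups: set[str] = set()
    for entry in permissions:
        identity_sets = []
        if entry.get("grantedToV2"):
            identity_sets.append(entry["grantedToV2"])
        # Sharing links list their recipients in a collection of identity sets.
        identity_sets.extend(entry.get("grantedToIdentitiesV2") or [])
        for identity_set in identity_sets:
            user = identity_set.get("user") or {}
            group = identity_set.get("group") or {}
            if user.get("id"):
                users.add(user["id"])
            if group.get("id"):
                groups.add(group["id"])
            # "siteGroup" / "siteUser" entries are ignored in this sketch.
    return sorted(users), sorted(groups)


# Invented sample data shaped like a permissions response.
sample = [
    {"grantedToV2": {"user": {"id": "u1"}}},
    {"grantedToIdentitiesV2": [{"user": {"id": "u2"}}, {"group": {"id": "g1"}}]},
    {"grantedToV2": {"siteGroup": {"id": "5"}}},
    {"grantedToV2": {"user": {"id": "u1"}}},
]
allowed_users, allowed_groups = collect_principals(sample)
```

Sorting the de-duplicated IDs keeps the stored ACL deterministic, which makes later comparisons between ingestion runs straightforward.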
Index Security Materialization
Permissions are stored directly in the search index as filterable fields:
chunk = {
    "content": document_text,
    "allowedUsers": allowed_users,
    "allowedGroups": allowed_groups,  # keep as a list so any() filters can match individual IDs
}
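For an `any(...)` filter to work, both ACL fields are assumed to be defined as filterable string collections. The sketch below shows what such an index definition could look like as a payload for the Azure AI Search REST API. The index name, the `id` key field, and the choice to make the ACL fields non-retrievable are assumptions of this example.

```python
def acl_field(name: str) -> dict:
    """Field definition for a collection of principal IDs used only for filtering."""
    return {
        "name": name,
        "type": "Collection(Edm.String)",
        "filterable": True,
        "searchable": False,
        "retrievable": False,  # assumption: ACLs are never returned to callers
    }


index_definition = {
    "name": "sharepoint-docs",  # hypothetical index name
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
        acl_field("allowedUsers"),
        acl_field("allowedGroups"),
    ],
}
```

Marking the ACL fields as non-retrievable keeps the list of principals out of search results while still allowing filters to use them.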
At query time, results are filtered using the authenticated user’s Microsoft Entra ID object ID and group memberships:
allowedUsers/any(u: u eq '{user_oid}') or allowedGroups/any(g: g eq '{group_oid}')
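Because a user usually belongs to several groups, the group clause has to match any of them. One way to sketch this, assuming the caller already holds the user's object ID and group IDs from the validated token or a Graph lookup, is to build the filter with the OData `search.in` function. The helper name is hypothetical, and the GUID check is there so that no untrusted string is ever spliced into the filter.

```python
import re

_GUID = re.compile(r"[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def build_security_filter(user_oid: str, group_oids: list[str]) -> str:
    """Build an OData filter that matches documents the user may see."""
    for value in [user_oid, *group_oids]:
        if not _GUID.fullmatch(value):
            raise ValueError(f"not a GUID: {value!r}")
    clauses = [f"allowedUsers/any(u: u eq '{user_oid}')"]
    if group_oids:
        # search.in treats commas as delimiters by default.
        joined = ",".join(group_oids)
        clauses.append(f"allowedGroups/any(g: search.in(g, '{joined}'))")
    return " or ".join(clauses)


# Placeholder GUIDs for illustration only.
security_filter = build_security_filter(
    "11111111-1111-1111-1111-111111111111",
    ["22222222-2222-2222-2222-222222222222", "33333333-3333-3333-3333-333333333333"],
)
```

The resulting string is passed as the query's filter, so the trimming happens inside the search service rather than in application code.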
Benefits:
- ✅ Authorization runs before documents are returned to the caller
- ✅ Same pattern works for RAG retrievers and Copilot extensions
- ✅ No post-retrieval filtering required—secure by design
Security Characteristics
This architecture delivers strong security guarantees:
| Property | Status |
|---|---|
| No tenant-wide content access | ✅ |
| Explicit site allow-listing | ✅ |
| Deterministic identity model | ✅ |
| Secure by default for AI search and RAG | ✅ |
| Compatible with Copilot extensions | ✅ |
Limitation: Stale Permissions
One important trade-off of materializing permissions at ingestion time is that permission changes in SharePoint are not automatically propagated to downstream systems. If a user’s access is revoked in SharePoint, that user may still see the document’s content in the search index or RAG pipeline until the next ingestion run.
To mitigate this:
- Run the ingestion pipeline on a regular schedule so that permission updates are picked up in a timely manner.
- Use SharePoint webhooks or event receivers to trigger re-ingestion when permissions change, reducing the staleness window.
- Tune the refresh interval based on the sensitivity of the content—highly regulated data may warrant more frequent re-ingestion.
- Communicate the expected propagation delay to stakeholders so they understand the security posture.
In our engagement, the ingestion pipeline ran on a periodic schedule, and the customer accepted a bounded delay for permission propagation given the sensitivity profile of the content.
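A scheduled refresh does not have to rewrite every document. A minimal sketch, assuming each indexed document also stores a fingerprint of its ACL (the `acl_fingerprint` helper and the stored value are assumptions of this example, not part of the original pipeline), is to re-read permissions on each run and update only the documents whose fingerprint changed.

```python
import hashlib


def acl_fingerprint(allowed_users: list[str], allowed_groups: list[str]) -> str:
    """Order-insensitive hash of a document's effective ACL."""
    canonical = (
        "u:" + ",".join(sorted(set(allowed_users)))
        + "|g:" + ",".join(sorted(set(allowed_groups)))
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_acl_refresh(stored_fingerprint: str, allowed_users: list[str], allowed_groups: list[str]) -> bool:
    """True when the ACL read from SharePoint differs from the indexed one."""
    return stored_fingerprint != acl_fingerprint(allowed_users, allowed_groups)


# Example: the second read no longer contains user "b", so a refresh is due.
stored = acl_fingerprint(["a", "b"], ["g1"])
refresh_due = needs_acl_refresh(stored, ["a"], ["g1"])
```

Comparing fingerprints keeps frequent permission-only runs cheap, which makes a shorter refresh interval for sensitive content more practical.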
Common Pitfalls to Avoid
| Pitfall | Why It’s Dangerous |
|---|---|
| ❌ Using Sites.Read.All | Breaks least privilege; exposes the entire tenant |
| ❌ Filtering results after retrieval | Data already leaked to the application layer |
| ❌ Relying on emails or display names | These change; GUIDs don’t |
| ❌ Ignoring group expansion | Unexpanded groups lead to silent overexposure |
| ❌ Assuming real-time permission sync | Materialized permissions can become stale—plan for periodic refresh |
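To make the group-expansion pitfall concrete, the sketch below expands a user's direct groups into all groups reachable through nesting, so the query filter includes every group that may appear in `allowedGroups`. The `parent_groups` map is a hypothetical in-memory structure built for this example. In practice, Microsoft Graph's `transitiveMemberOf` relationship can return the expanded set directly.

```python
from collections import deque


def expand_groups(direct_groups: list[str], parent_groups: dict[str, list[str]]) -> set[str]:
    """Return the direct groups plus every group they are nested in."""
    seen = set(direct_groups)
    queue = deque(direct_groups)
    while queue:
        current = queue.popleft()
        for parent in parent_groups.get(current, []):
            if parent not in seen:  # guards against cyclic nesting
                seen.add(parent)
                queue.append(parent)
    return seen


# Invented membership data: "team" is nested in "dept", which is nested in "org".
parent_groups = {"team": ["dept"], "dept": ["org"]}
effective_groups = expand_groups(["team"], parent_groups)
```

Whichever side performs the expansion, ingestion or query, it should be done consistently so that the IDs stored in the index and the IDs in the filter refer to the same level of nesting.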
Conclusion
Our customer needed to expose SharePoint content through AI-powered search and Copilot integrations without compromising document-level access control. By materializing permissions at ingestion time and filtering at query time, we delivered a solution that preserved the customer’s existing SharePoint security model across every downstream system.
When SharePoint content is integrated into AI-powered systems, authorization becomes a data problem. Treating permissions as first-class data keeps access decisions explicit, auditable, and enforceable in every downstream system.
Key takeaways:
- Use Sites.Selected for least-privilege access when the ingestion application reads from SharePoint
- Document-level permissions must be materialized explicitly in your downstream index
- GUID-based filtering ensures stable, deterministic identity matching
- Authorization should always happen before retrieval, not after
- Plan for periodic re-ingestion to keep materialized permissions in sync with SharePoint
This pattern has proven effective for enterprise-grade search, RAG pipelines, and Copilot integrations operating over highly sensitive documents.