AzureBlob
The AzureBlob
source imports files from Azure Blob Storage.
Setup for Azure Blob Storage
Get Started
If you didn't have experience with Azure Blob Storage, you can refer to the quickstart. These are actions you need to take:
- Create a storage account in the Azure Portal.
- Create a container in the storage account.
- Upload your files to the container.
- Grant the user / identity / service principal (depends on your authentication method, see below) access to the storage account. At minimum, a Storage Blob Data Reader role is needed. See this doc for reference.
Authentication
We support the following authentication methods:
-
Shared access signature (SAS) tokens. You can generate it from the Azure Portal in the settings for a specific container. You need to provide at least List and Read permissions when generating the SAS token. It's a query string in the form of
sp=rl&st=2025-07-20T09:33:00Z&se=2025-07-19T09:48:53Z&sv=2024-11-04&sr=c&sig=i3FDjsadfklj3%23adsfkk
. -
Storage account access key. You can find it in the Azure Portal in the settings for a specific storage account.
-
Default credential. When none of the above is provided, it will use the default credential.
This allows you to connect to Azure services without putting any secrets in the code or flow spec. It automatically chooses the best authentication method based on your environment:
-
On your local machine: uses your Azure CLI login (
az login
) or environment variables.az login
# Optional: Set a default subscription if you have more than one
az account set --subscription "<YOUR_SUBSCRIPTION_NAME_OR_ID>" -
In Azure (VM, App Service, AKS, etc.): uses the resource’s Managed Identity.
-
In automated environments: supports Service Principals via environment variables
AZURE_CLIENT_ID
AZURE_TENANT_ID
AZURE_CLIENT_SECRET
-
You can refer to this doc for more details.
Spec
The spec takes the following fields:
-
account_name
(str
): the name of the storage account. -
container_name
(str
): the name of the container. -
prefix
(str
, optional): if provided, only files with path starting with this prefix will be imported. -
binary
(bool
, optional): whether reading files as binary (instead of text). -
included_patterns
(list[str]
, optional): a list of glob patterns to include files, e.g.["*.txt", "docs/**/*.md"]
. If not specified, all files will be included. -
excluded_patterns
(list[str]
, optional): a list of glob patterns to exclude files, e.g.["*.tmp", "**/*.log"]
. Any file or directory matching these patterns will be excluded even if they matchincluded_patterns
. If not specified, no files will be excluded. -
sas_token
(cocoindex.TransientAuthEntryReference[str]
, optional): a SAS token for authentication. -
account_access_key
(cocoindex.TransientAuthEntryReference[str]
, optional): an account access key for authentication.infoincluded_patterns
andexcluded_patterns
are using Unix-style glob syntax. See globset syntax for the details.
Schema
The output is a KTable with the following sub fields:
filename
(Str, key): the filename of the file, including the path, relative to the root directory, e.g."dir1/file1.md"
.content
(Str ifbinary
isFalse
, otherwise Bytes): the content of the file.