CocoIndex Settings
Certain settings need to be provided for CocoIndex to work, e.g. database connections, app namespace, etc.
Configure CocoIndex Settings
Note that in general, you have two ways to launch CocoIndex:
- Call CocoIndex APIs from your own Python application or library.
- Use the CocoIndex CLI. It's handy for most routine index building and management tasks.
CocoIndex exposes process-level settings specified by the cocoindex.Settings dataclass.
Settings can be configured in three different ways, described in the following sections. Later ones override earlier ones.
Environment Variables
The simplest approach is to set the corresponding environment variables. See List of Environment Variables for the specific variables.
Consider placing a .env file in your directory. The CLI will load environment variables from the .env file (see CLI for more details). From your own main module, you can also load environment variables with a package like python-dotenv.
Setting Function
A more flexible approach is to provide a setting function that returns a cocoindex.Settings dataclass object.
The setting function can have any name, and needs to be decorated with the @cocoindex.settings decorator, for example:
import cocoindex

@cocoindex.settings
def cocoindex_settings() -> cocoindex.Settings:
    return cocoindex.Settings(
        database=cocoindex.DatabaseConnectionSpec(
            url="postgres://cocoindex:cocoindex@localhost/cocoindex"
        )
    )
This setting function will be called once when CocoIndex is initialized. Once a setting function is provided, environment variables will be ignored.
cocoindex.init() function
You can also call cocoindex.init() with a cocoindex.Settings dataclass object as argument, for example:
import cocoindex

cocoindex.init(
    cocoindex.Settings(
        database=cocoindex.DatabaseConnectionSpec(
            url="postgres://cocoindex:cocoindex@localhost/cocoindex"
        )
    )
)
For example, you can call it in the main function of your application.
Once cocoindex.init() is called with a cocoindex.Settings dataclass object as argument, the @cocoindex.settings function and environment variables will be ignored.
This is more flexible, as you can more easily construct cocoindex.Settings from other values you loaded earlier.
But be careful: if you call cocoindex.init() only under the main path (e.g. within an if __name__ == "__main__": guard), it won't be executed when you're using the CocoIndex CLI, because the CLI doesn't execute your main logic.
cocoindex.init() is optional:
- You can call cocoindex.init() with a cocoindex.Settings dataclass object as argument, or without any argument. Without an argument, the settings will be loaded from the @cocoindex.settings function or environment variables.
- You don't have to call cocoindex.init() explicitly. CocoIndex will be automatically initialized when needed, e.g. when any method of any flow is called for the first time. But calling cocoindex.init() explicitly (usually at startup time, e.g. in the main function of your application) makes sure the CocoIndex library is initialized and any potential exceptions are raised early, before the application proceeds. If you need this clarity, you can call it explicitly even if you don't want to provide settings through the cocoindex.init() call.
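The override order described above can be modeled in a few lines. This is only an illustrative sketch in plain Python, not CocoIndex's implementation: resolve_database_url and its parameters are hypothetical names, and only the database URL is modeled.

```python
from typing import Callable, Mapping, Optional


def resolve_database_url(
    init_arg: Optional[str],
    settings_fn: Optional[Callable[[], str]],
    environ: Mapping[str, str],
) -> Optional[str]:
    """Hypothetical model: pick the URL from the highest-priority source."""
    # 1. A value passed explicitly to init() overrides everything else.
    if init_arg is not None:
        return init_arg
    # 2. Otherwise a registered setting function is used; env vars are ignored.
    if settings_fn is not None:
        return settings_fn()
    # 3. Otherwise fall back to the environment variable.
    return environ.get("COCOINDEX_DATABASE_URL")


env = {"COCOINDEX_DATABASE_URL": "postgres://from-env/db"}
print(resolve_database_url(None, None, env))
print(resolve_database_url(None, lambda: "postgres://from-fn/db", env))
print(resolve_database_url("postgres://from-init/db", lambda: "postgres://from-fn/db", env))
```

The three calls print the URL from the environment, from the setting function, and from the explicit argument, in that order.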
List of Settings
cocoindex.Settings is a dataclass that contains the following fields:
- app_namespace (type: str, optional): The namespace of the application.
- database (type: DatabaseConnectionSpec, required): The connection to the Postgres database.
- global_execution_options (type: GlobalExecutionOptions, optional): The global execution options shared by all flows.
App Namespace
The app_namespace field helps organize flows across different environments (e.g., dev, staging, production), team members, etc. When set, it prefixes flow names with the namespace.
For example, if the namespace is Staging, a flow named Flow1 in code has the full name Staging.Flow1.
You can also get the current app namespace by calling cocoindex.get_app_namespace() (see Getting App Namespace for more details).
If not set, all flows are in a default unnamed namespace.
Environment variable: COCOINDEX_APP_NAMESPACE
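As a rough illustration of the naming rule, the helper below builds a full flow name from a namespace. full_flow_name is a hypothetical function, not part of the CocoIndex API, and it assumes that a flow in the default unnamed namespace keeps its plain name.

```python
def full_flow_name(flow_name: str, app_namespace: str = "") -> str:
    """Hypothetical helper: prefix the flow name with the namespace, if any."""
    if app_namespace:
        return f"{app_namespace}.{flow_name}"
    # Assumption: with no namespace set, the flow name is left unprefixed.
    return flow_name


print(full_flow_name("Flow1", "Staging"))  # Staging.Flow1
print(full_flow_name("Flow1"))             # Flow1
```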
DatabaseConnectionSpec
DatabaseConnectionSpec configures the connection to a database. Only Postgres is supported for now. It has the following fields:
- url (type: str): The URL of the Postgres database to use as the internal storage, e.g. postgres://cocoindex:cocoindex@localhost/cocoindex.
  Environment variable for Settings.database.url: COCOINDEX_DATABASE_URL
- user (type: Optional[str], default: None): The username for the Postgres database. If not provided, the username comes from url.
  Environment variable for Settings.database.user: COCOINDEX_DATABASE_USER
- password (type: Optional[str], default: None): The password for the Postgres database. If not provided, the password comes from url.
  Environment variable for Settings.database.password: COCOINDEX_DATABASE_PASSWORD
  Tip: All values in url need to be URL-encoded if they contain special characters. For this reason, prefer the separate user and password fields for the username and password.
- max_connections (type: int, default: 25): The maximum number of connections to keep in the pool.
  Environment variable for Settings.database.max_connections: COCOINDEX_DATABASE_MAX_CONNECTIONS
- min_connections (type: int, default: 5): The minimum number of connections to keep in the pool.
  Environment variable for Settings.database.min_connections: COCOINDEX_DATABASE_MIN_CONNECTIONS
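If you do need to embed credentials in url, the URL-encoding mentioned in the tip can be done with Python's standard library. The sketch below is one way to do it; build_postgres_url is a hypothetical helper and the password is a made-up example value.

```python
from urllib.parse import quote


def build_postgres_url(user: str, password: str, host: str, database: str) -> str:
    """Hypothetical helper: percent-encode credentials before embedding them."""
    # safe="" makes quote() encode every reserved character, including "/" and "@".
    encoded_user = quote(user, safe="")
    encoded_password = quote(password, safe="")
    return f"postgres://{encoded_user}:{encoded_password}@{host}/{database}"


url = build_postgres_url("cocoindex", "p@ss:w/rd#1", "localhost", "cocoindex")
print(url)  # postgres://cocoindex:p%40ss%3Aw%2Frd%231@localhost/cocoindex
```

Passing the raw password through the separate password field avoids this step entirely, which is why the tip recommends it.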
If you use the Postgres database hosted by Supabase, click Connect on your project dashboard and find the following URL:
- If you're on an IPv6 network, use the URL under Direct connection. You can visit IPv6 test to see if you have an IPv6 Internet connection.
- Otherwise, use the URL under Session pooler. Note that Supabase has a pool size limit of 15 by default, while CocoIndex's default max_connections value is 25. You can adjust either value to make sure Supabase's pool size limit is greater than CocoIndex's max_connections value. Supabase's pool size limit can be adjusted under "Database" -> "Settings".
- CocoIndex doesn't support Transaction pooler for now.
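The pool size constraint can be checked before startup with a small sanity check. This is a sketch only: check_pool_limits is a hypothetical helper, and its defaults simply mirror the numbers quoted above (25 and 5 for CocoIndex, 15 for the pooler).

```python
from typing import List


def check_pool_limits(
    max_connections: int = 25,
    min_connections: int = 5,
    provider_pool_limit: int = 15,
) -> List[str]:
    """Hypothetical check: return a list of problems, empty if none are found."""
    problems = []
    if min_connections > max_connections:
        problems.append("min_connections is greater than max_connections")
    # The provider's limit should be strictly greater than max_connections.
    if max_connections >= provider_pool_limit:
        problems.append(
            f"max_connections ({max_connections}) is not below the "
            f"provider pool limit ({provider_pool_limit})"
        )
    return problems


print(check_pool_limits())                    # defaults conflict: one problem
print(check_pool_limits(max_connections=10))  # []
```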
GlobalExecutionOptions
GlobalExecutionOptions is used to configure the global execution options shared by all flows. It has the following fields:
- source_max_inflight_rows (type: int | None, default: 1024): The maximum number of concurrent inflight rows for all source operations.
- source_max_inflight_bytes (type: int | None, default: None): The maximum number of concurrent inflight bytes for all source operations.
See also the flow definition docs about why it's necessary to control processing concurrency, and how to configure it on a per-source basis. If both global and per-source limits are specified, both need to be satisfied to admit additional source rows.
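The "both need to be satisfied" rule can be pictured as a simple admission check. The code below is a conceptual model, not CocoIndex's scheduler: can_admit_row and within_limit are hypothetical names, only row counts are modeled, and None stands for "no limit".

```python
from typing import Optional


def within_limit(inflight: int, limit: Optional[int]) -> bool:
    """A limit of None means unlimited; otherwise there must be room for one more."""
    return limit is None or inflight < limit


def can_admit_row(
    global_inflight: int,
    global_limit: Optional[int],
    source_inflight: int,
    source_limit: Optional[int],
) -> bool:
    """Admit a new source row only if both the global and per-source limits allow it."""
    return within_limit(global_inflight, global_limit) and within_limit(
        source_inflight, source_limit
    )


# Global limit has room, but this source is already at its own limit.
print(can_admit_row(100, 1024, 50, 50))    # False
# Both limits have room.
print(can_admit_row(100, 1024, 10, 50))    # True
# No per-source limit, global limit reached.
print(can_admit_row(1024, 1024, 0, None))  # False
```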
List of Environment Variables
This is the list of environment variables, each of which has a corresponding field in Settings:
Environment variable | Corresponding field in Settings | Required? |
---|---|---|
COCOINDEX_APP_NAMESPACE | app_namespace | No |
COCOINDEX_DATABASE_URL | database.url | Yes |
COCOINDEX_DATABASE_USER | database.user | No |
COCOINDEX_DATABASE_PASSWORD | database.password | No |
COCOINDEX_DATABASE_MAX_CONNECTIONS | database.max_connections | No (default: 25) |
COCOINDEX_DATABASE_MIN_CONNECTIONS | database.min_connections | No (default: 5) |
COCOINDEX_SOURCE_MAX_INFLIGHT_ROWS | global_execution_options.source_max_inflight_rows | No (default: 1024) |
COCOINDEX_SOURCE_MAX_INFLIGHT_BYTES | global_execution_options.source_max_inflight_bytes | No |
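A pre-flight check of these variables can catch a missing or malformed value before CocoIndex starts. The sketch below uses only the standard library; check_environment is a hypothetical helper, and it assumes the integer-typed settings are written as plain non-negative integers.

```python
import os
from typing import List, Mapping

# Variable names are taken from the table above.
REQUIRED_VARS = ("COCOINDEX_DATABASE_URL",)
INTEGER_VARS = (
    "COCOINDEX_DATABASE_MAX_CONNECTIONS",
    "COCOINDEX_DATABASE_MIN_CONNECTIONS",
    "COCOINDEX_SOURCE_MAX_INFLIGHT_ROWS",
    "COCOINDEX_SOURCE_MAX_INFLIGHT_BYTES",
)


def check_environment(environ: Mapping[str, str]) -> List[str]:
    """Hypothetical check: return a list of problems, empty if none are found."""
    problems = []
    for name in REQUIRED_VARS:
        if not environ.get(name):
            problems.append(f"{name} is required but not set")
    for name in INTEGER_VARS:
        value = environ.get(name)
        # Assumption: integer settings are plain digit strings such as "25".
        if value is not None and not value.strip().isdigit():
            problems.append(f"{name} must be a non-negative integer, got {value!r}")
    return problems


if __name__ == "__main__":
    for problem in check_environment(os.environ):
        print(problem)
```

Passing a mapping rather than reading os.environ inside the function keeps the check easy to test with a plain dict.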