CocoIndex Settings

Certain settings need to be provided for CocoIndex to work, e.g. database connections, app namespace, etc.

Configure CocoIndex Settings

Note that in general, you have two ways to launch CocoIndex:

  • Call CocoIndex APIs from your own Python application or library.
  • Use the CocoIndex CLI. It's handy for most routine index building and management tasks.

CocoIndex exposes process-level settings, specified by the cocoindex.Settings dataclass. Settings can be configured in three different ways, described in the following sections. Each later way overrides the earlier ones.

Environment Variables

The simplest approach is to set corresponding environment variables. See List of Environment Variables for specific environment variables.

tip

Consider placing a .env file in your directory. The CLI will load environment variables from the .env file (see CLI for more details). From your own main module, you can also load environment variables with a package like python-dotenv.
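
As a rough illustration of what loading a .env file amounts to, the sketch below copies KEY=VALUE pairs into an environment mapping without overwriting variables that are already set. The helper name apply_env_text is hypothetical. It is only a simplified stand-in for a package like python-dotenv, which handles quoting, comments, and other edge cases more thoroughly.

```python
import os
from typing import MutableMapping


def apply_env_text(
    text: str, environ: MutableMapping[str, str] = os.environ
) -> dict[str, str]:
    """Hypothetical helper: copy KEY=VALUE pairs from .env-style text into environ.

    Variables that are already set are left untouched.
    Returns the pairs that were actually added.
    """
    added: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        if key and key not in environ:
            environ[key] = value
            added[key] = value
    return added
```

From a main module, something like apply_env_text(open(".env").read()) would populate os.environ, provided it runs before CocoIndex is initialized.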

Setting Function

A more flexible approach is to provide a setting function that returns a cocoindex.Settings dataclass object. The setting function can have any name, and needs to be decorated with the @cocoindex.settings decorator, for example:

import cocoindex

@cocoindex.settings
def cocoindex_settings() -> cocoindex.Settings:
    return cocoindex.Settings(
        database=cocoindex.DatabaseConnectionSpec(
            url="postgres://cocoindex:cocoindex@localhost/cocoindex"
        )
    )

This setting function is called once, when CocoIndex is initialized. Once a setting function is provided, environment variables are ignored.

cocoindex.init() function

You can also call cocoindex.init() with a cocoindex.Settings dataclass object as argument, for example:

import cocoindex

cocoindex.init(
    cocoindex.Settings(
        database=cocoindex.DatabaseConnectionSpec(
            url="postgres://cocoindex:cocoindex@localhost/cocoindex"
        )
    )
)

For example, you can call it in the main function of your application. Once cocoindex.init() is called with a cocoindex.Settings dataclass object as argument, the @cocoindex.settings function and environment variables are ignored.

This is more flexible, as you can more easily construct cocoindex.Settings from other values you loaded earlier. Be careful, though: if you call cocoindex.init() only on the main path (e.g. within an if __name__ == "__main__": guard), it won't be executed when you use the CocoIndex CLI, because the CLI doesn't execute your main logic.

info

cocoindex.init() is optional:

  • You can call cocoindex.init() with a cocoindex.Settings dataclass object as argument, or without any argument. When called without an argument, the settings are loaded from the @cocoindex.settings function or environment variables.

  • You don't have to call cocoindex.init() explicitly. CocoIndex is initialized automatically when needed, e.g. when any method of any flow is called for the first time. However, calling cocoindex.init() explicitly (usually at startup, e.g. in the main function of your application) ensures that the CocoIndex library is initialized and that any exceptions are raised early, before the application proceeds. If you want this clarity, you can call it explicitly even if you don't provide settings through the cocoindex.init() call.
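
The override order described in this section can be summarized with a small model. This is not CocoIndex's implementation: the function resolve_settings_source, its parameters, and the plain dicts standing in for cocoindex.Settings are all hypothetical. The sketch only mirrors the documented rule that settings passed to cocoindex.init() win over a @cocoindex.settings function, which in turn wins over environment variables.

```python
from typing import Callable, Mapping, Optional


def resolve_settings_source(
    env: Mapping[str, str],
    settings_fn: Optional[Callable[[], dict]] = None,
    init_settings: Optional[dict] = None,
) -> tuple[str, dict]:
    """Return (source_name, settings) following the documented override order."""
    # 1. Settings passed directly to init() take priority over everything else.
    if init_settings is not None:
        return "init", init_settings
    # 2. Otherwise a registered setting function is used; the environment is ignored.
    if settings_fn is not None:
        return "settings_function", settings_fn()
    # 3. Otherwise fall back to environment variables.
    return "environment", {"database_url": env.get("COCOINDEX_DATABASE_URL")}
```

For instance, with both an environment variable and a setting function present, the model picks the setting function, matching the statement that environment variables are ignored once a setting function is provided.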

List of Settings

cocoindex.Settings is a dataclass that contains the following fields:

  • app_namespace (type: str, optional): The namespace of the application. If not set, all flows are in a default unnamed namespace.
  • database (type: DatabaseConnectionSpec, required): The connection to the Postgres database.
  • global_execution_options (type: GlobalExecutionOptions, optional): The global execution options shared by all flows.

App Namespace

The app_namespace field helps organize flows across different environments (e.g., dev, staging, production), team members, etc. When set, it prefixes flow names with the namespace.

For example, if the namespace is Staging, for a flow with name specified as Flow1 in code, the full name of the flow will be Staging.Flow1. You can also get the current app namespace by calling cocoindex.get_app_namespace() (see Getting App Namespace for more details).

If not set, all flows are in a default unnamed namespace.

Environment variable: COCOINDEX_APP_NAMESPACE
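
The prefixing rule can be sketched as a one-line function. The name full_flow_name is hypothetical and is not a CocoIndex API. The behavior for an unset namespace (returning the flow name unprefixed) is an assumption, since the text above only says such flows live in a default unnamed namespace.

```python
def full_flow_name(flow_name: str, app_namespace: str = "") -> str:
    """Hypothetical helper: build the full flow name from namespace and flow name."""
    if app_namespace:
        # Documented behavior: the namespace is prepended, separated by a dot.
        return f"{app_namespace}.{flow_name}"
    # Assumption: without a namespace, the flow name is used as-is.
    return flow_name
```

With the example above, full_flow_name("Flow1", "Staging") gives Staging.Flow1.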

DatabaseConnectionSpec

DatabaseConnectionSpec configures the connection to a database. Only Postgres is supported for now. It has the following fields:

  • url (type: str): The URL of the Postgres database to use as the internal storage, e.g. postgres://cocoindex:cocoindex@localhost/cocoindex.

    Environment variable for Settings.database.url: COCOINDEX_DATABASE_URL

  • user (type: Optional[str], default: None): The username for the Postgres database. If not provided, username will come from url.

    Environment variable for Settings.database.user: COCOINDEX_DATABASE_USER

  • password (type: Optional[str], default: None): The password for the Postgres database. If not provided, password will come from url.

    Environment variable for Settings.database.password: COCOINDEX_DATABASE_PASSWORD

    tip

    Note that all values in url need to be URL-encoded if they contain special characters. For this reason, prefer the separate user and password fields for the username and password.

  • max_connections (type: int, default: 25): The maximum number of connections to keep in the pool.

    Environment variable for Settings.database.max_connections: COCOINDEX_DATABASE_MAX_CONNECTIONS

  • min_connections (type: int, default: 5): The minimum number of connections to keep in the pool.

    Environment variable for Settings.database.min_connections: COCOINDEX_DATABASE_MIN_CONNECTIONS
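
To make the URL-encoding tip above concrete, the sketch below percent-encodes credentials with the standard library before embedding them in a URL. The helper build_postgres_url and the sample password are made up for illustration.

```python
from urllib.parse import quote


def build_postgres_url(user: str, password: str, host: str, database: str) -> str:
    """Hypothetical helper: embed percent-encoded credentials in a Postgres URL."""
    # safe="" makes quote() also encode "/", which it leaves alone by default.
    encoded_user = quote(user, safe="")
    encoded_password = quote(password, safe="")
    return f"postgres://{encoded_user}:{encoded_password}@{host}/{database}"


url = build_postgres_url("cocoindex", "p@ss:w/rd#1", "localhost", "cocoindex")
print(url)  # postgres://cocoindex:p%40ss%3Aw%2Frd%231@localhost/cocoindex
```

Supplying the username and password through the separate user and password fields avoids this encoding step, which is why the tip recommends them.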

info

If you use the Postgres database hosted by Supabase, please click Connect on your project dashboard and find the following URL:

  • If you're on an IPv6 network, use the URL under Direct connection. You can visit IPv6 test to check whether you have an IPv6 Internet connection.
  • Otherwise, use the URL under Session pooler. Note that Supabase has a default pool size limit of 15, while CocoIndex's default max_connections value is 25. Adjust either value so that Supabase's pool size limit is greater than CocoIndex's max_connections value. Supabase's pool size limit can be adjusted under "Database" -> "Settings".
  • CocoIndex doesn't currently support Transaction pooler.
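
The pool size constraint for the Session pooler can be checked with a few lines of arithmetic. The function check_pool_settings below is a hypothetical helper, not part of CocoIndex or Supabase. The additional check that min_connections does not exceed max_connections is an assumption added for completeness.

```python
def check_pool_settings(
    max_connections: int, min_connections: int, pool_size_limit: int
) -> list[str]:
    """Hypothetical helper: list problems with a connection pool configuration."""
    problems: list[str] = []
    # Assumption: a pool cannot keep more idle connections than its maximum.
    if min_connections > max_connections:
        problems.append("min_connections exceeds max_connections")
    # Following the text literally: the pooler limit must be greater than max_connections.
    if max_connections >= pool_size_limit:
        problems.append("max_connections is not below the pool size limit")
    return problems


# CocoIndex defaults (25 / 5) against Supabase's default pool size limit of 15.
print(check_pool_settings(25, 5, 15))
# A lowered max_connections, e.g. via COCOINDEX_DATABASE_MAX_CONNECTIONS=10.
print(check_pool_settings(10, 5, 15))
```

The first call reports a problem and the second reports none, which corresponds to lowering max_connections rather than raising the limit on the Supabase side.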

GlobalExecutionOptions

GlobalExecutionOptions is used to configure the global execution options shared by all flows. It has the following fields:

  • source_max_inflight_rows (type: int | None, default: 1024): The maximum number of concurrent inflight rows for all source operations.
  • source_max_inflight_bytes (type: int | None, default: None): The maximum number of concurrent inflight bytes for all source operations.

See also flow definition docs about why it's necessary to control processing concurrency, and how to configure it on per-source basis. If both global and per-source limits are specified, both need to be satisfied to admit additional source rows.
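
The rule that both limits must be satisfied can be modeled as follows. This is an illustrative sketch only, not CocoIndex's internal implementation: RowLimiter and try_admit are hypothetical names, only the row-count limit is modeled, and treating None as "no limit" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowLimiter:
    """Hypothetical counter for inflight rows under an optional limit."""

    max_inflight_rows: Optional[int]  # Assumption: None means no limit.
    inflight: int = 0

    def has_room(self) -> bool:
        return self.max_inflight_rows is None or self.inflight < self.max_inflight_rows


def try_admit(global_limiter: RowLimiter, source_limiter: RowLimiter) -> bool:
    """Admit one more source row only if both the global and per-source limits allow it."""
    if global_limiter.has_room() and source_limiter.has_room():
        global_limiter.inflight += 1
        source_limiter.inflight += 1
        return True
    return False
```

In this model, a source with a per-source limit of 2 stops admitting rows at 2 even if the global limit still has room, and a source without its own limit is still held back once the global limit is reached.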

List of Environment Variables

This is the list of environment variables, each of which has a corresponding field in Settings:

| Environment variable | Corresponding field in Settings | Required? |
| --- | --- | --- |
| COCOINDEX_APP_NAMESPACE | app_namespace | No |
| COCOINDEX_DATABASE_URL | database.url | Yes |
| COCOINDEX_DATABASE_USER | database.user | No |
| COCOINDEX_DATABASE_PASSWORD | database.password | No |
| COCOINDEX_DATABASE_MAX_CONNECTIONS | database.max_connections | No (default: 25) |
| COCOINDEX_DATABASE_MIN_CONNECTIONS | database.min_connections | No (default: 5) |
| COCOINDEX_SOURCE_MAX_INFLIGHT_ROWS | global_execution_options.source_max_inflight_rows | No (default: 1024) |
| COCOINDEX_SOURCE_MAX_INFLIGHT_BYTES | global_execution_options.source_max_inflight_bytes | No |
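
The mapping in this list can be sketched as a function that reads the variables into a nested dict, applying the listed defaults. This is not how CocoIndex loads its settings internally. The function settings_from_env is hypothetical, plain dicts stand in for the Settings dataclasses, and the empty-string default for app_namespace is an assumption.

```python
from typing import Mapping, Optional


def settings_from_env(env: Mapping[str, str]) -> dict:
    """Hypothetical helper: map the documented environment variables to settings fields."""
    if "COCOINDEX_DATABASE_URL" not in env:
        # The only variable listed as required.
        raise KeyError("COCOINDEX_DATABASE_URL is required")

    def opt_int(name: str, default: Optional[int]) -> Optional[int]:
        raw = env.get(name)
        return int(raw) if raw is not None else default

    return {
        # Assumption: an unset namespace is represented as an empty string.
        "app_namespace": env.get("COCOINDEX_APP_NAMESPACE", ""),
        "database": {
            "url": env["COCOINDEX_DATABASE_URL"],
            "user": env.get("COCOINDEX_DATABASE_USER"),
            "password": env.get("COCOINDEX_DATABASE_PASSWORD"),
            "max_connections": opt_int("COCOINDEX_DATABASE_MAX_CONNECTIONS", 25),
            "min_connections": opt_int("COCOINDEX_DATABASE_MIN_CONNECTIONS", 5),
        },
        "global_execution_options": {
            "source_max_inflight_rows": opt_int("COCOINDEX_SOURCE_MAX_INFLIGHT_ROWS", 1024),
            "source_max_inflight_bytes": opt_int("COCOINDEX_SOURCE_MAX_INFLIGHT_BYTES", None),
        },
    }
```

Calling settings_from_env(os.environ) after importing os gives a quick way to inspect which values a given shell environment would supply.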