DB Setup
This page shows how to create an Astra DB instance and set up the connection details and secrets that all code examples will then use.
Run locally or use Colab?
You can run most of the examples either in your local environment or on Google Colaboratory: it's up to you.
Keep in mind, however, that each Colab instance is ephemeral: to run a notebook there, you will have to repeat a modicum of setup every time, such as installing dependencies and supplying DB connection parameters and API Keys. If you configure a local environment instead, you only do this once and can then run all notebooks straight away.
Astra DB is a serverless DBaaS by DataStax, built on Apache Cassandra, that offers a free tier with generous traffic and storage limits. Using Astra DB frees you from the hassle of running your own cluster while retaining Cassandra's advantages, such as the excellent data distribution and very high availability that make it a world-class NoSQL database.
Self-managed Cassandra alternative
Nothing prevents you from adapting the examples to any Cassandra cluster: in most cases, all you have to do is build the database Session object differently. Inspect the code to see how; it is generally just a couple of lines to change.
Create your Astra DB instance
Go to astra.datastax.com, sign up, and create an Astra DB database in the Free Tier. In the following, we assume you called it cassio_db.
You will be asked for a "Keyspace name" when creating the database: you can call it, for example, cassio_tutorials.
Detailed instructions can be found on this page.
Vector Search
If you want to take advantage of the Vector Search capabilities that are currently in Public Preview in Astra, make sure to pick the "Serverless with Vector (Preview)" option when creating your database.
If you choose to use an Astra database without Vector Search support, you can still test the corresponding components of CassIO against a Cassandra instance with Vector Search enabled. See here for instructions on how to build and run one such instance locally.
Get Token and Secure Connect Bundle
To establish a connection to your cloud database, you need a secret string (the "Token") and a "Secure Connect Bundle" zipfile containing certificates and proxy/routing information. See the Astra DB documentation for:
- how to generate a token (remember to pick the "Database Administrator" role);
- how to obtain the Secure Connect Bundle for the database. Store it somewhere on your computer (do not unpack the zip).
Colab users: you're done
If you plan to run the examples within Google Colab, that's all you need to do now: you can skip the rest of the page and move on to API Setup.
Note, however, that certain notebooks may occasionally require further setup steps (we'll make that clear along the way).
Create .env file
Now create a .env file at the root of this repo, containing the connection parameters the code examples need to access your database.
You can copy the .env.template file you find there and fill in the values of the environment variables it lists: essentially, the full path to the bundle file, the keyspace name and your database secret token string.
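As a minimal sketch, a .env holding those three parameters could be written as below. The variable names are the ones the cqlsh command further down this page reads; the export NAME="value" layout is an assumption (the actual .env.template may be laid out differently and may list further variables, such as API Keys), and the bundle path and token are placeholders.

```shell
#!/bin/sh
# Minimal sketch: create .env in the current directory (run it from the repo root).
# Variable names match the ones the cqlsh command on this page reads; the
# "export NAME=value" layout and all values are assumptions/placeholders.
set -eu

if [ -e .env ]; then
  # Never clobber an existing .env.
  echo ".env already exists, leaving it untouched" >&2
else
  # The quoted 'EOF' prevents any expansion inside the heredoc.
  cat > .env <<'EOF'
export ASTRA_DB_SECURE_BUNDLE_PATH="/path/to/secure-connect-cassio_db.zip"
export ASTRA_DB_APPLICATION_TOKEN="AstraCS:..."
export ASTRA_DB_KEYSPACE="cassio_tutorials"
EOF
fi
```

Replace the placeholders with your own bundle path and token, then load the values into your shell with . .env; thanks to export, programs launched from that shell can see them too.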
Using the Astra CLI
Alternatively, once you have the Database Administrator token, you can use the Astra CLI to automate the rest (downloading the bundle and creating the .env file).
First, install the Astra CLI.
Then configure it by running
astra setup
and providing the token (the string starting with AstraCS:...).
Then, from the root directory of this repo, run the following (adjusting the database and keyspace names if needed):
astra db create-dotenv cassio_db -k cassio_tutorials
This will download the bundle zipfile and create a .env file with all the connection parameters you'll need later.
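Whichever way you created the .env, a quick sanity check before moving on can save some head-scratching later. The following is a minimal sketch; check_astra_env is a hypothetical helper (not part of the Astra CLI or CassIO) that only looks at the three variables used by the cqlsh command on this page, checks that the bundle path points to a file (the zip itself, not an unpacked folder) and that the token starts with AstraCS:.

```shell
#!/bin/sh
# Minimal sketch: sanity-check the connection parameters before using them.
# check_astra_env is a hypothetical helper, not part of the Astra CLI or CassIO.
check_astra_env() {
  # All three variables must be set and non-empty.
  [ -n "${ASTRA_DB_SECURE_BUNDLE_PATH:-}" ] || { echo "ASTRA_DB_SECURE_BUNDLE_PATH is not set" >&2; return 1; }
  [ -n "${ASTRA_DB_APPLICATION_TOKEN:-}" ] || { echo "ASTRA_DB_APPLICATION_TOKEN is not set" >&2; return 1; }
  [ -n "${ASTRA_DB_KEYSPACE:-}" ] || { echo "ASTRA_DB_KEYSPACE is not set" >&2; return 1; }

  # The bundle path must point to a regular file (the zip, kept unpacked-free).
  [ -f "$ASTRA_DB_SECURE_BUNDLE_PATH" ] || { echo "bundle file not found: $ASTRA_DB_SECURE_BUNDLE_PATH" >&2; return 1; }

  # Tokens are the strings starting with "AstraCS:".
  case "$ASTRA_DB_APPLICATION_TOKEN" in
    AstraCS:*) return 0 ;;
    *) echo "token does not start with AstraCS:" >&2; return 1 ;;
  esac
}
```

For example, from the repo root: . ./.env && check_astra_env && echo "connection parameters look OK"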
Import sample data
Some of the provided code examples require pre-existing data in your database.
To populate the newly created keyspace with the required data:
- clone this repo if you haven't already (it contains the website content as well as all the code examples);
- download the newest (vector-search-compatible) cqlsh utility from this link;
- extract the archive to a location of your liking, e.g. /home/user/myCqlsh;
- cd to this repo's root directory and source the environment file you just prepared:
. .env
- from the same directory, launch the script that populates the database:
/home/user/myCqlsh/cqlsh-astra/bin/cqlsh \
-b "$ASTRA_DB_SECURE_BUNDLE_PATH" \
-u token \
-p "$ASTRA_DB_APPLICATION_TOKEN" \
-k "$ASTRA_DB_KEYSPACE" \
-f setup/provision_db/write_sample_data.cql
Astra DB's in-browser console
You can also run this step without a local cqlsh client (for instance, if you are using Colab and want to skip local setup entirely).
Locate the CQL Console in your Astra DB instance, then:
- enter the command USE cassio_tutorials; and press Enter (replace cassio_tutorials with your keyspace name if you called it differently);
- paste the contents of this file into the Console and watch the show.
Note
If you target your own Cassandra cluster, make sure you USE your keyspace before running the script above.
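With a stock cqlsh against your own cluster, the same script could be loaded roughly as follows. This is a minimal sketch under assumptions not stated on this page: cqlsh is on your PATH, a node accepts unauthenticated connections at CASSANDRA_HOST:CASSANDRA_PORT, and you run a single-node development setup (hence SimpleStrategy with replication_factor 1, which is not suitable for production). load_sample_data_local, CASSANDRA_HOST, CASSANDRA_PORT and KEYSPACE are hypothetical names. The -k option makes cqlsh start in the given keyspace, which takes care of the USE step.

```shell
#!/bin/sh
# Minimal sketch for a self-managed Cassandra cluster (run from the repo root).
# Assumptions: cqlsh on PATH, no authentication, single-node dev cluster.
# load_sample_data_local, CASSANDRA_HOST, CASSANDRA_PORT, KEYSPACE are hypothetical names.
load_sample_data_local() {
  host="${CASSANDRA_HOST:-127.0.0.1}"
  port="${CASSANDRA_PORT:-9042}"
  keyspace="${KEYSPACE:-cassio_tutorials}"

  # Create the keyspace if it does not exist yet (RF=1: development only).
  cqlsh -e "CREATE KEYSPACE IF NOT EXISTS $keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};" "$host" "$port"

  # -k starts cqlsh in that keyspace, same effect as running USE before the script.
  cqlsh -k "$keyspace" -f setup/provision_db/write_sample_data.cql "$host" "$port"
}
```

Call load_sample_data_local after exporting the variables that differ from the defaults, e.g. CASSANDRA_HOST=10.0.0.5 load_sample_data_local.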
Your database is ready
The next step is to configure the necessary API Keys.