This is a simple example of using the AI Vector Search feature in Oracle database 23ai.
We are going to use one provided by Oracle called all-MiniLM-L12-v2. This model is explained here. We are going to use it to generate vectors from plain text following this guide from here.
Load the Model:
download and unzip:
[oracle@ol8_23ai ~]$ mkdir -p /u01/models
[oracle@ol8_23ai ~]$ cd /u01/models
[oracle@ol8_23ai models]$ wget https://adwc4pm.objectstorage.us-ashburn-1.oci.customer-oci.com/p/VBRD9P8ZFWkKvnfhrWxkpPe8K03-JIoM5h_8EJyJcpE80c108fuUjg7R5L5O7mMZ/n/adwc4pm/b/OML-Resources/o/all_MiniLM_L12_v2_augmented.zip
...
...
[oracle@ol8_23ai models]$ unzip -q all_MiniLM_L12_v2_augmented.zip
[oracle@ol8_23ai models]$ wget https://adwc4pm.objectstorage.us-ashburn-1.oci.customer-oci.com/p/VBRD9P8ZFWkKvnfhrWxkpPe8K03-JIoM5h_8EJyJcpE80c108fuUjg7R5L5O7mMZ/n/adwc4pm/b/OML-Resources/o/all_MiniLM_L12_v2_augmented.zip
Connect to the PDB, create user, create directory and grant access to the user:
SQL> create user testvector identified by testvector quota unlimited on users;
User created.
SQL> grant create session, connect, db_developer_role, create mining model to testvector;
Grant succeeded.
SQL> create or replace directory model_dir as '/u01/models';
Directory created
SQL> grant read, write on directory model_dir to testvector;
Grant succeeded.
SQL>
Load the model into the database using the DBMS_VECTOR
package:
[oracle@ol8_23ai ~]$ sqlplus testvector/testvector@dbtest23pdb1
SQL> show user
USER is "TESTVECTOR"
SQL> begin
dbms_vector.drop_onnx_model (
model_name => 'ALL_MINILM_L12_V2',
force => true);
dbms_vector.load_onnx_model (
directory => 'model_dir',
file_name => 'all_MiniLM_L12_v2.onnx',
model_name => 'ALL_MINILM_L12_V2');
end;
/ 2 3 4 5 6 7 8 9 10 11
PL/SQL procedure successfully completed.
SQL>
Check the model information in the USER_MINING_MODELS view:
col model_name format a20
col mining_function format a12
col algorithm format a12
col attribute_name format a20
col data_type format a20
col vector_info format a30
col attribute_type format a20
set lines 120
SELECT model_name, mining_function, algorithm,
algorithm_type, model_size
FROM user_mining_models
ORDER BY model_name;
MODEL_NAME MINING_FUNCT ALGORITHM ALGORITHM_ MODEL_SIZE
-------------------- ------------ ------------ ---------- ----------
ALL_MINILM_L12_V2 EMBEDDING ONNX NATIVE 133322334
SELECT model_name, attribute_name, attribute_type, data_type, vector_info
FROM user_mining_model_attributes
ORDER BY attribute_name;
MODEL_NAME ATTRIBUTE_NAME ATTRIBUTE_TYPE DATA_TYPE VECTOR_INFO
-------------------- -------------------- -------------------- -------------------- ------------------------------
ALL_MINILM_L12_V2 DATA TEXT VARCHAR2
ALL_MINILM_L12_V2 ORA$ONNXTARGET VECTOR VECTOR VECTOR(384,FLOAT32)
Now we are going to get the public data set from here composed by 200.000 newspaper headlines.
[oracle@ol8_23ai ~]$ cd /u01/models/
[oracle@ol8_23ai models]$ wget https://objectstorage.eu-frankfurt-1.oraclecloud.com/n/fro8fl9kuqli/b/AIVECTORS/o/dataset_200K.txt
Create an external table to access the data:
SQL> show user
USER is "TESTVECTOR"
SQL>
SQL> CREATE TABLE if not exists CCNEWS_TMP (sentence VARCHAR2(4000))
ORGANIZATION EXTERNAL (TYPE ORACLE_LOADER DEFAULT DIRECTORY model_dir
ACCESS PARAMETERS
(RECORDS DELIMITED BY 0x'0A'
READSIZE 100000000
FIELDS (sentence CHAR(4000)))
LOCATION (model_dir:'dataset_200K.txt'))
PARALLEL
REJECT LIMIT UNLIMITED;
Table created.
SQL>
SQL> select count(*) from CCNEWS_TMP;
COUNT(*)
----------
200000
SQL>
Now that we have the data set and embedding model has been loaded into the database, we can calculate embeddings directly in the database.
Connect to the TESTVECTOR schema, and create a table with a column vec with data type VECTOR:
SQL> show user
USER is "TESTVECTOR"
SQL> create table if not exists CCNEWS (
id number(10) not null,
info VARCHAR2(4000),
vec VECTOR
); 2 3 4 5
Table created.
SQL>
-- Use the ALL_MINILM_L12_V2 previously loaded to calculate the vector embeddings:
SQL> insert into CCNEWS (id, info, vec)
select rownum,
sentence,
TO_VECTOR(VECTOR_EMBEDDING(ALL_MINILM_L12_V2 USING sentence as data))
from CCNEWS_TMP;
200000 rows created.
SQL> commit;
Commit complete.
Now we are going to do a semantic search using the VECTOR_DISTANCE function:
col info format a90
set lines 120
set pagesize 99
select id, info
from CCNEWS
order by vector_distance(vec, TO_VECTOR(VECTOR_EMBEDDING(ALL_MINILM_L12_V2 USING 'sad news about dogs' as data)), COSINE)
fetch approx first 5 rows only;
ID INFO
---------- ------------------------------------------------------------------------------------------
124451 Karl Flores, secretary, Maltese National, Canine Federation, Hamrun. At least the two dog
lovers now conclusively agree with me that the breed is in danger.
111521 reality show a dog
163246 Vivi the whippet, the prize-winning show dog who vanished two years ago at Kennedy Interna
tional Airport, is alive and well. At least that\u00e2\u0080\u0099s what the animal psychi
cs say.
119724 Ellen DeGeneres' talk show was put on hold for a day because of her emotionally wrenching
dog-adoption drama. \
3864 Dogs have earned a new claim to the title of man's best friend following the discovery
by scientists that they can sniff out cancer and even detect unconfirmed cases.
SQL>
select id, info
from CCNEWS
order by vector_distance(vec, TO_VECTOR(VECTOR_EMBEDDING(ALL_MINILM_L12_V2 USING 'interesting news about rockets' as data)), COSINE)
fetch approx first 5 rows only;
ID INFO
---------- ------------------------------------------------------------------------------------------
52294 Fighters in the Gaza Strip are proud to show off the latest addition to their arsenal of
homemade rockets -- the \
3194 America's new heavy-lifting rocket takes its much-anticipated test flight Saturday, if
Mother Nature cooperates. The Boeing Delta 4-Heavy rocket was supposed to launch today, b
ut the forecast for strong winds
26779 WASHINGTON NASA's ready to roll out an improved rocket fuel tank designed to prevent p
roblems that helped destroy the space shuttle Columbia.
52066 MOJAVE, Calif. - A stubby rocket plane powered its rockets and appeared to blast through t
he Earth's atmosphere for a second time in two weeks Monday to capture a $10 million prize
meant to encourage space tourism...
152239 Spectacular failure of Zenit rocket The launch of a communications satellite from a conver
ted oil rig in the Pacific ended in a fireball on Tuesday when the rocket exploded on the
launch pad, New Scientist reports.\u00e2\u0080\u00a6
SQL>
That’s it, this is an example of this new feature, but to be honest, many things I don’t understand as this is new to me….
Comments