This is a simple example of using the AI Vector Search feature in Oracle database 23ai.
We are going to use one provided by Oracle called all-MiniLM-L12-v2. This model is explained here. We are going to use it to generate vectors from plain text following this guide from here.

Load the Model:

download and unzip:

[oracle@ol8_23ai ~]$ mkdir -p /u01/models
[oracle@ol8_23ai ~]$ cd /u01/models

[oracle@ol8_23ai models]$ wget https://adwc4pm.objectstorage.us-ashburn-1.oci.customer-oci.com/p/VBRD9P8ZFWkKvnfhrWxkpPe8K03-JIoM5h_8EJyJcpE80c108fuUjg7R5L5O7mMZ/n/adwc4pm/b/OML-Resources/o/all_MiniLM_L12_v2_augmented.zip
...
...
[oracle@ol8_23ai models]$ unzip -q all_MiniLM_L12_v2_augmented.zip


[oracle@ol8_23ai models]$ wget https://adwc4pm.objectstorage.us-ashburn-1.oci.customer-oci.com/p/VBRD9P8ZFWkKvnfhrWxkpPe8K03-JIoM5h_8EJyJcpE80c108fuUjg7R5L5O7mMZ/n/adwc4pm/b/OML-Resources/o/all_MiniLM_L12_v2_augmented.zip

Connect to the PDB, create user, create directory and grant access to the user:

SQL> create user testvector identified by testvector quota unlimited on users;
User created.
SQL> grant create session, connect, db_developer_role, create mining model to testvector;
Grant succeeded.

SQL> create or replace directory model_dir as '/u01/models';
Directory created
SQL> grant read, write on directory model_dir to testvector;
Grant succeeded.
SQL>

Load the model into the database using the DBMS_VECTOR package:

[oracle@ol8_23ai ~]$ sqlplus testvector/testvector@dbtest23pdb1

SQL> show user
USER is "TESTVECTOR"
SQL> begin
  dbms_vector.drop_onnx_model (
    model_name => 'ALL_MINILM_L12_V2',
    force => true);

  dbms_vector.load_onnx_model (
    directory  => 'model_dir',
    file_name  => 'all_MiniLM_L12_v2.onnx',
    model_name => 'ALL_MINILM_L12_V2');
end;
/  2    3    4    5    6    7    8    9   10   11

PL/SQL procedure successfully completed.

SQL>

Check the model information in the USER_MINING_MODELS view:

col model_name format a20
col mining_function format a12
col algorithm format a12
col attribute_name format a20
col data_type format a20
col vector_info format a30
col attribute_type format a20
set lines 120

SELECT model_name, mining_function, algorithm,
algorithm_type, model_size
FROM user_mining_models
ORDER BY model_name;


MODEL_NAME           MINING_FUNCT ALGORITHM    ALGORITHM_ MODEL_SIZE
-------------------- ------------ ------------ ---------- ----------
ALL_MINILM_L12_V2    EMBEDDING    ONNX         NATIVE      133322334



SELECT model_name, attribute_name, attribute_type, data_type, vector_info
FROM user_mining_model_attributes
ORDER BY attribute_name;


MODEL_NAME           ATTRIBUTE_NAME       ATTRIBUTE_TYPE       DATA_TYPE            VECTOR_INFO
-------------------- -------------------- -------------------- -------------------- ------------------------------
ALL_MINILM_L12_V2    DATA                 TEXT                 VARCHAR2
ALL_MINILM_L12_V2    ORA$ONNXTARGET       VECTOR               VECTOR               VECTOR(384,FLOAT32)

Now we are going to get the public data set from here composed by 200.000 newspaper headlines.

[oracle@ol8_23ai ~]$ cd /u01/models/
[oracle@ol8_23ai models]$ wget https://objectstorage.eu-frankfurt-1.oraclecloud.com/n/fro8fl9kuqli/b/AIVECTORS/o/dataset_200K.txt

Create an external table to access the data:

SQL> show user
USER is "TESTVECTOR"
SQL>


SQL> CREATE TABLE if not exists CCNEWS_TMP (sentence VARCHAR2(4000))
  ORGANIZATION EXTERNAL (TYPE ORACLE_LOADER DEFAULT DIRECTORY model_dir
                         ACCESS PARAMETERS
                           (RECORDS DELIMITED BY 0x'0A'
                            READSIZE 100000000
                            FIELDS (sentence CHAR(4000)))
                         LOCATION (model_dir:'dataset_200K.txt'))
  PARALLEL
  REJECT LIMIT UNLIMITED;
  
Table created.

SQL>
  
SQL> select count(*) from CCNEWS_TMP;

  COUNT(*)
----------
    200000
SQL>
  

Now that we have the data set and embedding model has been loaded into the database, we can calculate embeddings directly in the database.
Connect to the TESTVECTOR schema, and create a table with a column vec with data type VECTOR:

SQL> show user
USER is "TESTVECTOR"
SQL> create table if not exists CCNEWS (
    id number(10) not null,
    info VARCHAR2(4000),
    vec VECTOR
);  2    3    4    5
Table created.
SQL>

-- Use the ALL_MINILM_L12_V2 previously loaded to calculate the vector embeddings:
SQL> insert into CCNEWS (id, info, vec)
select rownum,
       sentence,
       TO_VECTOR(VECTOR_EMBEDDING(ALL_MINILM_L12_V2 USING sentence as data))
from CCNEWS_TMP;  
200000 rows created.
SQL> commit;
Commit complete.

Now we are going to do a semantic search using the VECTOR_DISTANCE function:

col info format a90
set lines 120
set pagesize 99

select id, info
from CCNEWS
order by vector_distance(vec, TO_VECTOR(VECTOR_EMBEDDING(ALL_MINILM_L12_V2 USING 'sad news about dogs' as data)), COSINE)
fetch approx first 5 rows only;


        ID INFO
---------- ------------------------------------------------------------------------------------------
    124451 Karl Flores, secretary, Maltese National, Canine Federation, Hamrun. At least the two dog
           lovers now conclusively agree with me that the breed is in danger.

    111521  reality show a dog
    163246 Vivi the whippet, the prize-winning show dog who vanished two years ago at Kennedy Interna
           tional Airport, is alive and well. At least that\u00e2\u0080\u0099s what the animal psychi
           cs say.

    119724 Ellen DeGeneres' talk show was put on hold for a day because of her emotionally wrenching
           dog-adoption drama. \

      3864 Dogs have earned a new claim to the title of man's best friend following the discovery
            by scientists that they can sniff out cancer and even detect unconfirmed cases.

SQL>


select id, info
from CCNEWS
order by vector_distance(vec, TO_VECTOR(VECTOR_EMBEDDING(ALL_MINILM_L12_V2 USING 'interesting news about rockets' as data)), COSINE)
fetch approx first 5 rows only;

        ID INFO
---------- ------------------------------------------------------------------------------------------
     52294 Fighters in the Gaza Strip are proud to  show off the latest addition to their arsenal of
           homemade  rockets -- the \

      3194 America's new heavy-lifting rocket takes its much-anticipated test flight Saturday, if
            Mother Nature cooperates. The Boeing Delta 4-Heavy rocket was supposed to launch today, b
           ut the forecast for strong winds

     26779 WASHINGTON NASA's ready to roll out an improved rocket fuel tank designed to prevent p
           roblems that helped destroy the space shuttle Columbia.

     52066 MOJAVE, Calif. - A stubby rocket plane powered its rockets and appeared to blast through t
           he Earth's atmosphere for a second time in two weeks Monday to capture a $10 million prize
            meant to encourage space tourism...

    152239 Spectacular failure of Zenit rocket The launch of a communications satellite from a conver
           ted oil rig in the Pacific ended in a fireball on Tuesday when the rocket exploded on the
           launch pad, New Scientist reports.\u00e2\u0080\u00a6


SQL>

That’s it, this is an example of this new feature, but to be honest, many things I don’t understand as this is new to me….

Last modified: 15 October 2024

Author

Comments

Write a Reply or Comment

Your email address will not be published.