15 September 2008

Using Data Pump Export(expdp) Over Network

Data Pump Export can be used over Networks. Normally when data pump export utility(expdp) executes, it creates a dump file to the server where the database is installed. You can DIRECTORY parameters in order to specify operating system file path.
Suppose that you have 2 databases(srv1db and srv2db) in 2 different servers(srv1 and srv2). If you run expdp on srv2db, dump file shall be created in srv2 server. You can also run expdp on srv1db and dump file shall be created in srv2 server, with NETWORK_LINK expdp parameter. See demostration below:


--srv1db
SQL> select DIRECTORY_PATH from all_directories where directory_name = 'DATA_PUMP_DIR';

DIRECTORY_PATH
--------------------------------------------------------------------------------
/usr/users/oracle/DP

SQL>


--srv2db
SQL> select DIRECTORY_PATH from all_directories where directory_name = 'DATA_PUMP_DIR';

DIRECTORY_PATH
--------------------------------------------------------------------------------
/home/oracle/DP

SQL>

---srv1
uccs01 oracle@srv1:/usr/users/oracle/DP> ls -ltr


--srv2
bash-3.00$ ls -ltr /home/oracle/DP




--expdp on srv2
bash-3.00$ expdp mte/mte@srv1db tables=t directory=DATA_PUMP_DIR dumpfile=mte_t.dmp logfile=mte_t.log

Export: Release 10.2.0.2.0 - 64bit Production on Wednesday, 13 August, 2008 10:15:53

Copyright (c) 2003, 2005, Oracle. All rights reserved.

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options
Starting "MTE"."SYS_EXPORT_TABLE_01": mte/********@srv1db tables=t directory=DATA_PUMP_DIR dumpfile=mte_t.dmp logfile=mte_t.log
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 64 KB
Processing object type TABLE_EXPORT/TABLE/TABLE
. . exported "MTE"."T" 5.125 KB 31 rows
Master table "MTE"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for MTE.SYS_EXPORT_TABLE_01 is:
/usr/users/oracle/DP/mte_t.dmp
Job "MTE"."SYS_EXPORT_TABLE_01" successfully completed at 10:12:11



---srv1
uccs01 oracle@srv1:/usr/users/oracle/DP> ls -ltr
total 2455
-rw-r----- 1 oracle dba 69632 Aug 13 10:12 mte_t.dmp
-rw-r--r-- 1 oracle dba 981 Aug 13 10:12 mte_t.log
uccs01 oracle@srv1:/usr/users/oracle/DP>


--srv2
bash-3.00$ ls -ltr /home/oracle/DP




--srv2db, create a dblink for srv1db
SQL>create public database link srv1db_LINK.ORC
connect to USER1
using 'srv1db.ORC';


--expdp on srv2
bash-3.00$ expdp USER1/USER1@srv2db tables=t directory=DATA_PUMP_DIR dumpfile=mte_t2.dmp logfile=mte_t2.log network_link=srv1db_link

Export: Release 10.2.0.2.0 - 64bit Production on Wednesday, 13 August, 2008 10:22:09

Copyright (c) 2003, 2005, Oracle. All rights reserved.

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
Starting "USER1"."SYS_EXPORT_TABLE_01": USER1/********@srv2db tables=t directory=DATA_PUMP_DIR dumpfile=mte_t2.dmp logfile=mte_t2.log network_link=srv1db_link
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 64 KB
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
. . exported "USER1"."T" 5.351 KB 16 rows
Master table "USER1"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for USER1.SYS_EXPORT_TABLE_01 is:
/home/oracle/DP/mte_t2.dmp
Job "USER1"."SYS_EXPORT_TABLE_01" successfully completed at 10:22:45

bash-3.00$



---srv1
uccs01 oracle@srv1:/usr/users/oracle/DP> ls -ltr
total 2455
-rw-r----- 1 oracle dba 69632 Aug 13 10:12 mte_t.dmp
-rw-r--r-- 1 oracle dba 981 Aug 13 10:12 mte_t.log
uccs01 oracle@srv1:/usr/users/oracle/DP>


--srv2
bash-3.00$ ls -ltr /home/oracle/DP
-rw-r--r-- 1 oracle oinstall 0 Aug 13 10:22 mte_t2.log
-rw-r----- 1 oracle oinstall 4096 Aug 13 10:22 mte_t2.dmp
bash-3.00$

Workaround for ORA-39014, ORA-12801, ORA-29913

There in an Oracle Bug(5472417 , Note:438608.1) about parallelism mechanism of data pump export. Workaround fort his may be removing parallelism parameter from export(running as a single process)

bash-3.00$ expdp MT/MT full=y directory=DATA_PUMP_DIR dumpfile=orcl_full_imp_p%u.dmp logfile=orcl_full_imp.log parallel=4 network_link=orcl_link

Export: Release 10.2.0.2.0 - 64bit Production on Wednesday, 13 August, 2008 10:49:49

Copyright (c) 2003, 2005, Oracle. All rights reserved.

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
Starting "MT"."SYS_EXPORT_FULL_01": MT/******** full=y directory=DATA_PUMP_DIR dumpfile=orcl_full_imp_p%u.dmp logfile=orcl_full_imp.log parallel=4 network_link=orcl_link
Estimate in progress using BLOCKS method...
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 13.06 GB
Processing object type DATABASE_EXPORT/TABLESPACE
Processing object type DATABASE_EXPORT/PASSWORD_VERIFY_FUNCTION
Processing object type DATABASE_EXPORT/PROFILE
Processing object type DATABASE_EXPORT/SYS_USER/USER
Processing object type DATABASE_EXPORT/SCHEMA/USER
Processing object type DATABASE_EXPORT/ROLE
Processing object type DATABASE_EXPORT/GRANT/SYSTEM_GRANT/PROC_SYSTEM_GRANT
Processing object type DATABASE_EXPORT/SCHEMA/GRANT/SYSTEM_GRANT
Processing object type DATABASE_EXPORT/SCHEMA/ROLE_GRANT
Processing object type DATABASE_EXPORT/SCHEMA/DEFAULT_ROLE
Processing object type DATABASE_EXPORT/SCHEMA/TABLESPACE_QUOTA
Processing object type DATABASE_EXPORT/RESOURCE_COST
Processing object type DATABASE_EXPORT/SCHEMA/DB_LINK
Processing object type DATABASE_EXPORT/TRUSTED_DB_LINK
Processing object type DATABASE_EXPORT/SCHEMA/SEQUENCE/SEQUENCE
Processing object type DATABASE_EXPORT/SCHEMA/SEQUENCE/GRANT/OWNER_GRANT/OBJECT_GRANT
Processing object type DATABASE_EXPORT/DIRECTORY/DIRECTORY
Processing object type DATABASE_EXPORT/DIRECTORY/GRANT/OWNER_GRANT/OBJECT_GRANT
Processing object type DATABASE_EXPORT/DIRECTORY/GRANT/WITH_GRANT_OPTION/OBJECT_GRANT
Processing object type DATABASE_EXPORT/DIRECTORY/GRANT/CROSS_SCHEMA/OBJECT_GRANT
Processing object type DATABASE_EXPORT/CONTEXT
Processing object type DATABASE_EXPORT/SCHEMA/PUBLIC_SYNONYM/SYNONYM
Processing object type DATABASE_EXPORT/SCHEMA/SYNONYM
Processing object type DATABASE_EXPORT/SCHEMA/TYPE/TYPE_SPEC
Processing object type DATABASE_EXPORT/SCHEMA/TYPE/GRANT/OWNER_GRANT/OBJECT_GRANT
Processing object type DATABASE_EXPORT/SYSTEM_PROCOBJACT/PRE_SYSTEM_ACTIONS/PROCACT_SYSTEM
Processing object type DATABASE_EXPORT/SYSTEM_PROCOBJACT/POST_SYSTEM_ACTIONS/PROCACT_SYSTEM
Processing object type DATABASE_EXPORT/SCHEMA/PROCACT_SCHEMA
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/TABLE
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/PRE_TABLE_ACTION
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/GRANT/OWNER_GRANT/OBJECT_GRANT
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/GRANT/CROSS_SCHEMA/OBJECT_GRANT
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/INDEX/INDEX
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/CONSTRAINT/CONSTRAINT
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/INDEX/STATISTICS/INDEX_STATISTICS
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/COMMENT
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/RLS_POLICY
Processing object type DATABASE_EXPORT/SCHEMA/PACKAGE/PACKAGE_SPEC
Processing object type DATABASE_EXPORT/SCHEMA/PACKAGE/GRANT/OWNER_GRANT/OBJECT_GRANT
Processing object type DATABASE_EXPORT/SCHEMA/FUNCTION/FUNCTION
Processing object type DATABASE_EXPORT/SCHEMA/FUNCTION/GRANT/OWNER_GRANT/OBJECT_GRANT
Processing object type DATABASE_EXPORT/SCHEMA/PROCEDURE/PROCEDURE
Processing object type DATABASE_EXPORT/SCHEMA/PROCEDURE/GRANT/OWNER_GRANT/OBJECT_GRANT
Processing object type DATABASE_EXPORT/SCHEMA/PACKAGE/COMPILE_PACKAGE/PACKAGE_SPEC/ALTER_PACKAGE_SPEC
Processing object type DATABASE_EXPORT/SCHEMA/FUNCTION/ALTER_FUNCTION
Processing object type DATABASE_EXPORT/SCHEMA/PROCEDURE/ALTER_PROCEDURE
Processing object type DATABASE_EXPORT/SCHEMA/VIEW/VIEW
Processing object type DATABASE_EXPORT/SCHEMA/VIEW/GRANT/OWNER_GRANT/OBJECT_GRANT
Processing object type DATABASE_EXPORT/SCHEMA/VIEW/GRANT/CROSS_SCHEMA/OBJECT_GRANT
Processing object type DATABASE_EXPORT/SCHEMA/VIEW/COMMENT
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/CONSTRAINT/REF_CONSTRAINT
Processing object type DATABASE_EXPORT/SCHEMA/PACKAGE_BODIES/PACKAGE/PACKAGE_BODY
Processing object type DATABASE_EXPORT/SCHEMA/TYPE/TYPE_BODY
Processing object type DATABASE_EXPORT/SCHEMA/JAVA_CLASS/JAVA_CLASS
Processing object type DATABASE_EXPORT/SCHEMA/JAVA_RESOURCE/JAVA_RESOURCE
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/INDEX/FUNCTIONAL_AND_BITMAP/INDEX
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/INDEX/STATISTICS/FUNCTIONAL_AND_BITMAP/INDEX_STATISTICS
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/STATISTICS/TABLE_STATISTICS
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/POST_TABLE_ACTION
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/TRIGGER
Processing object type DATABASE_EXPORT/SCHEMA/VIEW/TRIGGER
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/POST_INSTANCE/PROCACT_INSTANCE
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/POST_INSTANCE/PROCDEPOBJ
Processing object type DATABASE_EXPORT/SCHEMA/POST_SCHEMA/PROCOBJ
Processing object type DATABASE_EXPORT/SCHEMA/POST_SCHEMA/PROCACT_SCHEMA
ORA-39014: One or more workers have prematurely exited.
ORA-39029: worker 4 with process name "DW04" prematurely terminated
ORA-31671: Worker process DW04 had an unhandled exception.
ORA-12801: error signaled in parallel query server
ORA-29913: error in executing ODCIEXTTABLEPOPULATE callout
ORA-29400: data cartridge error
KUP-04038: internal error: kupax-meta1
ORA-06512: at "SYS.ORACLE_DATAPUMP", line 78
ORA-06512: at "SYS.KUPW$WORKER", line 1345
ORA-06512: at line 2

Job "MT"."SYS_EXPORT_FULL_01" stopped due to fatal error at 11:13:47

Solution of ORA-39002, ORA-39070, ORA-39087

The errors that i mentioned in the subject, can be occured during data pump export/import(expdp/impdp executables). If the directory object does not have necessary rigths, will fire an exception. Giving rights(read - write) will be a solution.

bash-3.00$ expdp mte/mte@test02 tables=t directory=DATA_PUMP_DIR dumpfile=mte_t.dmp logfile=mte_t.log

Export: Release 10.2.0.2.0 - 64bit Production on Wednesday, 13 August, 2008 10:09:48

Copyright (c) 2003, 2005, Oracle. All rights reserved.

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options
ORA-39002: invalid operation
ORA-39070: Unable to open the log file.
ORA-39087: directory name DATA_PUMP_DIR is invalid




>sqlplus
Connected to Oracle Database 10g Enterprise Edition Release 10.2.0.2.0
Connected as HR

SQL> conn mte/mte@test02;
Connected to Oracle Database 10g Enterprise Edition Release 10.2.0.2.0
Connected as mte

SQL> select directory_name from all_directories;

DIRECTORY_NAME
------------------------------
HLR_DIR

SQL> conn HR/HR@test02;
Connected to Oracle Database 10g Enterprise Edition Release 10.2.0.2.0
Connected as HR

SQL> grant read, write on directory DATA_PUMP_DIR to mte;

Grant succeeded

SQL> conn mte/mte@test02;
Connected to Oracle Database 10g Enterprise Edition Release 10.2.0.2.0
Connected as mte

SQL> select directory_name from all_directories;

DIRECTORY_NAME
------------------------------
DATA_PUMP_DIR
HLR_DIR

SQL>



bash-3.00$ expdp mte/mte@test02 tables=t directory=DATA_PUMP_DIR dumpfile=mte_t.dmp logfile=mte_t.log

Export: Release 10.2.0.2.0 - 64bit Production on Wednesday, 13 August, 2008 10:15:53

Copyright (c) 2003, 2005, Oracle. All rights reserved.

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options
Starting "MTE"."SYS_EXPORT_TABLE_01": mte/********@test02 tables=t directory=DATA_PUMP_DIR dumpfile=mte_t.dmp logfile=mte_t.log
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 64 KB
Processing object type TABLE_EXPORT/TABLE/TABLE
. . exported "MTE"."T" 5.125 KB 31 rows
Master table "MTE"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for MTE.SYS_EXPORT_TABLE_01 is:
/usr/users/oracle/DP/mte_t.dmp
Job "MTE"."SYS_EXPORT_TABLE_01" successfully completed at 10:12:11

14 September 2008

An Interesting PL/SQL Bug about References

I have struggled a PL/SQL problem a few days ago. The problem was, as Pl/SQL Compiler says, "PLS-00225: subprogram or cursor 'string' reference is out of scope". It was not as easy as Compiler mentions.
Let me show it by an example:

Suppose that you have 2 packages, PACK1 and PACK2. In PACK2, you have a procedure in PACK1 named PACK1 and another procedure that named PACK1 in PACK2 package. When the namings are established like that you will get "PLS-00225: subprogram or cursor 'string' reference is out of scope" error.

Example below CMP.COMPARE and P_TEST.COMPARE exists. In P_TEST, there is a reference for COMPARE package. It Works fine.

SQL> CREATE OR REPLACE PACKAGE CMP IS
2 TYPE t_HashTable IS TABLE OF VARCHAR2(512) INDEX BY VARCHAR2(32);
3
4 TYPE t_SQL IS RECORD(
5 SQL_STATEMENT VARCHAR2(4000),
6 BIND_LIST t_HashTable
7 );
8
9 END CMP;
10 /

Package created

SQL>
SQL> CREATE OR REPLACE PACKAGE BODY CMP IS
2
3 PROCEDURE Compare IS
4 BEGIN
5 NULL;
6 END Compare;
7
8 END CMP;
9 /

Package body created

SQL>
SQL> CREATE OR REPLACE PACKAGE p_test IS END p_test;
2 /

Package created

SQL> CREATE OR REPLACE PACKAGE BODY p_test IS
2
3 TYPE t_SQLList IS TABLE OF CMP.t_SQL INDEX BY VARCHAR2(16);
4
5 PROCEDURE Compare IS
6 BEGIN
7 NULL;
8 END Compare;
9
10 END p_test;
11 /

Package body created

SQL>



Example2 below COMPARE.COMPARE and P_TEST.COMPARE exists. In P_TEST, there is a reference for COMPARE package. Altough there won't seem any errors, PL/SQL Compiler is angry with code.


SQL> CREATE OR REPLACE PACKAGE COMPARE IS
2 TYPE t_HashTable IS TABLE OF VARCHAR2(512) INDEX BY VARCHAR2(32);
3
4 TYPE t_SQL IS RECORD(
5 SQL_STATEMENT VARCHAR2(4000),
6 BIND_LIST t_HashTable
7 );
8
9 END COMPARE;
10 /

Package created

SQL> CREATE OR REPLACE PACKAGE BODY COMPARE IS
2
3 PROCEDURE Compare IS
4 BEGIN
5 NULL;
6 END Compare;
7
8 END COMPARE;
9 /

Package body created

SQL> CREATE OR REPLACE PACKAGE p_test IS END p_test;
2 /

Package created

SQL> CREATE OR REPLACE PACKAGE BODY p_test IS
2
3 TYPE t_SQLList IS TABLE OF COMPARE.t_SQL INDEX BY VARCHAR2(16);
4
5 PROCEDURE Compare IS
6 BEGIN
7 NULL;
8 END Compare;
9
10 END p_test;
11 /

Warning: Package body created with compilation errors

SQL> show errors;
Errors for PACKAGE BODY SYSADM.P_TEST:

LINE/COL ERROR
-------- -------------------------------------------------------------------
3/30 PLS-00225: subprogram or cursor 'COMPARE' reference is out of scope
3/3 PL/SQL: Item ignored

SQL>



Example3 below CMP.COMPARE and P_TEST.COMPARE2 exists.(i changed the name of COMPARE to COMPARE2 in P_TEST) In P_TEST, there is a reference for COMPARE package. It Works fine.

SQL> CREATE OR REPLACE PACKAGE COMPARE IS
2 TYPE t_HashTable IS TABLE OF VARCHAR2(512) INDEX BY VARCHAR2(32);
3
4 TYPE t_SQL IS RECORD(
5 SQL_STATEMENT VARCHAR2(4000),
6 BIND_LIST t_HashTable
7 );
8
9 END COMPARE;
10 /

Package created

SQL> CREATE OR REPLACE PACKAGE BODY COMPARE IS
2
3 PROCEDURE Compare IS
4 BEGIN
5 NULL;
6 END Compare;
7
8 END COMPARE;
9 /

Package body created

SQL> CREATE OR REPLACE PACKAGE p_test IS END p_test;
2 /

Package created

SQL> CREATE OR REPLACE PACKAGE BODY p_test IS
2
3 TYPE t_SQLList IS TABLE OF COMPARE.t_SQL INDEX BY VARCHAR2(16);
4
5 PROCEDURE Compare2 IS
6 BEGIN
7 NULL;
8 END Compare2;
9
10 END p_test;
11 /

Package body created

SQ

Solution for ORA-01092, ORA-00704, ORA-39700

A few days ago i have created an Oracle database manually(once i have crated an automated DB creation tamplate, it has a bug L). Somehow, some errors occured during creation. Then, when i started open the db, i got some errors. I explained below what the work around is.

orcl oracle@srv:/data06/app/oracle/admin/orcl/scripts> sqlplus / as sysdba

SQL*Plus: Release 10.2.0.2.0 - Production on Tue Aug 5 23:50:26 2008

Copyright (c) 1982, 2005, Oracle. All Rights Reserved.

Connected to an idle instance.

SQL> startup
ORACLE instance started.

Total System Global Area 104857600 bytes
Fixed Size 2046864 bytes
Variable Size 75498608 bytes
Database Buffers 20971520 bytes
Redo Buffers 6340608 bytes
Database mounted.
ORA-01092: ORACLE instance terminated. Disconnection forced


SQL> quit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options


Check the alert log and trace

orcl oracle@srv:/data06/app/oracle/admin/orcl/bdump> more alert_orcl.log
......
.....
Errors in file /data06/app/oracle/admin/orcl/udump/orcl_ora_151225.trc:
ORA-00704: bootstrap process failure
ORA-39700: database must be opened with UPGRADE option
Tue Aug 5 23:45:33 2008
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 151225
ORA-1092 signalled during: alter database open...



Then solution

orcl oracle@srv:/data06/app/oracle/admin/orcl/scripts> sqlplus / as sysdba

SQL*Plus: Release 10.2.0.2.0 - Production on Tue Aug 5 23:51:59 2008

Copyright (c) 1982, 2005, Oracle. All Rights Reserved.

Connected to an idle instance.

SQL> startup upgrade
ORACLE instance started.

Total System Global Area 104857600 bytes
Fixed Size 2046864 bytes
Variable Size 75498608 bytes
Database Buffers 20971520 bytes
Redo Buffers 6340608 bytes
Database mounted.
Database opened.
SQL> @/data06/app/oracle/product/10.2/rdbms/admin/catalog.sql;
.....
SQL> @/data06/app/oracle/product/10.2/rdbms/admin/catproc.sql;

19 January 2008

Data Minig :: Classification and Clustering :: Naive Bayes and Decision Tree

1.     Giriş

Bu doküman, bir veri seti üzerinde, classification ve clustering Data Mining algoritmalarını incelemektedir. Naive Bayes ve Decision Tree algoritmalarının tamamen PL/SQL ve Oracle kullanarak nasıl kullanılabilineceğinin bir örneğini içermektedir. Sonuçta bir analiz yapılarak çalışma tamamlanacaktır. 

2.     Veri Seti

Kullanılan veri seti. “Contraceptive Method Choice” olarak belirtilen 1987 yılında “National Indonesia Contraceptive Prevalence Survey” araştırmasında kullanılan verilerin bir alt kümesidir. Bu veri seti ile daha önce “A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-three Old and New Classification Algorithms"(Lim, T.-S., Loh, W.-Y. & Shih, Y.-S. (1999)) ile kullanılmıştır.

Araştırma esnasında gebe olan,olmayan veya gebelik durumundan habersiz bayanların bilgileri alınmıştır. Amaçlanan, bayanların sosyoekonomik ve demografik durumlarına göre hangi yöntemi kullanacakları kestirilmeye çalışılmıştır.

Ver, setinde toplam 1473 örnek bulunmaktadır. Toplam özellik sayısı biri sınıf değeri olmak üzere 10’dur. Bunlar:


Özellik Adı
Özellik Tipi
Muhtemel değerler
Kadının yaşı
Sayısal

Kadının eğitim durumu
Kategorik
1(Düşük),2,3,4(Yüksek)
Erkeğin eğitim durumu
Kategorik
1(Düşük),2,3,4(Yüksek)
Çocuk Sayısı
Sayısal

Kadının dini inancı
Kategorik
0(Gayri müslim),1(Müslüman)
Kadının çalışma durumu
Kategorik
0(Evet),1(Hayır)
Erkeğin İşi
Kategorik
1,2,3,4
Hayat standardı
Kategorik
1(Düşük),2,3,4(Yüksek)
Medya korunmasızlığı
Kategorik
0(İyi),1(İyi değil)
Kullanılan yöntem
Sınıf
1,2,3


Herhangi bir özellik için eksik değer bulunmamaktadır.

Veri setinden bir kaç örnek aşağıda belirtilmiştir:

24,2,3,3,1,1,2,3,0,1
45,1,3,10,1,1,3,4,0,1
43,2,3,7,1,1,3,4,0,1

3.     Ön İşleme

Veri seti, her bir örneği bir satırda olacak şekilde tanımlanmıştır. Özellik değerleri arasında “,” ayraç olarak kullanılmıştır.Öncelikle verinin daha kolay işlenebilmesi için bir veritabanı içine aktarılması gereklidir. Kullanılacak  veritabanı olan Oracle’nin araçlarından biri olan SQL*Loader, metin tabanlı verilerin kolaylıkla Oracle içine alınmasını sağlamaktadır.


3.1. Tablo Oluşturma:

Bu iş için öncelikle bir tablo oluşturulmalıdır. Gerekli kısıt ve indexler verilmelidir:

drop table cmc;
create table cmc(
  wife_age number,
  wife_edu varchar2(1),
  husband_edu varchar2(1),
  children number,
  wife_religion varchar2(1),
  is_wife_working varchar2(1),
  husband_occupation varchar2(1),
  living_standart varchar2(1),
  media_exposure varchar2(1),
  contraceptive_method varchar2(1)
);
alter table cmc add constraint ck_wife_edu check ( wife_edu in ('1','2','3','4') );
alter table cmc add constraint ck_husband_edu check (husband_edu in ('1','2','3','4'));
alter table cmc add constraint ck_children check (children > -1 );
alter table cmc add constraint ck_Wife_religion check (Wife_religion  in ('1','0') );
alter table cmc add constraint ck_is_wife_working check (is_wife_working  in ('1','0') );
alter table cmc add constraint ck_husband_occupation check (husband_occupation in ('1','2','3','4'));
alter table cmc add constraint ck_living_standart check (living_standart in ('1','2','3','4'));
alter table cmc add constraint ck_media_exposure check (media_exposure  in ('1','0') );
alter table cmc add constraint ck_contraceptive_method check (contraceptive_method in ('1','2','3'));
create index i_contraceptive_method on cmc(contraceptive_method);


3.2. Veri Yükleme

SQL*Loader a gerekli control dosyası oluşturulmalıdır. Bu dosya içinde veriyi tanıtan bilgiler bulunmaktadır.

LOAD DATA

--Input file name
INFILE 'cmc.data'

--bad records put into
BADFILE  'cmc_bad.dat'

--discarded records put into
DISCARDFILE 'cmc_discarded.dat'

--truncate existing records
TRUNCATE

--add data to table
INTO TABLE cmc

--every column is distinguished by "," caharacter
FIELDS TERMINATED BY ","

--column names
(
  wife_age ,
  wife_edu ,
  husband_edu,
  children ,
  wife_religion ,
  is_wife_working,
  husband_occupation,
  living_standart,
  media_exposure,
  contraceptive_method
)

Hazırlanan bu dosya cmc.ctl adı ile verinin bulunduğu aynı dizine alınmalı ve aşağıdaki bat dosyası komut satırından çağırılmalıdır:

sqlldr userid=hr/hr control=cmc.ctl log=cmc.log

Bu işlem sonucunda veri, Oracle içine yüklenecektir.

Aşağıdaki komut ile veritabanı kontrol edilebilir:

SELECT * FROM cmc;

Veri setinin eğitim ve test amaçlı iki kümeye ayrılması gerelmektedir. Bu bölümleme 2’ye 1 olarak ayrılacaktır.Yani toplam örneklerin %66’sı eğitim amaçlı, %33’ü test amaçlı kullanılacaktır. Bu bölümleme için, aşağıdaki SQL komutlarından yararlanılır:

sqlldr userid=hr/hr control=cmc.ctl log=cmc.log

Bu işlem sonucunda veri, Oracle içine yüklenecektir.

Veri setinin %66’sı eğitim, kalan %33’ü ise test amaçlı kullanılacaktır. Bu işlem için aşağıdaki komutlar çalıştırılmalıdır. Bunun sonucunda cmc_train ve cmc_test adında iki tane tablo oluşacaktır.

create table cmc_train as
 SELECT *
   FROM (SELECT * FROM cmc ORDER BY ROWID ASC)
  WHERE rownum < (SELECT COUNT(*) FROM cmc) * (2 / 3);


create table  cmc_test as
SELECT *
  FROM cmc
 WHERE ROWID NOT IN
       (SELECT *
          FROM (SELECT ROWID FROM cmc ORDER BY ROWID ASC)
         WHERE rownum < (SELECT COUNT(*) FROM cmc) * (2 / 3));

4.     Decision Tree


Karar ağaçları, veri madenciliği uygulamalarında sıkça kullanılan bir yöntemdir. Bu yöntem ile en temelde bir ağaç üretilir. Bu ağaç üzerinde ilerleyerek sınıf hakkında bilgi sahibi olunabilir. Karar ağaçları aynı zamanda veri kümeleri üzerinde kestirim (prediction) yapmaya da imkân verir.
Öncelikle eğitim setimizde bulunan veriler ile bir ağaç oluşturulacaktır. Bunun için:

DECLARE
  w VARCHAR2(1024);
  t VARCHAR2(255);
  c VARCHAR2(255);
  d BOOLEAN;
  l NUMBER;
  a VARCHAR2(1024);
BEGIN
  w := NULL;
  t := 'cmc_train';
  c := 'contraceptive_method';
  d := false;
  l := 0;
  a := ',';

  dm.GenerateDecisionTree(pis_TableName        => t,
                          pis_ClassFieldName   => c,
                          pis_WhereStatement   => w,
                          pin_Level            => l,
                          pis_PassedAttributes => a,
                          pib_IsDebugMode      => d);
END;

Bunun sonucunda elimize içinde kurallar olan bir ağaç çıkacaktır. Bu kuralların bir kısmı:

when
   WIFE_AGE>34 AND CHILDREN>0 AND WIFE_EDU='1' AND HUSBAND_OCCUPATION='4'
then '3'
when
   WIFE_AGE>34 AND CHILDREN>0 AND WIFE_EDU='2' AND HUSBAND_OCCUPATION='4'
then '3'

Bu kuralları test kümemizde işletmeden önce test tablomuza tahmin edeceğimiz sınıf değerini tutması için bir kolon eklenecektir.

alter table cmc_test add expected_dt varchar2(1);

Daha sonra test kümesinde kurallar işletilecektir:

UPDATE cmc_test c
   SET c.expected_dt = (  
      SELECT CASE
             WHEN WIFE_AGE > 34 AND CHILDREN > 0 AND WIFE_EDU = '1' AND
                    HUSBAND_OCCUPATION = '1' AND HUSBAND_EDU = '2' THEN
                '1'
             WHEN ....
                 ......
               ELSE
                '-'
             END
        FROM cmc_test c2
       WHERE c.ROWID = c2.ROWID)

Bulunan sınıf değerleri karşılaştırılınca:


Doğru olarak sınıflanan örnek sayısı: 113
SELECT count(*) FROM cmc_test WHERE expected_dt = contraceptive_method;

Yanlış olarak sınıflanan örnek sayısı: 99
SELECT count(*) FROM cmc_test WHERE expected_dt <> contraceptive_method and expected_dt <> '-';

Sınıflanamayan örnek sayısı:280
SELECT count(*) FROM cmc_test WHERE expected_dt = '-';

Toplam örnek sayısı : 492

Burada bulunan sorun, eğitim setinin iyi seçilemediği yaklaşık veri setinin yaıısından fazlasının sınıflamadığıdır. Bu da başarıyı etkilemektedir. Sınıflanabilen örneklerin %53’ü doğru sınıflanmıştır.

Decision tree ile yapılan işlem, WEKA üzerinde test edilememiştir. ID3’e göre sadece kategorik verlerin, set içinde bulunması zorunluluğundan test etme imkanı olmamıştır.

Veri setinin tamamının eğitim, tamamının da test olarak kullnıldığı durumda ise %100’lük bir başarı sağlanmıştır.


5.     Naive Bayes


Bu algoritma bir takım istatisel bilgilerden yararlanarak, gelen test örneğinin, eğitim sonuncunda elde edilen ön bilgi ile en muhtemel sınıfa atanması işlemidir. Bu işlem için yine Decision tree için kullanılan eğitim ve test örneği kullnılacaktır.

Bu algoritmayı eğitim setinde işletmeden önce test tablomuza tahmin edeceğimiz sınıf değerini tutması için bir kolon eklenecektir.

alter table cmc_test add expected_nb varchar2(1);

Ardından naive bayes algoritması işletilecektir:

BEGIN
  DM.NaiveBayes3('cmc_train',
                 'cmc_test',
                 'contraceptive_method',
                 'EXPECTED_NB',
                 'EXPECTED_DT');
END;

Sonuçlar ise :

Doğru olarak sınıflanan örnek sayısı: 195
SELECT count(*) FROM cmc_test WHERE expected_nb = contraceptive_method;

Yanlış olarak sınıflanan örnek sayısı: 297
SELECT count(*) FROM cmc_test WHERE expected_nb <> contraceptive_method;

Toplam örnek sayısı : 492

Burada elde edilen başarı %40 seviyesindedir. Aynı işlem WEKA içinde bulunan SimpleBayes ile yapıldığında başarının biraz daha fazla olduğu görülmüştür.


6.     Sonuç


Eğitici yöntemler ile yapılan sınıflandırmada en önemli olan nokta eğitim setinin başarılı bir seçilde seçilmesidir. Aksi takdirde test esnasında başarıyı yakalamak güçleşir. Eğitim kümesini seçebilmek için gürültülü verilerin çıkarılması, sayısal verilerin kategorikleştirilmesi ve daha büyük eğitim setleri ile çalışılması başarıyı olumlu yönde etkileyecektir.

Kullanılan CMC veri setinde, örnekler birbirine çok yakın olduğu için başarının yakalanması az olmuştur. Bunun gibi birbirine yakın veri kümeleri ile çalışırken eğitim kümesinin fazla olması, verinin daha iyi temsil edilmesini sağlayacağı için daha iyi sonuçlar çıkaracaktır.

7.     Ekler