SqlAlchemyとcx_Oracleを使用してPandasDataFrameをOracleデータベースに書き込む場合は、to

Pandas + SQLAlchemyはデフォルトで、すべてのobjectを保存します（文字列） CLOBとしての列 Oracle DBでは、挿入が非常にになります。遅い。

ここにいくつかのテストがあります：

import pandas as pd
import cx_Oracle
from sqlalchemy import types, create_engine

#######################################################
### DB connection strings config
#######################################################
tns = """
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = my-db-scan)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = my_service_name)
    )
  )
"""

usr = "test"
pwd = "my_oracle_password"

engine = create_engine('oracle+cx_oracle://%s:%[email protected]%s' % (usr, pwd, tns))

# sample DF [shape: `(2000, 11)`]
# i took your 2 rows DF and replicated it: `df = pd.concat([df]* 10**3, ignore_index=True)`
df = pd.read_csv('/path/to/file.csv')

DF情報：

In [61]: df.shape
Out[61]: (2000, 11)

In [62]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 11 columns):
id               2000 non-null int64
name             2000 non-null object
premium          2000 non-null float64
created_date     2000 non-null datetime64[ns]
init_p           2000 non-null float64
term_number      2000 non-null int64
uprate           1000 non-null float64
value            2000 non-null int64
score            2000 non-null float64
group            2000 non-null int64
action_reason    2000 non-null object
dtypes: datetime64[ns](1), float64(4), int64(4), object(2)
memory usage: 172.0+ KB

OracleDBに保存するのにかかる時間を確認しましょう：

In [57]: df.shape
Out[57]: (2000, 11)

In [58]: %timeit -n 1 -r 1 df.to_sql('test_table', engine, index=False, if_exists='replace')
1 loop, best of 1: 16 s per loop

Oracle DBの場合（CLOBに注意してください）：

AAA> desc test.test_table
 Name                            Null?    Type
 ------------------------------- -------- ------------------
 ID                                       NUMBER(19)
 NAME                                     CLOB        #  !!!
 PREMIUM                                  FLOAT(126)
 CREATED_DATE                             DATE
 INIT_P                                   FLOAT(126)
 TERM_NUMBER                              NUMBER(19)
 UPRATE                                   FLOAT(126)
 VALUE                                    NUMBER(19)
 SCORE                                    FLOAT(126)
 group                                    NUMBER(19)
 ACTION_REASON                            CLOB        #  !!!

それでは、パンダにすべてのobjectを保存するように指示しましょう。 VARCHARデータ型としての列：

In [59]: dtyp = {c:types.VARCHAR(df[c].str.len().max())
    ...:         for c in df.columns[df.dtypes == 'object'].tolist()}
    ...:

In [60]: %timeit -n 1 -r 1 df.to_sql('test_table', engine, index=False, if_exists='replace', dtype=dtyp)
1 loop, best of 1: 335 ms per loop

今回は約でした。 48倍速い

Oracle DBにチェックインします：

 AAA> desc test.test_table
 Name                          Null?    Type
 ----------------------------- -------- ---------------------
 ID                                     NUMBER(19)
 NAME                                   VARCHAR2(13 CHAR)        #  !!!
 PREMIUM                                FLOAT(126)
 CREATED_DATE                           DATE
 INIT_P                                 FLOAT(126)
 TERM_NUMBER                            NUMBER(19)
 UPRATE                                 FLOAT(126)
 VALUE                                  NUMBER(19)
 SCORE                                  FLOAT(126)
 group                                  NUMBER(19)
 ACTION_REASON                          VARCHAR2(8 CHAR)        #  !!!

200.000行のDFでテストしてみましょう：

In [69]: df.shape
Out[69]: (200000, 11)

In [70]: %timeit -n 1 -r 1 df.to_sql('test_table', engine, index=False, if_exists='replace', dtype=dtyp, chunksize=10**4)
1 loop, best of 1: 4.68 s per loop

私のテスト（最速ではない）環境では、200K行のDFに最大5秒かかりました。

結論： dtypeを明示的に指定するには、次のトリックを使用します objectのすべてのDF列 DataFrameをOracleDBに保存するときのdtype。それ以外の場合は、CLOBデータ型として保存されるため、特別な処理が必要になり、非常に遅くなります

dtyp = {c:types.VARCHAR(df[c].str.len().max())
        for c in df.columns[df.dtypes == 'object'].tolist()}

df.to_sql(..., dtype=dtyp)

SqlAlchemyとcx_Oracleを使用してPandasDataFrameをOracleデータベースに書き込む場合は、to_sql（）を高速化します