
Normalize transaction data from time and status columns into minutes per status value

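    One solution to this kind of query has two components: generating the categories, and then aggregating into the generated categories.

    For the data you provided, the first step is to bucket the data by hour. The data has no events in the 02:00 or 04:00 hours, so to show those hours in the final result you can generate them instead.

    The second step is to aggregate into the hourly buckets via PIVOT, as Jorge Campos mentioned in the comments.

    Here is an example.

    First, create a test table: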

    CREATE TABLE INSERT_TIME_STATUS(
      INSERT_TIME TIMESTAMP,
      STATUS VARCHAR2(128)
    );
    

    Then add the test data:

    INSERT INTO INSERT_TIME_STATUS VALUES (TIMESTAMP '2017-01-01 00:00:00', 'AVAILABLE');
    INSERT INTO INSERT_TIME_STATUS VALUES (TIMESTAMP '2017-01-01 00:15:00', 'BUSY');
    INSERT INTO INSERT_TIME_STATUS VALUES (TIMESTAMP '2017-01-01 00:30:00', 'NOT AVAILABLE');
    INSERT INTO INSERT_TIME_STATUS VALUES (TIMESTAMP '2017-01-01 01:30:00', 'AVAILABLE');
    INSERT INTO INSERT_TIME_STATUS VALUES (TIMESTAMP '2017-01-01 03:10:00', 'BUSY');
    INSERT INTO INSERT_TIME_STATUS VALUES (TIMESTAMP '2017-01-01 05:00:00', 'NOT AVAILABLE');
    

    Then write the query. It uses subquery factoring to outline the two-step nature of the process.

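    The HOUR_OF_DAY and CALENDAR subfactors together generate every hour of every day in the data's date range, regardless of whether a record occurred in that hour.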
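    As a minimal sketch of the row-generation idea in isolation, the query below produces one row per hour of a single day without reading any table. The literal start date is an assumption chosen to match the test data; the full query derives the date range from INSERT_TIME_STATUS instead.

    ```sql
    -- Sketch: generate 24 hourly bucket starts for one day, independent of any data.
    -- CONNECT BY LEVEL on DUAL is a common Oracle row-generator idiom.
    SELECT TIMESTAMP '2017-01-01 00:00:00'
             + NUMTODSINTERVAL(LEVEL - 1, 'HOUR') AS HOUR_START
    FROM DUAL
    CONNECT BY LEVEL <= 24
    ORDER BY HOUR_START;
    ```

    This returns 24 rows, from 00:00 through 23:00, including the 02:00 and 04:00 hours that have no events.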

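    The HOUR_CALENDAR subfactor looks up the status in effect at the start of each hour. Together with ALL_HOUR_STATUS, this assigns each provided status record to a specific hour and chops up any status that spans an hour boundary, so that every record fits within a single hour.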
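    The lookup of "which status was in effect at the start of an hour" can be tried on its own. The sketch below runs against the test table created above; the probe time of 02:00 is an arbitrary choice for illustration.

    ```sql
    -- Sketch: find the status in effect at 02:00 on 2017-01-01.
    -- KEEP (DENSE_RANK LAST ORDER BY INSERT_TIME) restricts the aggregate to the
    -- row(s) with the latest INSERT_TIME among those at or before the probe time.
    SELECT MAX(STATUS) KEEP (DENSE_RANK LAST ORDER BY INSERT_TIME ASC) AS STATUS_AT_HOUR_START
    FROM INSERT_TIME_STATUS
    WHERE INSERT_TIME <= TIMESTAMP '2017-01-01 02:00:00';
    ```

    With the test data, the latest record at or before 02:00 is the 01:30 one, so this returns AVAILABLE.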

    The DURATION_IN_STATUS subfactor counts the number of minutes each status was active within each hour.
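    The minute arithmetic may be easier to follow in isolation. Subtracting two TIMESTAMP values yields an INTERVAL DAY TO SECOND, so the hour and minute fields are extracted separately and combined. The column names T_START and T_END below are hypothetical, used only for this sketch.

    ```sql
    -- Sketch: whole minutes between two timestamps on the same day.
    -- The DAY and SECOND fields of the interval are ignored here, which is
    -- acceptable in the full query because every span is cut at hour boundaries.
    SELECT EXTRACT(HOUR FROM (T_END - T_START)) * 60
           + EXTRACT(MINUTE FROM (T_END - T_START)) AS MINUTES_IN_STATUS
    FROM (SELECT TIMESTAMP '2017-01-01 01:30:00' AS T_START,
                 TIMESTAMP '2017-01-01 03:10:00' AS T_END
          FROM DUAL);
    ```

    The interval here is 1 hour 40 minutes, so the query returns 100.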

    The final query uses PIVOT to aggregate (SUM) the time each STATUS was active in each hour.
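    To illustrate the pivot step alone, here is a small sketch over hand-written rows. SAMPLE_DURATIONS and MINUTES_ACTIVE are hypothetical names, and the values are copied from the first two hours of the test data rather than computed.

    ```sql
    -- Sketch: turn (hour, status, minutes) rows into one column per status.
    WITH SAMPLE_DURATIONS AS (
      SELECT 0 AS THE_HOUR, 'AVAILABLE' AS THE_STATUS, 15 AS MINUTES_ACTIVE FROM DUAL UNION ALL
      SELECT 0, 'BUSY',          15 FROM DUAL UNION ALL
      SELECT 0, 'NOT AVAILABLE', 30 FROM DUAL UNION ALL
      SELECT 1, 'NOT AVAILABLE', 30 FROM DUAL UNION ALL
      SELECT 1, 'AVAILABLE',     30 FROM DUAL
    )
    SELECT THE_HOUR,
           COALESCE(AVAILABLE, 0)     AS AVAILABLE,      -- statuses absent in an hour pivot to NULL
           COALESCE(NOT_AVAILABLE, 0) AS NOT_AVAILABLE,
           COALESCE(BUSY, 0)          AS BUSY
    FROM SAMPLE_DURATIONS
    PIVOT (SUM(MINUTES_ACTIVE)
      FOR THE_STATUS
      IN ('AVAILABLE' AS AVAILABLE, 'NOT AVAILABLE' AS NOT_AVAILABLE, 'BUSY' AS BUSY))
    ORDER BY THE_HOUR;
    ```

    Hour 0 comes back as 15 / 30 / 15 and hour 1 as 30 / 30 / 0. Columns not named in the PIVOT clause (here THE_HOUR) act as the implicit grouping key.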

    WITH HOUR_OF_DAY AS (SELECT LEVEL - 1 AS THE_HOUR
                         FROM DUAL
                         CONNECT BY LEVEL < 25),
        CALENDAR AS (SELECT DAY_START
                     FROM (
                       SELECT (TIMESTAMP '2017-01-01 00:00:00' + NUMTODSINTERVAL(DATE_INCREMENT.OFFSET, 'DAY')) AS DAY_START
                       FROM (SELECT LEVEL - 1 AS OFFSET
                             FROM DUAL
                             CONNECT BY LEVEL < 9999) DATE_INCREMENT)
                     WHERE DAY_START BETWEEN (SELECT MIN(TRUNC(INSERT_TIME_STATUS.INSERT_TIME))
                                              FROM INSERT_TIME_STATUS)
                     AND (SELECT MAX(TRUNC(INSERT_TIME_STATUS.INSERT_TIME))
                          FROM INSERT_TIME_STATUS)),
        HOUR_CALENDAR AS (
         SELECT
           TO_CHAR(CALENDAR.DAY_START, 'MM/DD/YYYY')                                               AS THE_DAY,
           HOUR_OF_DAY.THE_HOUR,
           CALENDAR.DAY_START + NUMTODSINTERVAL(HOUR_OF_DAY.THE_HOUR, 'HOUR')                      AS HOUR_START,
           (SELECT MAX(INSERT_TIME_STATUS.STATUS)
           KEEP (DENSE_RANK LAST
             ORDER BY INSERT_TIME_STATUS.INSERT_TIME ASC)
            FROM INSERT_TIME_STATUS
            WHERE INSERT_TIME_STATUS.INSERT_TIME <= DAY_START + NUMTODSINTERVAL(THE_HOUR, 'HOUR')) AS HOUR_START_STATUS
         FROM CALENDAR
           CROSS JOIN HOUR_OF_DAY),
        ALL_HOUR_STATUS AS (
        SELECT
          HOUR_CALENDAR.THE_DAY,
          HOUR_CALENDAR.THE_HOUR,
          HOUR_CALENDAR.HOUR_START        AS THE_TIME,
          HOUR_CALENDAR.HOUR_START_STATUS AS THE_STATUS
        FROM HOUR_CALENDAR
        UNION ALL
        SELECT
          HOUR_CALENDAR.THE_DAY,
          HOUR_CALENDAR.THE_HOUR,
          INSERT_TIME_STATUS.INSERT_TIME AS THE_TIME,
          INSERT_TIME_STATUS.STATUS      AS THE_STATUS
        FROM HOUR_CALENDAR
          INNER JOIN INSERT_TIME_STATUS
            ON HOUR_CALENDAR.HOUR_START < INSERT_TIME_STATUS.INSERT_TIME
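               -- Bound the match to this hour bucket, so that with multi-day data
               -- an event is not also matched to the same hour of earlier days
               AND INSERT_TIME_STATUS.INSERT_TIME < HOUR_CALENDAR.HOUR_START + NUMTODSINTERVAL(1, 'HOUR')),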
        DURATION_IN_STATUS AS (
         SELECT
           ALL_HOUR_STATUS.THE_DAY,
           ALL_HOUR_STATUS.THE_HOUR,
           ALL_HOUR_STATUS.THE_STATUS,
           (EXTRACT(HOUR FROM
                    (COALESCE(LEAD(THE_TIME)
                              OVER (
                                PARTITION BY NULL
                                ORDER BY THE_TIME ASC ), TO_TIMESTAMP(THE_DAY, 'MM/DD/YYYY') + NUMTODSINTERVAL(THE_HOUR + 1, 'HOUR')) - THE_TIME)) * 60)
           +
           EXTRACT(MINUTE FROM
                   (COALESCE(LEAD(THE_TIME)
                             OVER (
                               PARTITION BY NULL
                               ORDER BY THE_TIME ASC ), TO_TIMESTAMP(THE_DAY, 'MM/DD/YYYY') + NUMTODSINTERVAL(THE_HOUR + 1, 'HOUR')) - THE_TIME))
             AS DURATION_IN_STATUS
         FROM ALL_HOUR_STATUS)
    SELECT
      THE_DAY,
      THE_HOUR,
      COALESCE(AVAILABLE, 0)     AS AVAILABLE,
      COALESCE(NOT_AVAILABLE, 0) AS NOT_AVAILABLE,
      COALESCE(BUSY, 0)          AS BUSY
    FROM DURATION_IN_STATUS
    PIVOT (SUM(DURATION_IN_STATUS)
      FOR THE_STATUS
      IN ('AVAILABLE' AS AVAILABLE, 'NOT AVAILABLE' AS NOT_AVAILABLE, 'BUSY' AS BUSY)
    )
    ORDER BY THE_DAY ASC, THE_HOUR ASC;
    

    Result:

    THE_DAY     THE_HOUR  AVAILABLE  NOT_AVAILABLE  BUSY  
    01/01/2017  0         15         30             15    
    01/01/2017  1         30         30             0     
    01/01/2017  2         60         0              0     
    01/01/2017  3         10         0              50    
    01/01/2017  4         0          0              60    
    01/01/2017  5         0          60             0     
    01/01/2017  6         0          60             0     
    01/01/2017  7         0          60             0     
    01/01/2017  8         0          60             0     
    01/01/2017  9         0          60             0     
    01/01/2017  10        0          60             0     
    01/01/2017  11        0          60             0     
    01/01/2017  12        0          60             0     
    01/01/2017  13        0          60             0     
    01/01/2017  14        0          60             0     
    01/01/2017  15        0          60             0     
    01/01/2017  16        0          60             0     
    01/01/2017  17        0          60             0     
    01/01/2017  18        0          60             0     
    01/01/2017  19        0          60             0     
    01/01/2017  20        0          60             0     
    01/01/2017  21        0          60             0     
    01/01/2017  22        0          60             0     
    01/01/2017  23        0          60             0     
    
    
    24 rows selected. 
    

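    This example query generates records for the entire day, so the final NOT AVAILABLE status carries over through hour 23. If you would rather stop at the last assigned status, you can adjust this behavior as needed.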
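    One possible adjustment, sketched under the assumption that "stopping" means dropping every hour bucket that starts after the last recorded event, is to filter the generated hours against MAX(INSERT_TIME). The literal start date is again an assumption matching the test data.

    ```sql
    -- Sketch: keep only the hour buckets that start at or before the last event.
    SELECT HOUR_START
    FROM (SELECT TIMESTAMP '2017-01-01 00:00:00'
                   + NUMTODSINTERVAL(LEVEL - 1, 'HOUR') AS HOUR_START
          FROM DUAL
          CONNECT BY LEVEL <= 24)
    WHERE HOUR_START <= (SELECT MAX(INSERT_TIME) FROM INSERT_TIME_STATUS)
    ORDER BY HOUR_START;
    ```

    With the test data this keeps the six buckets from 00:00 through 05:00. A similar predicate could presumably be applied to the hour-generating subfactor of the full query; whether the final bucket should still count a full 60 minutes is a separate decision.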

    Edit: in response to the update asking to evaluate these times per channel_id and user_id, here is another example.

    First, create the test table:

    CREATE TABLE INSERT_TIME_STATUS(
      USER_ID NUMBER,
      CHANNEL_ID NUMBER,
      INSERT_TIME TIMESTAMP,
      STATUS VARCHAR2(128)
    );
    

    Then load it (here user 1111 is in channels 3 and 4, and user 2222 is in channel 3 only):

    INSERT INTO INSERT_TIME_STATUS VALUES (1111,3,TO_TIMESTAMP('1/1/2017 0:00','MM/DD/YYYY HH24:MI'),'AVAILABLE');
    INSERT INTO INSERT_TIME_STATUS VALUES (1111,3,TO_TIMESTAMP('1/1/2017 0:15','MM/DD/YYYY HH24:MI'),'BUSY');
    INSERT INTO INSERT_TIME_STATUS VALUES (1111,3,TO_TIMESTAMP('1/1/2017 0:30','MM/DD/YYYY HH24:MI'),'NOT AVAILABLE');
    INSERT INTO INSERT_TIME_STATUS VALUES (1111,3,TO_TIMESTAMP('1/1/2017 1:30','MM/DD/YYYY HH24:MI'),'AVAILABLE');
    INSERT INTO INSERT_TIME_STATUS VALUES (1111,3,TO_TIMESTAMP('1/1/2017 3:10','MM/DD/YYYY HH24:MI'),'BUSY');
    INSERT INTO INSERT_TIME_STATUS VALUES (1111,3,TO_TIMESTAMP('1/1/2017 5:00','MM/DD/YYYY HH24:MI'),'NOT AVAILABLE');
    INSERT INTO INSERT_TIME_STATUS VALUES (1111,4,TO_TIMESTAMP('1/1/2017 0:00','MM/DD/YYYY HH24:MI'),'AVAILABLE');
    INSERT INTO INSERT_TIME_STATUS VALUES (1111,4,TO_TIMESTAMP('1/1/2017 0:15','MM/DD/YYYY HH24:MI'),'BUSY');
    INSERT INTO INSERT_TIME_STATUS VALUES (1111,4,TO_TIMESTAMP('1/1/2017 0:30','MM/DD/YYYY HH24:MI'),'NOT AVAILABLE');
    INSERT INTO INSERT_TIME_STATUS VALUES (1111,4,TO_TIMESTAMP('1/1/2017 1:30','MM/DD/YYYY HH24:MI'),'AVAILABLE');
    INSERT INTO INSERT_TIME_STATUS VALUES (1111,4,TO_TIMESTAMP('1/1/2017 3:10','MM/DD/YYYY HH24:MI'),'BUSY');
    INSERT INTO INSERT_TIME_STATUS VALUES (1111,4,TO_TIMESTAMP('1/1/2017 5:00','MM/DD/YYYY HH24:MI'),'NOT AVAILABLE');
    INSERT INTO INSERT_TIME_STATUS VALUES (2222,3,TO_TIMESTAMP('1/1/2017 0:00','MM/DD/YYYY HH24:MI'),'AVAILABLE');
    INSERT INTO INSERT_TIME_STATUS VALUES (2222,3,TO_TIMESTAMP('1/1/2017 0:15','MM/DD/YYYY HH24:MI'),'BUSY');
    INSERT INTO INSERT_TIME_STATUS VALUES (2222,3,TO_TIMESTAMP('1/1/2017 0:30','MM/DD/YYYY HH24:MI'),'NOT AVAILABLE');
    INSERT INTO INSERT_TIME_STATUS VALUES (2222,3,TO_TIMESTAMP('1/1/2017 1:30','MM/DD/YYYY HH24:MI'),'AVAILABLE');
    INSERT INTO INSERT_TIME_STATUS VALUES (2222,3,TO_TIMESTAMP('1/1/2017 3:10','MM/DD/YYYY HH24:MI'),'BUSY');
    INSERT INTO INSERT_TIME_STATUS VALUES (2222,3,TO_TIMESTAMP('1/1/2017 5:00','MM/DD/YYYY HH24:MI'),'NOT AVAILABLE');
    INSERT INTO INSERT_TIME_STATUS VALUES (2222,3,TO_TIMESTAMP('1/1/2017 5:00','MM/DD/YYYY HH24:MI'),'NOT AVAILABLE');
    

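    Then update the query to generate data per user_id, per channel_id. In this example, data is always included for every channel a user is involved in: user 1111 is counted for every hour of the day for channels 3 and 4, while user 2222 is counted for every hour of the day for channel 3 only (if that user had records in another channel, that channel would be included as well).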
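    The per-user, per-channel grid can be sketched on its own: take the distinct (USER_ID, CHANNEL_ID) pairs that actually occur and cross join them with the generated hours. The aliases PAIRS and HOURS are hypothetical names for this sketch.

    ```sql
    -- Sketch: one row per hour for each (user, channel) pair present in the data.
    SELECT PAIRS.USER_ID, PAIRS.CHANNEL_ID, HOURS.THE_HOUR
    FROM (SELECT DISTINCT USER_ID, CHANNEL_ID FROM INSERT_TIME_STATUS) PAIRS
      CROSS JOIN (SELECT LEVEL - 1 AS THE_HOUR
                  FROM DUAL
                  CONNECT BY LEVEL <= 24) HOURS
    ORDER BY PAIRS.USER_ID, PAIRS.CHANNEL_ID, HOURS.THE_HOUR;
    ```

    With the test data there are three pairs, (1111, 3), (1111, 4) and (2222, 3), so this returns 72 rows. Pairs that never occur, such as (2222, 4), are not generated.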

    WITH HOUR_OF_DAY AS (SELECT LEVEL - 1 AS THE_HOUR
                         FROM DUAL
                         CONNECT BY LEVEL < 25),
        CALENDAR AS (SELECT DAY_START
                     FROM (
                       SELECT ((SELECT MIN(TRUNC(INSERT_TIME_STATUS.INSERT_TIME))
                                FROM INSERT_TIME_STATUS) + NUMTODSINTERVAL(DATE_INCREMENT.OFFSET, 'DAY')) AS DAY_START
                       FROM (SELECT LEVEL - 1 AS OFFSET
                             FROM DUAL
                             CONNECT BY LEVEL < 9999) DATE_INCREMENT)
                     WHERE DAY_START BETWEEN (SELECT MIN(TRUNC(INSERT_TIME_STATUS.INSERT_TIME))
                                              FROM INSERT_TIME_STATUS)
                     AND (SELECT MAX(TRUNC(INSERT_TIME_STATUS.INSERT_TIME))
                          FROM INSERT_TIME_STATUS)),
        USER_CHANNEL_HOUR_CALENDAR AS (
         SELECT
           USER_ID,
           CHANNEL_ID,
           CALENDAR.DAY_START,
           TO_CHAR(CALENDAR.DAY_START, 'MM/DD/YYYY')                                               AS THE_DAY,
           HOUR_OF_DAY.THE_HOUR,
           CALENDAR.DAY_START + NUMTODSINTERVAL(HOUR_OF_DAY.THE_HOUR, 'HOUR')                      AS HOUR_START
         FROM CALENDAR
           CROSS JOIN HOUR_OF_DAY
           --
           CROSS JOIN (SELECT UNIQUE USER_ID, CHANNEL_ID FROM INSERT_TIME_STATUS)
      ),
        HOUR_CALENDAR AS (
         SELECT USER_ID,
           CHANNEL_ID,
           THE_DAY,
           THE_HOUR,
           DAY_START,
           HOUR_START,
           (SELECT MAX(INSERT_TIME_STATUS.STATUS)
           KEEP (DENSE_RANK LAST
             ORDER BY INSERT_TIME_STATUS.INSERT_TIME ASC)
            FROM INSERT_TIME_STATUS
            WHERE INSERT_TIME_STATUS.INSERT_TIME <= DAY_START + NUMTODSINTERVAL(THE_HOUR, 'HOUR')
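                  -- Qualify the outer columns: an unqualified USER_ID or CHANNEL_ID here
                  -- would resolve to INSERT_TIME_STATUS itself and match every row
                  AND INSERT_TIME_STATUS.USER_ID = USER_CHANNEL_HOUR_CALENDAR.USER_ID
                  AND INSERT_TIME_STATUS.CHANNEL_ID = USER_CHANNEL_HOUR_CALENDAR.CHANNEL_ID) AS HOUR_START_STATUS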
         FROM USER_CHANNEL_HOUR_CALENDAR),
        ALL_HOUR_STATUS AS (
        SELECT
          HOUR_CALENDAR.USER_ID,
          HOUR_CALENDAR.CHANNEL_ID,
          HOUR_CALENDAR.THE_DAY,
          HOUR_CALENDAR.THE_HOUR,
          HOUR_CALENDAR.HOUR_START        AS THE_TIME,
          HOUR_CALENDAR.HOUR_START_STATUS AS THE_STATUS
        FROM HOUR_CALENDAR
        UNION ALL
        SELECT
          INSERT_TIME_STATUS.USER_ID,
          INSERT_TIME_STATUS.CHANNEL_ID,
          HOUR_CALENDAR.THE_DAY,
          HOUR_CALENDAR.THE_HOUR,
          INSERT_TIME_STATUS.INSERT_TIME AS THE_TIME,
          INSERT_TIME_STATUS.STATUS      AS THE_STATUS
        FROM HOUR_CALENDAR
          INNER JOIN INSERT_TIME_STATUS
            ON HOUR_CALENDAR.HOUR_START < INSERT_TIME_STATUS.INSERT_TIME
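               -- Bound the match to this hour bucket, so that with multi-day data
               -- an event is not also matched to the same hour of earlier days
               AND INSERT_TIME_STATUS.INSERT_TIME < HOUR_CALENDAR.HOUR_START + NUMTODSINTERVAL(1, 'HOUR')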
               AND HOUR_CALENDAR.USER_ID = INSERT_TIME_STATUS.USER_ID
               AND HOUR_CALENDAR.CHANNEL_ID = INSERT_TIME_STATUS.CHANNEL_ID),
        DURATION_IN_STATUS AS (
         SELECT
           ALL_HOUR_STATUS.USER_ID,
           ALL_HOUR_STATUS.CHANNEL_ID,
           ALL_HOUR_STATUS.THE_DAY,
           ALL_HOUR_STATUS.THE_HOUR,
           ALL_HOUR_STATUS.THE_STATUS,
           (EXTRACT(HOUR FROM
                    (COALESCE(LEAD(THE_TIME)
                              OVER (
                                PARTITION BY USER_ID, CHANNEL_ID
                                ORDER BY THE_TIME ASC ), TO_TIMESTAMP(THE_DAY, 'MM/DD/YYYY') + NUMTODSINTERVAL(THE_HOUR + 1, 'HOUR')) - THE_TIME)) * 60)
           +
           EXTRACT(MINUTE FROM
                   (COALESCE(LEAD(THE_TIME)
                             OVER (
                               PARTITION BY USER_ID, CHANNEL_ID
                               ORDER BY THE_TIME ASC ), TO_TIMESTAMP(THE_DAY, 'MM/DD/YYYY') + NUMTODSINTERVAL(THE_HOUR + 1, 'HOUR')) - THE_TIME))
             AS DURATION_IN_STATUS
         FROM ALL_HOUR_STATUS)
    SELECT
      USER_ID,
      CHANNEL_ID,
      THE_DAY,
      THE_HOUR,
      COALESCE(AVAILABLE, 0)     AS AVAILABLE,
      COALESCE(NOT_AVAILABLE, 0) AS NOT_AVAILABLE,
      COALESCE(BUSY, 0)          AS BUSY
    FROM DURATION_IN_STATUS
    PIVOT (SUM(DURATION_IN_STATUS)
      FOR THE_STATUS
      IN ('AVAILABLE' AS AVAILABLE, 'NOT AVAILABLE' AS NOT_AVAILABLE, 'BUSY' AS BUSY)
    )
      -- You can additionally filter the result
      -- WHERE CHANNEL_ID IN (3,4)
      -- WHERE USER_ID = 12345
      -- WHERE THE_DAY > TO_CHAR(DATE '2017-01-01')
      -- etc.
    ORDER BY USER_ID ASC, CHANNEL_ID ASC, THE_DAY ASC, THE_HOUR ASC;
    

    Then test it:

    USER_ID  CHANNEL_ID  THE_DAY     THE_HOUR  AVAILABLE  NOT_AVAILABLE  BUSY  
    1111     3           01/01/2017  0         15         30             15    
    1111     3           01/01/2017  1         30         30             0     
    1111     3           01/01/2017  2         60         0              0     
    1111     3           01/01/2017  3         10         0              50    
    1111     3           01/01/2017  4         0          0              60    
    1111     3           01/01/2017  5         0          60             0     
    1111     3           01/01/2017  6         0          60             0  
    ...
    1111     3           01/01/2017  23        0          60             0     
    1111     4           01/01/2017  0         15         30             15    
    1111     4           01/01/2017  1         30         30             0     
    1111     4           01/01/2017  2         60         0              0     
    1111     4           01/01/2017  3         10         0              50    
    1111     4           01/01/2017  4         0          0              60    
    1111     4           01/01/2017  5         0          60             0     
    1111     4           01/01/2017  6         0          60             0
    ...
    1111     4           01/01/2017  23        0          60             0     
    2222     3           01/01/2017  0         15         30             15    
    2222     3           01/01/2017  1         30         30             0     
    2222     3           01/01/2017  2         60         0              0     
    2222     3           01/01/2017  3         10         0              50    
    2222     3           01/01/2017  4         0          0              60    
    2222     3           01/01/2017  5         0          60             0     
    2222     3           01/01/2017  6         0          60             0 
    


