Problem

Recently, I ran into a problem while writing a SQL query: its execution time was far too long:

select
    'A' as type,
    count(adl.is_attend) as count,
    round(sum(case when adl.is_attend = 1 then 1 else 0 end) / count(adl.is_attend), 2) * 100 as rate
from
    attendance adl
    inner join (select deptId
                from department
                start with deptId = '...'
                connect by up_deptId = prior deptId) nbd
        on adl.deptId = nbd.deptId
where
    to_date(adl.date, 'yyyy-MM-dd') >= to_date(#{begin_date}, 'yyyy-MM-dd')
    and to_date(adl.date, 'yyyy-MM-dd') <= to_date(#{end_date}, 'yyyy-MM-dd')

union all

......

The above is one branch of a larger query built with UNION ALL. The other branches are almost identical to this one, differing only in the department id in the inner join and the type value in the select list. The full query takes about 110 s in total, and a single branch alone takes about 18 s, which is far too long.

In addition, #{begin_date} and #{end_date} are parameters passed in from the backend, used for the dynamic query.

Optimization

First, let's analyze the most immediate question: why does the query take so long?

In fact, the problem here is not very complicated. The attendance table holds more than 13 million rows, and with that volume even the most basic query takes around 20 seconds.

In addition, the department table it joins is queried hierarchically (a recursive join): the CONNECT BY clause collects every sub-department under the given deptId, and this traversal is also time-consuming.
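To make the traversal concrete, here is the hierarchical subquery on its own (using the table and column names from the query above; the root deptId stays elided as in the original):

-- Starting from the given root department, walk down the tree and collect the
-- deptId of every descendant; PRIOR refers to the parent row, so
-- up_deptId = prior deptId selects each row's children.
select deptId
from department
start with deptId = '...'
connect by up_deptId = prior deptId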

So, how to solve the problem?

Solution 1: First of all, the data volume itself is a given: we cannot simply discard rows, and the where condition already restricts the time range, so the amount of data scanned is already bounded. The first optimization target is therefore the join:

A recursive join is a very time-consuming operation, so the obvious move is to build an index on deptId. This turned out to be the right idea: with the index in place, the running time of a single branch dropped from 18 s to 2 s.
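For reference, the DDL is roughly the following (a sketch: the index names are made up, and the up_deptId and attendance indexes are additional assumptions that may or may not pay off depending on the actual plan):

-- Index the hierarchy column on department so START WITH / CONNECT BY can
-- seek instead of scanning the table; index names here are illustrative.
create index idx_department_deptid on department (deptId);
-- Assumption: indexing the parent pointer also helps the CONNECT BY walk.
create index idx_department_up_deptid on department (up_deptId);
-- Assumption: the join back to attendance benefits from an index on its deptId.
create index idx_attendance_deptid on attendance (deptId);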

Solution 2: The where condition can be optimized as well:

The date filter uses the to_date function. When we only compare year, month, and day, we can drop the function and compare against the strings directly, because Oracle SQL's data type precedence rules will implicitly convert the lower-priority string to a date. With this implicit conversion, the running time fell further, from 2 s to 1 s.
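The rewritten filter would look roughly like this (a sketch, assuming adl.date is a DATE column and the parameters arrive as 'yyyy-MM-dd' strings):

-- The string parameters have lower data type precedence than DATE, so Oracle
-- converts them implicitly and the per-row to_date call on the column goes away.
-- Note: the implicit parse uses NLS_DATE_FORMAT, which is exactly what makes
-- this risky (see below).
select count(*)
from attendance adl
where adl.date >= #{begin_date}
  and adl.date <= #{end_date}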

But note that this is risky. First, implicit data type conversion may have a negative impact on performance (although here it happens to help); second, implicit conversion may produce surprising results, depending on the value of the NLS_DATE_FORMAT parameter.

When converting a datetime value to text via an implicit conversion or an explicit conversion that does not specify a format model, the format model is defined by one of the globalization session parameters, namely NLS_DATE_FORMAT, NLS_TIMESTAMP_FORMAT, or NLS_TIMESTAMP_TZ_FORMAT.

For example, the default value of NLS_DATE_FORMAT is 'DD-MON-RR', which uses a two-digit year. Converting a date to a string may therefore lose the century, which is then reconstructed as 19xx or 20xx by the RR rule when the string is converted back to a date; and if the original value carried hours, minutes, and seconds, that information is lost as well.
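A quick session-level illustration of this round-trip loss (a sketch; the exact output depends on your session settings):

-- With the default two-digit-year format, implicit date-to-string conversion
-- drops the century and the time of day.
alter session set nls_date_format = 'DD-MON-RR';

-- Concatenation triggers an implicit date-to-string conversion,
-- e.g. 'now: 05-JAN-25': no century, no hh:mi:ss.
select 'now: ' || sysdate from dual;

-- Converting the text back to a date re-derives the century via the RR rule
-- and fills the lost time of day with midnight.
select to_date(to_char(sysdate)) from dual;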

In short, different parameter settings can produce different results.

So, since the optimization in Solution 2 is not essential, I decided to keep the to_date function after all.