mardi 18 juin 2013

About the importance of unit-test defintion

Last week, a friend and I submitted some piece of code to solve an on-line coding challenge. It was a 'french' challenge but I will try to explain the problem and present the data vector used to test submitted function. We will show that their data vector definition is incorrect and doesn't detect if a function can really solve to problem !

First, the problem:

As developer and team member of a High-Tech Company, your manager ask you to register and join the maximum of development and tech-event you can.
So, you list all the incoming event and some are in the same time.
But you still need to be in the "maximum" of event.

So nothing magic... you cannot be in 2 different place the same day.
For each event, you only know when it start and its duration (days).

Second, the test data they use:

They propose 3 different test to check the function output:



Simple

2013-01-12;5
2013-01-01;13
2013-01-16;20

function should return 3





Complex

2014-04-28;4
2014-05-01;7
2014-05-03;7
2014-05-09;5
2014-05-11;11
2014-05-16;3
2014-05-20;1

function should return 4




With a leap year

2012-02-20;10
2012-03-01;2
2012-03-12;2
2012-03-14;2
2012-03-15;2
2012-03-20;13

function should return 5


And now where the defined test failed ...

My co-worker and I had 2 different favorite language (Python for him and C++ for me) and 2 different solution.
In fact he use a real exhaustive exploration using the 'permutation' function to find the 'maximum number of event', but it lead to a high complexity if the number of event grows. On my side, I preferred a linear complexity and I used a common 'greedy' solver. But in fact case, it's well known that it can not find the 'optimal' solution.

My simple C++ implementation use an t_event class define by:

1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
class event_t 
{
public:
 event_t(const std::string& _date)
 {
  memset(&_start_date, 0, sizeof(struct tm));

  sscanf(_date.c_str(), "%d-%d-%d;%d", &_start_date.tm_year, &_start_date.tm_mon, &_start_date.tm_mday, &_duration);
  _start_date.tm_year -= 1900;
  _start_date.tm_mon -= 1;

  _start_pt = mktime(&_start_date);
  _end_pt = _start_pt + _duration * 24 * 60 * 60;

  memset(&_end_date, 0, sizeof(struct tm));
  memcpy(&_end_date, gmtime(&_end_pt), sizeof(struct tm));
 }

 event_t(const event_t& e)
 {
  _duration = e._duration;
  _start_date = e._start_date;
  _start_pt = e._start_pt;
  _end_pt = e._end_pt;
  _end_date = e._end_date;
 }

 bool operator < (const event_t& r)
 {
  return this->_end_pt < r._end_pt;
 }

 friend std::ostream& operator<< (std::ostream& stream, const event_t& evt);

public:
  struct tm _start_date;
  unsigned int _duration;
  struct tm _end_date;

  time_t _start_pt;
  time_t  _end_pt;
} ;

And use the following function to build a planning:

1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
void CreatePlanning( std::vector<event_t> &_list )
{
 std::ostream_iterator< event_t > dump_events( std::cout, "" );

 std::sort(_list.begin(), _list.end()); 
 std::cout << " Start \t End \t Duration \n", 
  std::copy(_list.begin(),_list.end(), dump_events);

 std::vector<event_t> _planning;
 _planning.push_back(_list[0]);
 unsigned int j = 0;
 for (unsigned int i = 1 ; i < _list.size() ; ++i)
 {
  if(_list[i]._start_pt >= _list[j]._end_pt)
  { 
   _planning.push_back(_list[i]);
   j = i;
  }
 }

 std::cout << "Nb event planned : " << _planning.size() << std::endl;
 std::cout << " Start \t End \t Duration \n", 
  std::copy(_planning.begin(),_planning.end(), dump_events);
}

That code is ok regarding the expected result of the unit-test, but as i said before the price for the lower complexity is that it can lead to a no optimal solution.
The heuristic I use can be summarize, start ASAP and restart ASAP again and again ... Simple and efficient, I'm a greedy person !!!