[getdns-api] Segmentation fault under load

Robert Groenenberg robert.groenenberg at broadforward.com
Fri Sep 9 11:08:28 CEST 2016


FYI:

Queries are now posted to a queue and trigger an event by means of
event_active(). An event callback function then picks up the queued
queries and fires them off. As the event callback is run by the same
thread as the getdns callbacks, all getdns context access is now from
the same thread.

No more concurrent access to the rbtree and no SegV :-)
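
A minimal sketch of the queueing side of this approach, not the actual
application code. All names (query_queue, queue_post, queue_take_all,
count_wake) are hypothetical. The wake hook stands in for a small wrapper
that would call event_active() on an event owned by the loop thread; the
callback of that event would call queue_take_all() and issue
getdns_general() for each entry. libevent and getdns are left out so the
sketch stands alone.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* One queued query; only the name is kept in this sketch. */
struct pending_query {
    struct pending_query *next;
    char name[256];
};

/* Mutex-protected FIFO shared by worker threads and the loop thread. */
struct query_queue {
    pthread_mutex_t lock;
    struct pending_query *head, *tail;
    void (*wake)(void *arg); /* assumption: would wrap event_active() */
    void *wake_arg;
};

void queue_init(struct query_queue *q, void (*wake)(void *), void *wake_arg)
{
    pthread_mutex_init(&q->lock, NULL);
    q->head = q->tail = NULL;
    q->wake = wake;
    q->wake_arg = wake_arg;
}

/* Called from any worker thread. Never touches the getdns context. */
int queue_post(struct query_queue *q, const char *name)
{
    struct pending_query *pq;

    assert(q != NULL && name != NULL);
    pq = calloc(1, sizeof *pq);
    if (pq == NULL)
        return -1;
    strncpy(pq->name, name, sizeof pq->name - 1); /* calloc left the NUL */

    pthread_mutex_lock(&q->lock);
    if (q->tail != NULL)
        q->tail->next = pq;
    else
        q->head = pq;
    q->tail = pq;
    pthread_mutex_unlock(&q->lock);

    if (q->wake != NULL)
        q->wake(q->wake_arg); /* ask the loop thread to drain the queue */
    return 0;
}

/* Called only from the event-loop thread. Detaches and returns the whole
 * list in posting order; the caller submits each entry and frees it. */
struct pending_query *queue_take_all(struct query_queue *q)
{
    struct pending_query *list;

    pthread_mutex_lock(&q->lock);
    list = q->head;
    q->head = q->tail = NULL;
    pthread_mutex_unlock(&q->lock);
    return list;
}

/* Stand-in wake hook for trying the queue without libevent. */
void count_wake(void *arg)
{
    ++*(int *)arg;
}
```

With libevent, cross-thread wakeups via event_active() need thread
support switched on (for example evthread_use_pthreads() before the
event base is created). Detaching the whole list under the lock keeps
the critical section short, so worker threads are not blocked while the
loop thread is busy submitting queries.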

Perhaps a note in the part describing how to use getdns with libevent 
would be useful.

Kind regards,
Robert


On 09/08/2016 04:38 PM, Willem Toorop wrote:
> Op 08-09-16 om 15:48 schreef Robert Groenenberg:
>> Hi Willem,
>>
>> Thanks for the fast response.
>>
>> What is then the whole point of having an asynchronous API if the same
>> thread that initiates the query also has to run the event loop to handle
>> the responses?
>> I.e. this:
>>
>>>      else if ((r = getdns_address( context, query_name, extensions
>>>                                 , userarg, &transaction_id, callback)))
>>>          fprintf(stderr, "Error scheduling asynchronous request");
>>>      else {
>>>          printf("Request with transaction ID %"PRIu64" scheduled.\n",
>>>                 transaction_id);
>>>          if (event_base_dispatch(event_base) < 0)
>>>              fprintf(stderr, "Error dispatching events\n");
>>>      }
>> is pretty much the same as a synchronous API.
>>
>> Oh, wait, the solution is probably to have another callback of the same
>> event base initiate the query and have the other threads trigger that
>> callback via an event.
>> I'll explore that path.
> That's the idea yes... You could schedule a timeout event that walks
> through a list of requests to schedule?  And then experiment with what
> rate still gives good performance.
>
> Or reschedule from the answer processing.  For example, schedule in
> such a way that you have only a certain number of outstanding requests.
>
> Cheers,
> -- Willem
>
>
>> Cheers,
>> Robert
>>
>>
>> On 09/08/2016 03:39 PM, Willem Toorop wrote:
>>> Op 08-09-16 om 15:13 schreef Robert Groenenberg:
>>>> Hi,
>>>>
>>>> An application that sends out ENUM queries using getdns
>>>> (v1.0.0b2) and libevent2 on CentOS 6 runs fine for a low number of
>>>> requests.
>>>> However, when sending ~50 requests per second, a segmentation fault
>>>> occurs in _getdns_rbtree_insert() after a few minutes (in some runs
>>>> the SegV occurs in the compare function).
>>>>
>>>>> Program terminated with signal 11, Segmentation fault.
>>>>> #0  _getdns_rbtree_insert (rbtree=0x104a518, data=0x7f80f8004cc0) at
>>>>> util/rbtree.c:240
>>>>> 240            if ((r = rbtree->cmp(data->key, node->key)) == 0) {
>>>>> #0  _getdns_rbtree_insert (rbtree=0x104a518, data=0x7f80f8004cc0) at
>>>>> util/rbtree.c:240
>>>>> #1  0x00007f80cc1dc0ef in _getdns_context_track_outbound_request
>>>>> (dnsreq=0x7f80f8004cc0)
>>>>>      at ./context.c:3080
>>>>> #2  0x00007f80cc1c9fed in getdns_general_ns (context=0x10493e0,
>>>>> loop=0x1024940,
>>>>>      name=<value optimized out>, request_type=35, extensions=<value
>>>>> optimized out>,
>>>>>      userarg=0x7f80f8006b50, return_netreq_p=0x7f809c5cbb88,
>>>>>      callbackfn=0x7f80cc44f7d0 <enum_lclient_callback>, internal_cb=0,
>>>>> usenamespaces=0)
>>>>>      at ./general.c:452
>>>>> #3  0x00007f80cc1ca3f1 in _getdns_general_loop (context=<value
>>>>> optimized out>,
>>>>>      loop=<value optimized out>, name=<value optimized out>,
>>>>> request_type=<value optimized out>,
>>>>>      extensions=<value optimized out>, userarg=<value optimized out>,
>>>>> netreq_p=0x7f809c5cbb88,
>>>>>      callback=0x7f80cc44f7d0 <enum_lclient_callback>, internal_cb=0) at
>>>>> ./general.c:517
>>>>> #4  0x00007f80cc1ca454 in getdns_general (context=<value optimized out>,
>>>>>      name=<value optimized out>, request_type=<value optimized out>,
>>>>>      extensions=<value optimized out>, userarg=<value optimized out>,
>>>>>      transaction_id=0x7f80f8006b78, callbackfn=0x7f80cc44f7d0
>>>>> <enum_lclient_callback>)
>>>>>      at ./general.c:674
>>>> I suspect this to be a threading issue: the rbtree being accessed for
>>>> both insert and delete from different threads. The call to
>>>> getdns_general() is protected by a mutex in my application, so only
>>>> one thread issues a query at a time. However, the event base runs in its
>>>> own thread, as usual with libevent, so the problem probably lies in
>>>> entries being deleted from the rbtree when the response is handled.
>>>>
>>>> Is it supposed to be possible with getdns to run the event base in its
>>>> own thread?
>>> No, getdns does not anticipate this modus operandi (as you found out
>>> yourself already).  Running and scheduling should be done from the same
>>> thread.
>>>
>>>
>>> -- Willem
>>>
>>>> Thanks,
>>>> Robert
>>>>
>>>>


