Park is the new hold
I wasn't there when they specified WebRTC, but the main goal must have been to put a layer over the audio device, because WebRTC has problems handling multiple calls.
The problem is that WebRTC has a great way of finding the fastest path between two devices sending media to each other, but holding that connection while setting up another one was not a priority. This results in two problems: (1) holding calls the way we used to in SIP does not work, and (2) conferencing calls on the device is also a headache. But there are ways to deal with it.
Holding a call can be done by disconnecting the media stream while keeping the call itself up on the signalling side. The main disadvantage is that it takes a short while to re-establish the stream after resuming the call. All those ICE packets need to go back and forth, and at the end of the day we need to re-establish the DTLS session as well. If both sides keep the DTLS session token, this speeds things up, and for connections with single-digit millisecond round trip times such a resume should be doable in less than 100 ms.
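The park and resume logic described above can be sketched as a small state machine. This is only an illustration under my own assumptions: the state, event and action names are made up for this post and are not part of WebRTC or of any PBX API. The point it shows is that parking closes the media path but leaves signalling alone, and that a resume has to wait for ICE and DTLS before the call counts as active again.

```typescript
// Hypothetical names throughout; nothing here is a real WebRTC or PBX API.
type CallState = "active" | "parked" | "resuming" | "ended";
type CallEvent = "park" | "resume" | "mediaReady" | "hangup";
type Action =
  | "closeMedia"       // tear down the media stream only
  | "notifyParked"     // tell the far end over signalling
  | "startIceAndDtls"  // begin re-establishing the media path
  | "notifyResumed"
  | "closeSignalling"; // end the call itself

interface Transition {
  state: CallState;
  actions: Action[];
}

// Pure transition function: given a state and an event, return the next
// state and the side effects the caller should carry out.
function step(state: CallState, event: CallEvent): Transition {
  switch (state) {
    case "active":
      if (event === "park") return { state: "parked", actions: ["closeMedia", "notifyParked"] };
      if (event === "hangup") return { state: "ended", actions: ["closeMedia", "closeSignalling"] };
      break;
    case "parked":
      // Signalling is still up, so resume and hangup both work without media.
      if (event === "resume") return { state: "resuming", actions: ["startIceAndDtls"] };
      if (event === "hangup") return { state: "ended", actions: ["closeSignalling"] };
      break;
    case "resuming":
      // The call only becomes active again once ICE and DTLS have finished.
      if (event === "mediaReady") return { state: "active", actions: ["notifyResumed"] };
      if (event === "hangup") return { state: "ended", actions: ["closeMedia", "closeSignalling"] };
      break;
    case "ended":
      break;
  }
  // Events that make no sense in the current state are ignored.
  return { state, actions: [] };
}

// Replay a list of events from the "active" state and return the final state.
function replay(events: CallEvent[]): CallState {
  let state: CallState = "active";
  for (const event of events) {
    state = step(state, event).state;
  }
  return state;
}
```

Keeping the transition function pure means the user interface can treat "resuming" as its own visible state, which is where the short delay after pressing resume would show up.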
Losing music on hold is not a big problem to me. Music is generated on the server side. The idea that the endpoint generates the music on hold was a design mistake in SIP anyway, and this is a good opportunity to clean this up once and for all.
The delay is not what we knew from SIP or even ISDN times, but it is something we can live with. The signalling side requires some sugar coating of the underlying hold logic, but it is also doable.
Conferencing is the other problem.
It would have been nice if we (with our PBX hats on) could put the burden of mixing conference streams on the client. In the SIP world, there was the possibility of hosting 3-party conferences on the phone itself; that option is not available any more. However, it is understandable that the caller cannot make too many assumptions about the capabilities of the client. For example, imagine calling into a call center that wants to bring someone else into the call. Expecting the caller's smartphone, possibly connected through 3G or 4G, to do the job of mixing the conference is indeed unrealistic. Even for the 3-party conference it is better to properly address the problem and have the conference hosted on the PBX itself.
Video conference mixing remains a headache. Mixing multiple pictures into one stream remains a complex topic on the server. Here, simple solutions like speaker detection or management can help to get video on the screens of conference participants without killing the server CPU or the users' bandwidth.
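To make the speaker detection idea a bit more concrete, here is a minimal sketch of how a server could pick whose video to forward instead of mixing pictures. It is my own illustration, not what the PBX actually does: the function name, the audio levels normalized to the range 0 to 1, and the margin and silence thresholds are all assumptions. The margin adds some hysteresis so the picture does not flip back and forth between two people talking at similar volume.

```typescript
// Hypothetical helper: choose whose video to forward, based on audio levels.
// `levels` maps a participant id to an assumed audio level between 0 and 1.
function pickActiveSpeaker(
  levels: Record<string, number>,
  current: string | null,
  margin: number = 0.1,   // assumed: how much louder a new speaker must be
  silence: number = 0.05, // assumed: levels at or below this count as silence
): string | null {
  let loudest: string | null = null;
  let loudestLevel = silence;
  for (const id of Object.keys(levels)) {
    const level = levels[id];
    if (level > loudestLevel) {
      loudest = id;
      loudestLevel = level;
    }
  }

  // Level of the current speaker, or undefined if there is none
  // or if that participant has left the conference.
  const currentLevel =
    current !== null && current in levels ? levels[current] : undefined;

  // Nobody is talking: keep showing the current speaker if still present.
  if (loudest === null) return currentLevel !== undefined ? current : null;

  // No usable current speaker: take the loudest participant.
  if (currentLevel === undefined) return loudest;

  // Only switch when the loudest participant beats the current one by the margin.
  return loudestLevel > currentLevel + margin ? loudest : current;
}
```

Forwarding a single selected stream costs the server almost nothing compared to decoding and re-encoding a mixed picture, which is why this kind of selection is attractive as a first step.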
Anyway, the next build will include our new user interface, where you can try out the hold feature. Conferencing is not in yet; this will take some additional time.