Hardware-accelerated Encoding [Streaming Video Technologies Panorama, part 1]

Some of you may remember the Tarari Encoder Accelerator for Windows Media, which came to market in 2005 as an FPGA-loaded PCI board. It was a $10K investment, but it could seriously boost your encoder performance and it was a transparent solution for all encoders integrating the Windows Media SDK. It was maybe the only really reliable option for doing HD encoding decently at that time. More confidential were the Ambric cards for accelerating the MainConcept H.264 and MPEG-2 SDKs, which were found to work with the Inlet Armada transcoding farm.

Since those days, Tarari boards have vanished, Windows Media encoding has been somewhat outshone by H.264 and CPU performance has made great leaps, but the need for hardware-accelerated encoding solutions is still there, mainly because:
– H.264 encoding also hungrily crunches CPU cycles
– the screen types to feed have exploded with mobiles, tablets, connected TVs and all other OTT devices
– adaptive streaming requires far more versions of the same file than the previous single-bitrate encodings
– available rack space is not endless and it’s not convenient to manage hundreds of encoding nodes
– new formats like 3D and SVC demand strong encoding power
– you like to play with cool high-end encoders and you have strong convincing skills when it comes to making your boss buy expen$ive hardware
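
To put a rough number on the adaptive streaming point in the list above, here is a minimal Python sketch that counts the renditions to encode for one source file. The device families and bitrate ladders are purely hypothetical values chosen for illustration; they do not come from any product mentioned in this post.

```python
# Hypothetical bitrate ladders in kbit/s, one per device family.
# All values are illustrative assumptions.
LADDERS = {
    "mobile": [200, 400, 800],
    "tablet": [400, 800, 1500],
    "desktop": [800, 1500, 3000],
    "connected_tv": [1500, 3000, 6000],
}


def renditions_needed(ladders, shared=True):
    """Count the encodes needed for one source file.

    shared=True assumes a rendition at a given bitrate can be reused by
    every device family; shared=False assumes each family needs its own
    encode (different profile, resolution or container).
    """
    if shared:
        return len({kbps for ladder in ladders.values() for kbps in ladder})
    return sum(len(ladder) for ladder in ladders.values())


if __name__ == "__main__":
    print("single-bitrate era:", 1)
    print("ABR, shared renditions:", renditions_needed(LADDERS))
    print("ABR, per-device renditions:", renditions_needed(LADDERS, shared=False))
```

Even in the optimistic shared case, every source file turns into several encodes instead of one, which is where the extra encoding power goes.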

So let’s take a look at the different options available on the market now !

 

The technologies that enable hardware-accelerated encoding are of three types. All of them execute code in an optimized way on a dedicated processor, leaving the CPU in charge of all the operations that cannot be accelerated, like encapsulation, muxing, audio encoding, DRM protection and so on.
– GPU: this technology uses a parallel processing architecture. Although it’s a growing option, it must overcome its architecture limitations and ensure constant evolution of codec quality to become THE most prevalent solution. It relies mainly on the Nvidia CUDA architecture, and the main providers are MainConcept through its SDK and Elemental through its appliances. It’s also the most power-demanding architecture.
– DSP: mainly driven by Texas Instruments with its DaVinci platform, DSPs are now losing focus in the industry.
– ASIC / FPGA: while being the best-performing architecture, it’s also the most expensive.
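
The split of work described above (video encoding on the dedicated processor, everything else on the CPU) can be sketched as a simple stage planner. This is only an illustration of the principle: the stage names are hypothetical labels for the operations listed in the text, and no real encoder exposes this exact interface.

```python
# Stages that a dedicated processor (GPU, DSP, ASIC/FPGA) is assumed to take over.
# Only video encoding is listed here, following the description in the text.
ACCELERATED = {"video_encode"}


def plan_pipeline(stages, accelerator_present=True):
    """Map each pipeline stage to the processor that would run it."""
    plan = {}
    for stage in stages:
        if accelerator_present and stage in ACCELERATED:
            plan[stage] = "accelerator"
        else:
            # Muxing, encapsulation, audio encoding and DRM stay on the CPU.
            plan[stage] = "cpu"
    return plan


if __name__ == "__main__":
    stages = ["video_encode", "audio_encode", "mux", "encapsulate", "drm"]
    for stage, target in plan_pipeline(stages).items():
        print(f"{stage:>12} -> {target}")
```

Without an accelerator, every stage falls back to the CPU, which is the situation the products below try to avoid.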

Here’s a selection of ready-to-go products which provide hardware acceleration for encoding, as of July 2011. Please point out any other significant products that should be added to this list, or complete the points which are unclear/unknown in the matrix!

GPU Acceleration Options

| SUPPLIER | Elemental | Elemental | Microsoft | Sorenson |
| --- | --- | --- | --- | --- |
| MODEL | Live | Server | Expression Encoder Pro | Squeeze Server |
| ACCELERATION TYPE | GPU (Nvidia) | GPU (Nvidia) | GPU (Nvidia) | GPU (Nvidia) / QuickSync (Intel) |
| SOLUTION TYPE | Appliance | Appliance | Software + card | Software + card |
| ENCODING (LIVE) | Yes | - | Yes | - |
| TRANSCODING (LIVE) | Yes | - | - | - |
| TRANSCODING (FILE) | Yes | Yes | Yes | Yes |
| OUTPUT : H.264 AVC | Yes | Yes | Yes | Yes |
| OUTPUT : H.264 SVC | - | - | - | - |
| OUTPUT : 3D MVC | Yes | - | - | - |
| OUTPUT : MPEG-TS | Yes | Yes | - | Yes |
| ABR SUPPORT : Smooth Streaming | Yes | Yes | Yes | Yes |
| ABR SUPPORT : Adobe HDS | - | - | - | Yes |
| ABR SUPPORT : Apple HLS | Yes | Yes | - | Yes |
| DRM SUPPORT | Yes | Yes | Yes | Yes |
| API | REST | REST | .NET | REST |
| CENTRAL MANAGEMENT | Yes | Partial | - | Yes |
| PRODUCT PAGE | http://bit.ly/kRJGH5 | http://bit.ly/mJq1Ol | http://bit.ly/kdf3sP | http://bit.ly/m1L8TU |
| NOTES | | | | |

Elemental Live & Server – the rising stars

Avail-TVN racks full of Elemental appliances

Like many in the industry, my perception is that Elemental is the rising challenger to Inlet (sorry, Cisco now). The recent 1.3 release of their Live appliance again narrows the gap with the Inlet Spinnaker’s features. For on-demand, they seem to have Sorenson in their sights, with its hybrid server solution mixing dedicated servers and cloud. Elemental has been building its own H.264, VC-1 and MPEG-2 codecs to work with CUDA technology, which gives them great flexibility compared with standard codec integration and allows them to differentiate themselves from the usual MainConcept rendering that we see everywhere. Using standard Nvidia boards also gives good hope of smooth upgradability for the appliances.
They have two dedicated products, one per use: Elemental Live for live streaming, and Elemental Server for on-premises on-demand transcoding. The nice thing is that advances in one product usually carry over to the other shortly after. Their third product, Elemental ACT, is an upcoming cloud-based transcoding solution based on Amazon Cluster GPU Instances. Its exact features and roadmap are not known yet, but we can presume that it will compete with Sorenson’s solution for VOD AND with Flumotion’s LiveTranscoding.com service for live, allowing overflow to the cloud for both live and VOD production once on-premises capacity is reached. Sounds great…
I had the chance to test the Elemental Server recently, and I can say that it looks very promising. Of course it’s a young product compared to reference solutions like Inlet Armada or Rhozet Carbon Coder in terms of available workflows and options, but the performance due to CUDA acceleration is there, and it makes a huge difference if your main goal is encoding in a simple to medium complexity workflow. Being Linux-based allows the solution to be customized with pre- and post-processing, using whatever scripts and binaries you would like in your workflow. From my tests, what I can tell is that at best I could transcode 4 concurrent 1080 MP4 streams to 4 sizes/bitrates (720p and 3 other sizes down to 2/3 of D1) in real time, which is already a very nice achievement. I would say that this product is really going to explode when the Elemental Conductor allows Servers to be clustered more efficiently than is possible right now (each Server being able to act as a master node and as a simple encoding node), but it can already be a major element in your transcoding platform design today.
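
As a back-of-the-envelope reading of the test figures above (4 concurrent 1080 sources, 4 renditions each, all in real time), the following Python sketch turns them into capacity numbers. The 1000-hour catalog is a hypothetical value, and the calculation assumes the box sustains real-time speed on every slot, which my tests only showed as a best case.

```python
def realtime_outputs(concurrent_inputs, renditions_per_input):
    """Number of output renditions produced simultaneously in real time."""
    return concurrent_inputs * renditions_per_input


def hours_to_transcode(catalog_hours, concurrent_inputs):
    """Wall-clock hours to process a catalog at real-time speed.

    Each input slot is assumed to consume one hour of source per hour.
    """
    return catalog_hours / concurrent_inputs


if __name__ == "__main__":
    print("renditions in flight:", realtime_outputs(4, 4))
    # Hypothetical 1000-hour back catalog on a single Server.
    print("hours for 1000 h of content:", hours_to_transcode(1000, 4))
```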

Microsoft Expression Encoder 4 Pro – finally GPU friendly

Expression Encoder interface

Since version 4 of Expression Encoder Pro, Microsoft has integrated the MainConcept CUDA SDK, which allows a big part of the video encoding to be offloaded to the GPUs that you have on board. I’m not sure what the maximum number of CUDA cards supported on Windows is, but if several are possible (and well managed), it can surely be a very competitive solution when combined with dense CUDA servers like those from Carri or HP. At the very least, it’s a good answer to the huge resources demanded by the encoder.
The good point is that you can use Expression Encoder’s extensive API to build automations for VOD encoding and live streaming; the bad point is that it’s somewhat limited in terms of input/output formats: it’s good at producing H.264/AAC-LC Smooth Streaming, but that’s all. If you want to reach a wider audience, you must then combine this output with server-side repackaging like the one IIS 7 does for HLS with its Transform Manager plugin, or a more advanced one which repackages Smooth Streams into every other ABR technology, like Seawell Networks Spectrum.
In any case, it’s a cheap and quick path to accelerated encoding and a good starting point for performance benchmarks and prototypes. Let’s have fun!
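
To give an idea of what the server-side repackaging to HLS mentioned above has to produce on the manifest side, here is a minimal Python sketch that writes an HLS master playlist for a set of renditions. The rendition values and URIs are hypothetical, and this covers only the playlist: a real repackager also has to remux the Smooth Streaming fragments into MPEG-TS segments, which is not shown here.

```python
def hls_master_playlist(renditions):
    """Build an HLS master playlist listing one variant stream per rendition.

    Each rendition is a dict with 'bandwidth' (bits per second),
    'width', 'height' and 'uri' (the media playlist of that variant).
    """
    lines = ["#EXTM3U"]
    for r in sorted(renditions, key=lambda item: item["bandwidth"]):
        lines.append(
            f'#EXT-X-STREAM-INF:BANDWIDTH={r["bandwidth"]},'
            f'RESOLUTION={r["width"]}x{r["height"]}'
        )
        lines.append(r["uri"])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical ladder; URIs are placeholders.
    ladder = [
        {"bandwidth": 1500000, "width": 1280, "height": 720, "uri": "720p/index.m3u8"},
        {"bandwidth": 400000, "width": 480, "height": 270, "uri": "270p/index.m3u8"},
    ]
    print(hls_master_playlist(ladder), end="")
```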

Sorenson Squeeze Server – the accelerated CLoUdSTER

Leveraging its long experience in transcoding with Squeeze, Sorenson launched a new high-volume encoding product, Squeeze Server, in late 2010. It’s the first and only offer that allows you to deploy a hybrid platform, with on-premises servers integrated with Sorenson servers in the cloud (that’s their Hybrid offer).
Their On Premise server offer is maybe the only one on the market to use a multi-node database for load balancing. The list of supported formats is simply impressive, they support all ABR technologies (HDS to be confirmed) and they have integrated Aspera’s solutions for accelerated file transfers. If you add GPU acceleration and an extensive API to this picture, you get a solution which puts strong pressure on the leaders of the transcoding farm market like Rhozet, Inlet and Digital Rapids. As with Expression Encoder, the solution has to be tested with high-density CUDA servers to see how far we can push the performance. On paper, it looks like the most advanced offer when considering the overall architecture. I’m really eager to benchmark it…
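
The hybrid idea above (balance jobs across on-premises nodes, then overflow to the cloud) can be illustrated with a toy dispatcher. This is a sketch of the general principle only, not Sorenson’s actual scheduling logic, which I have not seen; node names, slot counts and the "cloud" label are hypothetical.

```python
def dispatch(jobs, node_slots):
    """Assign each job to the on-premises node with the most free slots.

    node_slots maps a node name to its number of free encoding slots.
    When no node has a free slot left, the job overflows to the cloud.
    """
    free = dict(node_slots)  # work on a copy, keep the caller's dict intact
    placement = {}
    for job in jobs:
        node = max(free, key=free.get) if free else None
        if node is not None and free[node] > 0:
            free[node] -= 1
            placement[job] = node
        else:
            placement[job] = "cloud"
    return placement


if __name__ == "__main__":
    jobs = ["j1", "j2", "j3", "j4"]
    for job, target in dispatch(jobs, {"node-a": 2, "node-b": 1}).items():
        print(job, "->", target)
```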

DSP/ASIC Acceleration Options

| SUPPLIER | Media Excel | RGB Networks | Digigram | Seawell Networks | Harmonic |
| --- | --- | --- | --- | --- | --- |
| MODEL | Hero nS | TransAct | AQORD *LINK | Lumen 1000 | ProStream 4000 |
| ACCELERATION TYPE | DSP (TI) | DSP (TI) | ASIC | ASIC* | ASIC |
| SOLUTION TYPE | Appliance | Appliance | Appliance | Appliance | Appliance |
| ENCODING (LIVE) | Yes | - | Yes | * | Yes |
| TRANSCODING (LIVE) | Yes | Yes | Yes | * | Yes |
| TRANSCODING (FILE) | Yes | Yes | * | Yes | - |
| OUTPUT : H.264 AVC | Yes | Yes | Yes | - | Yes |
| OUTPUT : H.264 SVC | * | - | - | Yes | - |
| OUTPUT : 3D MVC | * | - | - | - | - |
| OUTPUT : MPEG-TS | Yes | Yes | Yes | - | Yes |
| ABR SUPPORT : Smooth Streaming | Yes | Yes | Q4 2012 | - | Yes |
| ABR SUPPORT : Adobe HDS | Yes | Yes | Yes | - | Yes |
| ABR SUPPORT : Apple HLS | Yes | Yes | Yes | - | Yes |
| DRM SUPPORT | Yes | Yes | - | - | Yes |
| API | SOAP | XML-RPC | REST | REST | * |
| CENTRAL MANAGEMENT | Yes | Yes | - | - | Yes |
| PRODUCT PAGE | http://bit.ly/kCgsjv | http://bit.ly/ijnGMs | http://bit.ly/P39Wl2 | http://bit.ly/k7mJyd | http://bit.ly/jitR4t |
| NOTES | * software upgrade | | * TS container source file only | * to be confirmed | * API type unknown |

Media Excel Hero nS – the hidden head-end gem

Media Excel Hero nS

Although Media Excel is an 11-year-old company, it has flown under the radar for a long time. Nevertheless, they now offer, at least on paper, one of the most (if not THE most) complete and versatile solutions on the market. In fact “nS” stands for “nScreens”, which is their credo: several inputs and plenty of outputs per input, all of this powered by two DSP cards on the high-end model. This hardware provides a level of density that few other appliances reach with such low power consumption, but it also makes the configuration expensive to upgrade if newer DSP cards come to market.
Right now, the Hero nS provides both live and VOD encoding (but not at the same time, apart from the combination of live-to-live with live-to-file for DVR uses) and extensive support for transport protocols and adaptive bitrate methods. It’s also the appliance most integrated with DRM systems such as PlayReady, Verimatrix, Widevine and AES for iOS.
Their marketing bottom line is that the appliance is ready for future formats like MVC, SVC and WebM, and it does sound credible when we examine the wide range of implemented formats and features. They have a central command solution for the appliances, so it reasonably looks like an industrial-grade platform: just the kind of high-end tool we need to play with…

RGB Networks TransAct – the telco platform

RGB Networks TransAct Packager

First Ripcode was positioned on the live transcoding market, then on just-in-time on-demand transcoding for mobiles, and now they seem to offer a more balanced platform with their 6th generation of appliances, now integrated into the RGB Networks product portfolio in a pure head-end spirit with an enterprise-grade approach. The original architecture choice is that they finally cut the former product into two parts: the transcoder, and the packager, which takes the H.264 TS streams, does all the packaging operations and bridges to external DRM systems. While it seems a good idea, as packaging is a heavily evolving operation in the overall streaming process, it introduces confusion about what exactly the transcoder’s features are when it works in standalone mode. Output codecs are not so clearly outlined either, only input ones. Also confusing is the fact that they left the old Ripcode website live, with a different product range and feature sets, so I presume that the best way to know for sure is to go and spend some time on their booth at the next IBC…
Being integrated into RGB Networks seems both an advantage and a drawback: an advantage because of the wide range of equipment that can interact within the company and the engineering resources that can be deployed on a particular product, but a drawback for customers because it dilutes Ripcode’s roadmap in a larger one and leads to a more integrated approach where you have to buy more stuff than you need if you just want the transcoder.
Some years ago I tested the Ripcode unit (it was probably v3) and I remember a not so ergonomic command interface and an intrinsic working logic that was not easy to grasp. It may be normal for the telco market (as they implement their interfaces through APIs), but it was not a very pleasant experience for casual techies. I need to see if this austere approach has changed now, because their product promise looks solid.

Digigram AQORD *LINK – the disruptive appliance

Ecrin Livestream AIO

For many years, Ecrin (now a division of Digigram) has been a reseller of Viewcast products in Europe and has gained a wide range of clients in the streaming market. They are people with great video expertise and a strong entrepreneurial vision. Three years ago, they decided to build their own range of encoder products. They did it from scratch, Linux-based, and it rocks!
Actually they developed two products in parallel: the Livestream (now AQORD *LINK), which is originally a low-latency contribution encoder (with an ASIC card) for HD point-to-point or multicast distribution with FEC, and the IPLive (now AQILIM *FIT), which is a pure streaming software encoder. As both encoders share a common code base, they can easily add the streaming features of the IPLive to the Livestream. That’s what they are doing with the upcoming Livestream All In One (AIO) model, which supports HLS, HDS and soon Smooth Streaming. This versatile contribution+streaming encoder provides pristine image quality and a very fluid video stream, and it can be used for VOD transcoding on top of live encoding/transcoding (although that’s not its strongest point). Its pricing is very aggressive, even though it meets the quality requirements of major broadcast channels.
At a previous company, I used the Livestream for contribution, and I can say for sure that it’s a rock-solid product. The web GUI is very simple but very efficient, and it now has a basic API for integration. It’s indeed a young product, but the Ecrin team is working hard on extending support for ABR flavours and they listen closely to their clients. Right now the product is not deeply integrated with the specific architectures of major CDNs, but it’s a perfect choice for working with alternative CDNs. When it is Akamai/Limelight/Level3 certified, it will surely be a competitor for the Spinnaker, with some hardware acceleration behind it. Plenty of reasons to watch this product evolve closely!

Harmonic ProStream 4000 – it does them all

Harmonic ProStream 4000

Over the years, Harmonic has been steadily expanding its operating range from the broadcast world to the streaming world. The acquisition of Rhozet was the first major step in this direction, but they didn’t stop at VOD: they made an equivalent effort for live streaming with their ProStream 4000 product, which targets multiscreen delivery. The ASIC architecture gives high density to the appliance, which does both live encoding and transcoding, and the ABR and output protocol support is outstanding. Compared to competitors, they add 3GPP support and extensive IPTV support, which makes for an extremely versatile piece of IP head-end equipment overall. The nature and possibilities of the API remain to be seen (no details available). Its price tag is certainly high, but it seems to be highly justified too…

Seawell Networks Lumen 1000 – the incoming SVC tsunami

Seawell Lumen 1000

Standing a bit apart in this panorama is the Lumen 1000, the first H.264 SVC encoder, which came on the market one year ago. Taking advantage of the SVC standard and its base layer complemented by (up to 27) enhancement layers, the appliance generates multi-level quality streams in just one file. This is a very interesting prospect for streamlining video production pipelines and reducing the storage space needed with ABR technologies, because the resulting SVC file is only 20% larger than the highest-bitrate version.
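
To see what that 20% figure could mean for storage, here is a small Python comparison between keeping one file per ABR rendition and keeping a single SVC file. The bitrate ladder is a hypothetical example, and the 20% overhead is simply the figure quoted above, taken at face value.

```python
def abr_storage_kbps(ladder):
    """Aggregate bitrate stored when every ABR rendition is a separate file."""
    return sum(ladder)


def svc_storage_kbps(ladder, overhead=0.20):
    """Bitrate stored for one SVC file: top rendition plus the quoted overhead."""
    return max(ladder) * (1 + overhead)


if __name__ == "__main__":
    ladder = [400, 800, 1500, 3000]  # hypothetical ladder in kbit/s
    abr = abr_storage_kbps(ladder)
    svc = svc_storage_kbps(ladder)
    print(f"ABR files: {abr} kbit/s, SVC file: {svc:.0f} kbit/s")
    print(f"storage saved with SVC: {1 - svc / abr:.0%}")
```

With this particular ladder the single SVC file would need a bit more than a third less storage, and the gain grows with the number of renditions in the ladder.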
At this point it’s still unclear when the appliance will support live streaming, what type of hardware acceleration they are using and whether Seawell has already managed to use the Lumen as an input to their very interesting Spectrum multi-ABR delivery platform (in fact it could already be released, see comments in the Spectrum chapter here). Once this last point is confirmed, the Seawell platform will certainly provide the most efficient option on the market for serving multiple variations of ABR H.264 streams. Over time, we should see SVC support natively added to players, which would remove the need for server-side repackaging of the SVC streams. In this ideal SVC world, Seawell Networks will obviously be a major player, and this is a very good reason to keep a close eye on their young but promising products.
DIY addicts who want to build their own SVC encoder prototype can do it now with the Stretchin SVC card, and I’d be curious to have your feedback if you have done it.

And what about hardware-accelerated WebM encoding ?

One year ago, Google launched WebM, based on the VP8 video codec (formerly from On2) and Vorbis for audio, in a Matroska-based container. Since then, WebM has received much interest but has seen few large-scale deployments apart from YouTube. Two major shortcomings must be resolved before WebM becomes as popular as H.264: hardware support for decoding and a decent ABR streaming protocol.
Work on the first point has started, with TI showing the first implementation on the OMAP 4 processor, and the second point is said to be a high priority for the Google teams. It’s unclear at this stage whether they will recycle Widevine’s ABR protocol or just go the standard way with MPEG-DASH.
On the encoding side, the need is basically to accelerate VP8 video encoding, and there are few options available right now, mainly because the reference implementation for ASIC has only just been released by the WebM Hardware project team. However, there are alternative implementations ready or in the works, on the Tegra 2 platform, on Videantis processors and on the Octasic OCT2224M processor. Now we have to wait a bit to see software encoders take advantage of these silicon implementations. I couldn’t find a single one available yet…

Final words

Ateme Titan Live

There is one category of equipment not covered in this post, namely blade form-factor equipment, which can also offer interesting options when it comes to maximizing rack space. In this area we can point out the TranSURF blade (DSP) and the FPGA-based Ateme Titan Live and Titan File products, which show an impressive list of features and options.

The last two technologies to watch in the near future are OpenCL and Quick Sync. The first seems to mainly target ATI hardware, and we should see professional encoders based on the MainConcept OpenCL SDK pop up sooner or later. The second is a new architecture tailored for recent Intel processors, and it sounds like an interesting alternative to CUDA, or at least a less power-consuming one.

Now I hope that you have enjoyed reading this post and that it will be useful to you – spread the word about it !

– – – – –

WEBOGRAPHY
The reference author on this topic is definitely Tim Siglin.
Here is a collection of his research and insightful articles, all great reads.
Back to Basics: Hardware Acceleration
Best Workflows 2010 – Encoding & Transcoding Solutions
Elemental Reveals Live Encoding Solution
Elemental Server Provides Enterprise-Class GPU Transcoding Acceleration
SeaWell Launches Lumen1000 Appliance
Elemental, Media Excel Plot 2011 Strategies
Beyond Software: Hardware Processing for Streaming at Scale (2021)

– – – – –

This post is the first in a series on very specific subjects of interest within the wide Streaming Video domain. The purpose of these posts is not to benchmark the technologies or to provide tutorials, but rather to give an overview of the available technologies in each sub-domain, so that you can use it as a starting point for your RFPs, research, architecture plans and maybe final platform implementations…

http://www.streamingmedia.com/Articles/Editorial/Featured-Articles/Elemental-Media-Excel-Plot-2011-Strategies-73415.aspx