Friday, 27 September 2013

Performance difference in Jython and Java implementations of a turbo decoder

I'm currently writing a Computer Science thesis on a multithreaded software
implementation of a turbo decoder. The basic problem is as follows:
let (y1, ..., yn) be the sequence of noisy bits received through a channel.
Two SISO decoders work in parallel (each one receiving the first bit of
the sequence together with the corresponding parity bit y1j, j in {1,2});
the objective is to compute the LLR (log-likelihood ratio, which gives
information about the probability that the current bit is either 0 or 1),
which is then fed to the other SISO decoder for the next iteration.
Suppose the received bits are split into shorter frames of data.
(SISO stands for soft input, soft output, because each decoder receives an
estimate of the bit probabilities as input and outputs its own estimate.)
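
To make that iteration structure concrete, here is a minimal Java sketch of two
SISO stages exchanging LLRs across iterations. The SisoDecoder interface, the
trivial stand-in decoder and all sizes are my own illustrative assumptions, not
the actual thesis code; in the real decoder each call to decode() is where the
ACS-heavy work described below happens:

import java.util.Arrays;

public class TurboIterationSketch {

    // A SISO stage: takes a priori LLRs plus received systematic and parity
    // values and returns updated LLRs. This interface is a hypothetical placeholder.
    interface SisoDecoder {
        double[] decode(double[] aprioriLlr, double[] systematic, double[] parity);
    }

    public static void main(String[] args) {
        int n = 8;
        double[] systematic = new double[n]; // noisy systematic bits (dummy data)
        double[] parity1 = new double[n];    // parity stream seen by decoder 1 (dummy)
        double[] parity2 = new double[n];    // parity stream seen by decoder 2 (dummy)

        // Trivial stand-in SISO that just sums its inputs; a real one runs the
        // ACS-heavy forward/backward recursions.
        SisoDecoder siso = (apriori, sys, par) -> {
            double[] out = new double[sys.length];
            for (int i = 0; i < sys.length; i++) out[i] = apriori[i] + sys[i] + par[i];
            return out;
        };

        double[] llr1 = new double[n]; // LLRs produced by decoder 1 (start at zero)
        double[] llr2 = new double[n]; // LLRs produced by decoder 2 (start at zero)
        for (int iter = 0; iter < 4; iter++) {
            // Each decoder uses the LLRs produced by the other in the previous
            // iteration; the two calls could run in parallel threads.
            double[] next1 = siso.decode(llr2, systematic, parity1);
            double[] next2 = siso.decode(llr1, systematic, parity2);
            llr1 = next1;
            llr2 = next2;
        }
        System.out.println("final LLRs from decoder 1: " + Arrays.toString(llr1));
    }
}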
Each computation of the LLR requires many serialized ACS (add-compare-select)
operations, their number depending on the number of bits in each frame (and on
the number of bits the encoder uses to produce the parity bits for the initial
sequence). The computation can be summarized by the following nested loops
(for each of the two SISO decoders working in parallel):
for i = 1 to N_FRAMES:
    for j = 1 to N_ELS_FRAMES:
        for k = 1 to 4:
            for l = 1 to N_STATES:
                do_ops()
Note that the loop above doesn't literally appear in the algorithm, but it
closely matches the operations performed in each iteration. Generally,
N_STATES is around 12 or 24 (it depends on the number of bits each encoder
uses to compute the parity bits of the input sequence), and do_ops() involves
a sum, a max and a vector normalization.
At first I tried an implementation in Jython, but the result was pretty
disheartening: the operations in the nested loops above took, with the
multithreaded version, around 20 minutes (!) on a moderate amount of bits
(~2 million).
On the other hand, a Java implementation takes ~1 second on ~4 million bits
using the single-threaded version of the algorithm.
Why is there such a huge difference?
