
Missing post-activation layer for GRU & LSTM when parsing JSON #124

christoph-hart opened this issue Dec 30, 2023 · 3 comments

@christoph-hart
Hi there,

I'm currently toying around with the library, and I noticed that the JSON parser for the TensorFlow model does not add an activation layer after the GRU / LSTM layers:

model->addLayer(gru.release());

I compared the layout of the model from your RTNeuralExample repository with the JSON and noticed this inconsistency. The TensorFlow JSON does list an activation function, as you can see here:

"type": "gru",
"activation": "tanh",
"shape": [ null, null, 8 ],

I'm just getting started with ML, so this might be a silly question, but is there a reason the activation layers are omitted by the JSON parser for the GRU and LSTM layers?

@janaboy74 commented Dec 30, 2023

I'm working on it. The parser is more or less fixed, but the output formatter is still wrong.

@jatinchowdhury18 (Owner)

Hello!

So the root of the problem here is a compatibility mismatch between TensorFlow and RTNeural.

In TensorFlow (and I think PyTorch as well), the GRU and LSTM layers have their own internal activation functions. RTNeural uses the default tanh activation for these layers, built directly into the layer implementations.
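
For example, here is a minimal Keras sketch (the model shape is illustrative, assuming TF 2.x) showing that the GRU's activation argument configures its internal activation rather than appending a separate activation layer:

import tensorflow as tf

# The GRU's "activation" argument is the layer's internal activation
# (tanh by default), not a separate layer appended after the GRU.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 1)),            # (time steps, features)
    tf.keras.layers.GRU(8, return_sequences=True),
    tf.keras.layers.Dense(1),
])
print(model.layers[0].activation)  # <function tanh at 0x...>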

The JSON files are typically generated from TensorFlow's representation of a model. It works something like this:

for layer in model.layers:
    # Record the layer's activation function in its JSON entry.
    layer_dict["activation"] = layer.activation

This way, when you define a TensorFlow layer with an activation function, that activation becomes part of the JSON file. However, for TensorFlow's GRU and LSTM implementations, layer.activation returns the internal "tanh" activation. Since the RTNeural implementations of these layers already build in that activation, and we don't want to apply it twice, we ignore the "activation" JSON field for those layer types. The full Python script can be found here.
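
To illustrate, here is a hedged sketch of that kind of export loop; the function name, file layout, and top-level "layers" key are illustrative, not RTNeural's actual script:

import json

def export_model(model, outfile):
    """Illustrative only: RTNeural's real export script differs in detail."""
    layers = []
    for layer in model.layers:
        layer_dict = {
            "type": layer.__class__.__name__.lower(),  # e.g. "gru", "dense"
            "shape": list(layer.output_shape),         # e.g. [None, None, 8]
        }
        if hasattr(layer, "activation"):
            # For GRU/LSTM this is the internal activation ("tanh"),
            # which RTNeural's parser deliberately skips for those layers.
            layer_dict["activation"] = layer.activation.__name__
        layers.append(layer_dict)
    with open(outfile, "w") as f:
        json.dump({"layers": layers}, f, indent=4)

Note that json.dump writes Python None as JSON null, which matches the null entries in the shape arrays above.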

If you're manually writing or generating your own JSON file and would like an additional activation applied after your GRU or LSTM layer, you can add another layer entry to your JSON file that looks something like:

{
    "type": "activation",
    "activation": "tanh",
    "shape": [ null, null, 8 ]
}
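
Continuing the hedged sketch above, one way to patch an already-exported file is shown below; the file name and the top-level "layers" key are assumptions about the file layout:

import json

with open("model.json") as f:          # illustrative file name
    model_json = json.load(f)

# Find the GRU entry and insert a standalone activation layer after it.
gru_index = next(i for i, layer in enumerate(model_json["layers"])
                 if layer["type"] == "gru")
model_json["layers"].insert(gru_index + 1, {
    "type": "activation",
    "activation": "tanh",
    "shape": [None, None, 8],          # match the GRU's output shape
})

with open("model.json", "w") as f:
    json.dump(model_json, f, indent=4)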

Hope this helps! I'm also curious about the fixes @janaboy74 is making to the parser.

@janaboy74

I think I have fixed the JSON parser: it now uses recursion, and I hope it works correctly now.
