Layers support & Input Data issues #50
Hello! Glad you're enjoying the library.

For question 1, having more advanced layers would definitely be cool. I don't know much about the two layer types you mentioned, except as high-level concepts. Would it be possible to share some articles about them, or (even better) some PyTorch or TensorFlow code that uses them in a neural net (preferably audio-related)?

For question 2, you should be able to do something like this:

```cpp
float testInput[64] = { 0.0f, 1.0f, 2.0f, ... }; // create the input data
model->forward (testInput); // pass a pointer to the input data to the `forward()` method
const float* testOutput = model->getOutputs(); // get a pointer to the output data
```
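To make the block-based calling pattern concrete without depending on the library, here is a minimal sketch. `ToyBlockModel` and `processBlock` are hypothetical names, and the model is a trivial stand-in (a fixed gain of 0.5), not an RTNeural class. It only mirrors the `forward()`/`getOutputs()` usage above, assuming a network whose input and output sizes are both 64.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a model with 64 inputs and 64 outputs. It is NOT
// RTNeural; it only mimics the calling pattern: forward() consumes a whole
// input array and getOutputs() exposes the whole output array.
class ToyBlockModel
{
public:
    static constexpr std::size_t size = 64;

    // Reads exactly `size` values; a gain of 0.5 stands in for real layers.
    float forward(const float* input) noexcept
    {
        for (std::size_t i = 0; i < size; ++i)
            outs[i] = 0.5f * input[i];
        return outs[0];
    }

    const float* getOutputs() const noexcept { return outs.data(); }

private:
    std::array<float, size> outs{};
};

// Runs one host block through the model with a single forward() call.
void processBlock(ToyBlockModel& model, const float* in, float* out, std::size_t numSamples)
{
    assert(numSamples == ToyBlockModel::size && "block size must match model input size");
    model.forward(in);
    const float* result = model.getOutputs(); // owned by the model, valid until the next forward()
    for (std::size_t i = 0; i < numSamples; ++i)
        out[i] = result[i];
}
```

The key point is one `forward()` call per block rather than per sample, which only works when the block size equals the model's input size.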
Hey, thanks for your quick and helpful response! Your work is really impressive :)

Regarding 1: I will go through some papers I've read and send them to you in the next few days!

About question 2: I followed your example code, but I get strange results (some float values > 700), and the output is completely different from the Python model's. For test purposes I am loading the model at run-time, so I declared the model like this in my PluginProcessor.h:
I used this to load the JSON file:
And this is how I am feeding the data into the model (I am using a buffer size of 64 in my DAW):
Is there something I'm doing wrong? Does it make a difference if I declare the model architecture at compile time (apart from performance)?
Hmm, the code in your post seems like it should work... Would it be possible to provide an example JSON model file, as well as an example input and output? That would be useful for debugging.

The run-time and compile-time models should give identical results; the only difference besides performance is how the models are loaded. For run-time models, the model architecture is determined from the contents of the JSON file. With a compile-time model, the user defines the model architecture in code, and if it does not match the model size defined in the JSON file, the output will be incorrect. In both cases, any errors in the model-loading process should be printed to the console. Would it also be possible to share the console output coming from the call to …?

One other thing to check is that the input size of the model is the same as the size of the input array, and that the input array is initialized with either values or zeros. If the input array is too small, or if some values are uninitialized, the model output will be unpredictable. I know it seems simple, but I figured I'd mention it since I've run into this problem before.
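A minimal sketch of the sizing check described above, assuming a model input size of 144 (the value used later in this thread) and a host block that may be shorter. `fillModelInput` and `modelInputSize` are hypothetical names. Zero-filling only guarantees that the model never reads uninitialized memory; it does not make a short block meaningful to a network trained on full 144-sample windows.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model input size (the thread later uses 144).
constexpr std::size_t modelInputSize = 144;

// Copies `numSamples` host samples into a fixed-size model input array and
// zeroes every slot the host did not fill, so no value is left uninitialized.
void fillModelInput(std::array<float, modelInputSize>& modelInput,
                    const float* hostSamples, std::size_t numSamples)
{
    assert(numSamples <= modelInputSize && "host block larger than model input");
    std::fill(modelInput.begin(), modelInput.end(), 0.0f);
    std::copy(hostSamples, hostSamples + numSamples, modelInput.begin());
}
```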
Thanks again for your help! I really appreciate you taking the time to help me with this problem. I just invited you to a private Git repository where I uploaded the JSON file, example audio input/output files (generated by the Python model), and an example input consisting of an array of ones. Note that this model now uses an input size of 144 (of course, I updated the JUCE plugin to this input size as well). Also, don't be put off by the speaker in the audio files; I had them lying around and used them as training data ;)

The RTNeural debug printout while loading the model looks like this (should be fine):
I know that the model is quite huge. I will definitely reduce the layer sizes at some point, but it's just a first prototype. Following your last hint about passing in the right input size, that should be correct:
Ah, the debug output from loading the model gives some idea of what's going on. The … Exporting convolutional layers can be a little tricky, since TensorFlow and PyTorch handle the layer dimensions in slightly funky ways. The RTNeural unit tests use the convolutional network defined in this script, which results in this model file. Hopefully that example will help us get a better idea of where the dimension mismatch is coming from.
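As an illustration of why exported convolution weights can end up in the wrong order: Keras/TensorFlow stores a `Conv1D` kernel as `(kernel_size, in_channels, out_channels)`, while PyTorch stores a `Conv1d` weight as `(out_channels, in_channels, kernel_size)`. The sketch below converts a flat row-major kernel from the first layout to the second. `kerasToTorchConv1d` is a hypothetical helper; it says nothing about which layout RTNeural's loader expects, which should be checked against the unit-test script mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Converts a Conv1D kernel from the Keras/TensorFlow layout
//   [kernel_size][in_channels][out_channels]
// to the PyTorch Conv1d layout
//   [out_channels][in_channels][kernel_size].
// Both are flat, row-major (C-order) vectors.
std::vector<float> kerasToTorchConv1d(const std::vector<float>& keras,
                                      std::size_t kernelSize,
                                      std::size_t inChannels,
                                      std::size_t outChannels)
{
    assert(keras.size() == kernelSize * inChannels * outChannels);
    std::vector<float> torch(keras.size());
    for (std::size_t k = 0; k < kernelSize; ++k)
        for (std::size_t i = 0; i < inChannels; ++i)
            for (std::size_t o = 0; o < outChannels; ++o)
            {
                const std::size_t kerasIdx = (k * inChannels + i) * outChannels + o;
                const std::size_t torchIdx = (o * inChannels + i) * kernelSize + k;
                torch[torchIdx] = keras[kerasIdx];
            }
    return torch;
}
```

Reading weights with the wrong axis order does not fail loudly: the values still load, they just end up attached to the wrong taps and channels.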
Thanks for that hint! As I found out, it's not feasible to change the last index of the data tensor to 144. Instead, our data tensor has the shape (batch_size = 4096, timesteps = 144, features = 1). Do you have any suggestions for how to proceed from here? I uploaded the Python training script to the private GitHub repository as model.py, but here is also an overview of the model:
Ah, that's interesting. The batch size shouldn't matter, since I'm assuming you only want to run one batch at a time when doing inference. The "timesteps" dimension is a little more interesting... I'm guessing those are consecutive samples in time? It looks like when you run the model in TensorFlow, the network output is an array of 144 values. If I'm understanding everything correctly, then I think the correct implementation with RTNeural would be something like this:

```cpp
void inference(float* output, const float* input) // both input and output are arrays of length 144
{
    for(int i = 0; i < 144; ++i)
    {
        output[i] = model.forward(&input[i]);
    }
}
```
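A sketch of the same loop written against any model type that exposes `forward(const float*)` returning a `float`. `runPerSample` and `ToyStreamingModel` are hypothetical; the stand-in is a stateful one-pole lowpass, not a neural network. This pattern assumes the model's input size is 1: with an input size of 144, `&input[i]` would make each call read 144 values starting at index `i`, running past the end of a 144-sample array for every `i` greater than 0.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stateful stand-in with input size 1 and output size 1.
// Like a recurrent network, its output depends on earlier samples,
// so samples must be fed in order.
struct ToyStreamingModel
{
    float state = 0.0f;

    float forward(const float* input) noexcept
    {
        state = 0.5f * state + 0.5f * input[0]; // reads exactly one value
        return state;
    }
};

// Sample-by-sample inference over a block of any length.
template <typename Model>
void runPerSample(Model& model, float* output, const float* input, std::size_t numSamples)
{
    assert(input != nullptr && output != nullptr);
    for (std::size_t i = 0; i < numSamples; ++i)
        output[i] = model.forward(&input[i]); // &input[i] points at a single sample
}
```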
Doesn't the inference method you proposed amount to a sample-by-sample forward pass? I already tried it this way, but I'm still not getting the expected result (the output sounds quite similar to the input).

About the timestep dimension, to answer your question: yes, the timesteps are consecutive samples. Is this the kind of data structuring RTNeural supports? If not, should we structure our data another way? How would you structure and preprocess audio data to meet RTNeural's requirements?

In the example you provided, the layers have smaller dimensions than ours. For instance, we have outputs with shape: … Furthermore, the LSTM layer is outputting a different shape, a scenario I didn't come across in the examples. Do you think this could be causing the problem?
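One common way to stream audio into a model trained on windows of consecutive samples (here `timesteps = 144, features = 1`) is to keep a history of the most recent 144 samples and run the model on that window once per incoming sample. This is an assumed approach, not one confirmed in this thread. `SampleHistory` is a hypothetical helper, and which of the model's outputs to keep at each step depends on how the network was trained.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: holds the most recent N samples in chronological
// order (oldest first, newest last), starting out zero-filled.
template <std::size_t N>
class SampleHistory
{
    static_assert(N > 0, "window must hold at least one sample");

public:
    // Shifts the window left by one and appends the newest sample at the end.
    void push(float sample) noexcept
    {
        for (std::size_t i = 0; i + 1 < N; ++i)
            window[i] = window[i + 1];
        window[N - 1] = sample;
    }

    // Contiguous window suitable for passing to a model's forward().
    const float* data() const noexcept { return window.data(); }

    float at(std::size_t i) const
    {
        assert(i < N);
        return window[i];
    }

private:
    std::array<float, N> window{};
};
```

The shift costs O(N) per sample for clarity; a circular buffer (for example, one that writes each sample twice so a contiguous window is always available) avoids that cost.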
Hmm, yeah, I'm a little bit stumped by this one. I don't think I've ever trained a network where the input goes directly into a convolutional layer without passing through something else first. (Frankly, I'm using recurrent networks much more than convolutional ones these days anyway.) The fact that the LSTM layer has a 2D shape rather than a 3D one is a little off-putting; I wonder if there's some internal "flattening" happening there. One other thing I noticed is that the Conv1D layer in your script is using the argument …
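On the shape question, a small sketch of how a row-major `(batch, timesteps, features)` tensor is laid out in memory may help. `offset3d` is a hypothetical helper. With `features = 1`, the 144 samples of one example are contiguous, so the flat data looks the same as with shape `(batch, 1, 144)`; the difference is in how layers interpret it. Keras `Conv1D` (with the default `channels_last`) slides along the timesteps axis and treats the last axis as channels, so `(batch, 1, 144)` would mean a single timestep with 144 channels.

```cpp
#include <cstddef>

// Flat offset of element [b][t][f] in a C-order (row-major) tensor of shape
// (batch, timesteps, features), the default layout for NumPy arrays.
constexpr std::size_t offset3d(std::size_t b, std::size_t t, std::size_t f,
                               std::size_t timesteps, std::size_t features)
{
    return (b * timesteps + t) * features + f;
}

// Shape (4096, 144, 1): consecutive timesteps of one example are adjacent.
static_assert(offset3d(0, 1, 0, 144, 1) == 1, "next timestep is the next float");
static_assert(offset3d(1, 0, 0, 144, 1) == 144, "next example starts 144 floats later");

// Shape (4096, 1, 144): same flat positions, but now they are channels of one timestep.
static_assert(offset3d(0, 0, 1, 1, 144) == 1, "next channel is the next float");
static_assert(offset3d(1, 0, 0, 1, 144) == 144, "next example starts 144 floats later");
```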
First of all, thanks for providing RTNeural. It's a really elegant way to get a model running in the audio C++ world!
I have two questions:
1. Is there any chance of Conv1D Transpose and Transformer layer support in the future? It's quite a complex architecture...
2. The other one is probably a beginner question. I have a model with an input size of 64 samples. Is there any way to put in 64 samples at a time and get 64 samples back? The RTNeural examples and NeuralPi all use an input size of 1.