Figure out what script was used to generate the data
We could not find the script that matched the joint/force values we trained with. So we have two options now:
Play a rosbag and use the movement instructions to reconstruct the script that was used
-> We played rosbag 030_SUCCESS and got the same joint trajectories, but the execution would have crashed at some point because the gripper movement was not recorded, so it failed. We also noticed that the cable was picked up at the white mounting and not the green one, so the trajectory is different and might also collide with the camera Dingyuan installed (it was removed before testing). For online execution we need new rosbags, but we can validate offline using the recorded ones.
record new rosbags
use script in git and change some waypoints
record nominal bags
record anomaly bags
Understand the joint and force curves
jit the model apply fn (in ros_adetecor.py), see the sketch below this checklist
tune model:
latent dimension (~ order of magnitude)
window_size (~ order of magnitude)
Consider all modalities (add joints)
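For the "jit the model apply fn" item, a minimal sketch of wrapping the apply fn with jax.jit; the apply fn, params, and shapes below are placeholders, not the actual code in ros_adetecor.py:

```python
# Minimal sketch: wrap the model's apply fn with jax.jit so the per-window call
# in the ROS node is compiled once and is fast afterwards (names are placeholders).
import jax
import jax.numpy as jnp

def apply_fn(params, window):
    # Stand-in for the real autoencoder apply fn: a single linear layer.
    W, b = params
    return window @ W + b

jitted_apply = jax.jit(apply_fn)

params = (jnp.eye(6), jnp.zeros(6))
window = jnp.ones((50, 6))          # (window_size, num_channels), shapes are assumptions
out = jitted_apply(params, window)  # first call compiles; later calls with the same shape reuse it
print(out.shape)
```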
Currently we have a bug in encoder_dynamics/run when executing any of the Python files, e.g. python -m train_model. No module can be imported; it looks like something with the PATH variable is wrong (in different setups?!).
This bug is reproducible in other branches (e.g. #52 (closed) or #59 (closed)) as well as in different envs (Python deps only / nix env).
Do you have any quick suggestion on this? Thanks!
My guess is you guys had a setup where the project root was added to the Python path; my typical approach is to use -m from the project root, so that the project root is added to the path :)
Yes, it was run from the project root. Mhm, ok, this sounds very interesting, I'll let you know if that solves it! We were just wondering because the bug occurred on several setups.
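For reference, a minimal way to check which directory ends up on sys.path; the module and file names below are placeholders, not the actual repo layout:

```python
# debug_path.py (hypothetical helper inside the package) to inspect the import path.
# Running `python -m encoder_dynamics.debug_path` from the project root prepends the
# project root to sys.path, so sibling packages import fine; running
# `python encoder_dynamics/debug_path.py` prepends encoder_dynamics/ instead,
# which breaks imports that go through the package name.
import sys
from pprint import pprint

pprint(sys.path[:3])  # the first entry is the directory Python prepended
```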
Hi @hanikevi, just a quick question. I want to train again on the Converging PC and built a nix environment with "develop .#ros1", but it does not use CUDA, so no training on the GPU is possible. If I try "develop .#jaxWithCuda" I get the error
Ahh sorry, yes, the README is out of date. I just double-checked: the nix develop cmd works on the Converging PC also with the ROS update; I can see the CUDA device in the devices list and create jax arrays.
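A quick sanity check along those lines, assuming a plain JAX setup inside the jaxWithCuda shell:

```python
# Check that JAX picks up the CUDA backend inside the nix shell.
import jax
import jax.numpy as jnp

print(jax.devices())          # expect a CUDA device entry when GPU support works
print(jax.default_backend())  # "gpu" if CUDA is used, "cpu" otherwise

x = jnp.ones((1024, 1024))    # creating an array also exercises the runtime
print(x.shape, x.dtype)
```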
Hi @nik20652 @lis07128, Niki explained yesterday that the AD is running smoothly but doesn't cleanly distinguish the anomalies. The AD still acts as a low-pass or time delay, where external forces (e.g. pushing on it) are tracked well by the network output.
A few ideas:
1. My guess is that the force signal has pretty low information or entropy (it's mostly a low-frequency, slow-changing signal with white noise), so it might be too easy for the AE to compress the signal. Theoretically, for the tasks we consider, we should be able to use a latent dimension of 1 to reconstruct the forces (e.g. if the latent dimension were proportional to time). This might lead to a more task-specific network which gives more error during an anomaly. I'm not sure an actual latent dim of 1 would even work, but I would suggest trying to reduce the latent dim as well as increase it.
2. Do we have good quantitative benchmarks for the AD performance that we trust? To fairly compare different architectures, we need to make sure we can evaluate them in an easy, standard, quantitative way (a sketch of one such evaluation follows after this list).
3. ros_datahandler can also convert /joint_states to EE pose/velocity. This might help learn a more specific task model. From Lagrangian mechanics, we theoretically expect the forces we measure to be a function of: robot position/velocity, environment configuration, and input torques. A radical change (more specific to this type of problem) would be to just do regression from robot position/velocity to force. However, this is not as transferable to other sensor types and would require more programming work, so I would only do this when we're sure the AE doesn't work well.
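Regarding point 2, a hedged sketch of one such evaluation, assuming we have per-window reconstruction errors on held-out nominal and anomaly bags (function and variable names are illustrative, not the actual repo API):

```python
# Score each window by reconstruction error and report AUROC over nominal vs. anomaly bags.
import numpy as np
from sklearn.metrics import roc_auc_score

def window_errors(model_apply, params, windows):
    """Mean squared reconstruction error per window; `model_apply` is assumed
    to map a batch of windows to their reconstructions."""
    recon = model_apply(params, windows)
    return np.mean((np.asarray(recon) - np.asarray(windows)) ** 2, axis=(1, 2))

# nominal_windows, anomaly_windows: arrays of shape (num_windows, window_size, num_channels)
# errors_nominal = window_errors(model.apply, params, nominal_windows)
# errors_anomaly = window_errors(model.apply, params, anomaly_windows)
# labels = np.concatenate([np.zeros_like(errors_nominal), np.ones_like(errors_anomaly)])
# scores = np.concatenate([errors_nominal, errors_anomaly])
# print("AUROC:", roc_auc_score(labels, scores))  # 0.5 = chance, 1.0 = perfect separation
```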
Thanks for your suggestions. Currently I'm trying to verify the weights to confirm our assumption that our model just passes the inputs through as outputs. However, I might have had the same idea as you mentioned in 3. I understand this as training a predictive model that gets the last state s(t-1) to predict the current state s(t). During execution we process the last window to predict the current window. Subsequently, we could compare the predicted window with the current inputs. Did I get that right?
I understand you mean a prediction model instead of a reconstruction model, right? That would be pretty easy to test because we have a prediction setup for the tactile stuff Valentyn is doing. The prediction/forecasting models did better than reconstruction on the semi-supervised approaches compared in this paper (see Table 1).
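A minimal sketch of the windowing for such a prediction model, assuming we predict the next (non-overlapping) window from the previous one; names and shapes are placeholders:

```python
# Build (input_window, target_window) pairs from a multichannel time series so a
# forecasting model predicts the next window from the previous one; the anomaly
# score is then the prediction error instead of the reconstruction error.
import numpy as np

def make_prediction_pairs(signal, window_size):
    """signal: (num_steps, num_channels). Returns inputs/targets of shape
    (num_pairs, window_size, num_channels), with targets shifted by one full window."""
    windows = np.stack([signal[i:i + window_size]
                        for i in range(len(signal) - window_size + 1)])
    return windows[:-window_size], windows[window_size:]

# Example with a dummy force signal: 1000 steps, 6 channels (Fx..Tz).
signal = np.random.randn(1000, 6).astype(np.float32)
inputs, targets = make_prediction_pairs(signal, window_size=32)
print(inputs.shape, targets.shape)  # (937, 32, 6) (937, 32, 6)
```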
What I meant in the 3rd point is rather: we make a direct regression model $F_{ext} = r(q, \dot{q})$, which does not take the historical force inputs. It would be a radical change in the problem formulation; I would only consider it if nothing else works :)
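A small sketch of what that regression could look like in plain JAX; the MLP, sizes, and variable names are assumptions, not existing code:

```python
# Direct regression F_ext = r(q, q_dot): a small MLP trained with MSE, no force history.
import jax
import jax.numpy as jnp

def init_mlp(key, sizes):
    keys = jax.random.split(key, len(sizes) - 1)
    return [(jax.random.normal(k, (m, n)) * jnp.sqrt(2.0 / m), jnp.zeros(n))
            for k, m, n in zip(keys, sizes[:-1], sizes[1:])]

def predict_force(params, q, q_dot):
    x = jnp.concatenate([q, q_dot], axis=-1)      # (batch, 2 * num_joints)
    for W, b in params[:-1]:
        x = jax.nn.relu(x @ W + b)
    W, b = params[-1]
    return x @ W + b                              # (batch, 6) wrench estimate

@jax.jit
def mse_loss(params, q, q_dot, f_meas):
    return jnp.mean((predict_force(params, q, q_dot) - f_meas) ** 2)

# Anomaly score at runtime: |f_measured - predict_force(params, q, q_dot)|;
# large residuals indicate contact the nominal model cannot explain.
params = init_mlp(jax.random.PRNGKey(0), [14, 64, 64, 6])  # 7 joints -> 14 inputs; 6D wrench out
```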
Regarding your question: yes, that's right. Training the prediction model should be easier b/c our model is forced to make a 'real' guess. And yeah, this should be pretty easy to test.