Given a set of video frames, our method first computes frame-to-frame camera poses (left) and then re-renders the input video (right). To estimate the pose between two frames, we establish 2D correspondences with an off-the-shelf optical flow estimator. Using single-view pixelNeRF, we obtain a surface point cloud for each frame, X and Y respectively, where each point is the expected 3D ray termination point of its pixel. Because X and Y are pixel-aligned, optical flow lets us compute 3D scene flow as the difference of corresponding 3D points. We then find the camera pose P ∈ SE(3) that best explains this 3D flow field by solving a weighted least-squares problem with flow confidence weights W. Using all frame-to-frame poses, we re-render all frames. We enforce an RGB reconstruction loss and a flow loss between the pose-induced 3D scene flow, projected to 2D, and the 2D optical flow. Our method is trained end-to-end, assuming only an off-the-shelf optical flow estimator.
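For concreteness, the weighted least-squares pose solve over corresponding 3D points admits a closed form: a weighted Procrustes (Kabsch) alignment between the two pixel-aligned point clouds. The NumPy sketch below is illustrative rather than the paper's implementation; it assumes the correspondences have been flattened to (N, 3) arrays X and Y, that w holds the (N,) flow confidence weights, and the name weighted_procrustes is ours.

```python
import numpy as np

def weighted_procrustes(X, Y, w):
    """Closed-form solve for the rigid pose (R, t) minimising
    sum_i w[i] * ||R @ X[i] + t - Y[i]||^2 (weighted Kabsch).

    X, Y: (N, 3) corresponding 3D points; w: (N,) nonnegative weights.
    Returns a rotation R (3, 3) in SO(3) and a translation t (3,).
    """
    w = w / w.sum()                         # normalise confidence weights
    x_bar, y_bar = w @ X, w @ Y             # weighted centroids
    Xc, Yc = X - x_bar, Y - y_bar           # centred point clouds
    H = Xc.T @ (w[:, None] * Yc)            # 3x3 weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = y_bar - R @ x_bar
    return R, t

# Toy check: the solver recovers a known pose from transformed points.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 0.5])
Y = X @ R_true.T + t_true
R, t = weighted_procrustes(X, Y, np.ones(len(X)))
assert np.allclose(R, R_true, atol=1e-6) and np.allclose(t, t_true, atol=1e-6)
```

Because the solve is a differentiable composition of means, matrix products, and an SVD, gradients can flow through the estimated pose back to the point clouds, which is what permits end-to-end training.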
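The flow loss can likewise be sketched as a weighted comparison between the 2D displacement induced by the estimated pose and the observed optical flow. A minimal version under assumed conventions (a pinhole camera with intrinsics K, frame-1 points X in camera coordinates, source pixel coordinates uv; the function name flow_loss is hypothetical):

```python
def flow_loss(X, R, t, K, uv, flow_2d, w):
    """Weighted L2 loss between pose-induced 2D flow and observed flow.

    X: (N, 3) frame-1 surface points in camera coordinates; (R, t): the
    estimated pose; K: (3, 3) pinhole intrinsics; uv: (N, 2) source pixels;
    flow_2d: (N, 2) observed optical flow; w: (N,) confidence weights.
    """
    X2 = X @ R.T + t                   # points expressed in frame-2 camera
    proj = X2 @ K.T                    # pinhole projection to the image plane
    uv2 = proj[:, :2] / proj[:, 2:3]   # perspective divide
    induced = uv2 - uv                 # projected pose-induced scene flow
    return np.sum(w * np.sum((induced - flow_2d) ** 2, axis=1)) / w.sum()
```

The exact loss weighting and robustification in the paper may differ; this only makes explicit how a 3D scene flow field is projected to 2D for comparison against optical flow.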