function outObj = GPUTypeFromToGL(cmd, inObj, glObjType, outObj, keepmapped, mapflags)
% outObj = GPUTypeFromToGL(cmd, inObj [, glObjType][, outObj][, keepmapped][, mapflags])
%
% Note: Calling this command requires calling the following command first
% to initialize Psychtoolbox GPU computing support:
%
% PsychImaging('AddTask', 'General', 'UseGPGPUCompute', 'GPUmat');
%
% Supported 'cmd' commands:
% -------------------------
%
% if cmd is zero, then convert an OpenGL object of type glObjType,
% referenced by handle inObj into a GPU object and return it in outObj. If
% the optional outObj is provided as input argument, try to recycle it --
% just fill its content with OpenGL object's content. Otherwise, create an
% outObj of matching format for content. If 'keepmapped' is set to 1, the OpenGL
% object will stay mapped for the GPU compute api, otherwise it gets immediately
% unmapped after the conversion. Keeping the object mapped is more efficient, but
% requires more careful management of objects to prevent malfunctions.
%
% If cmd is == 1, then convert GPU object inObj to OpenGL object of type
% glObjType and return it in outObj. Try to recycle a passed in outObj, if
% possible, otherwise create a new one. 'keepmapped' - see explanation for cmd zero.
%
% If cmd is == 2, then unmap the OpenGL object. You must do this if you previously
% set the optional 'keepmapped' flag to 1 during a copy operation and now want to
% use the object which was the source or the target of that copy again with OpenGL
% or Screen(), ie., with a Psychtoolbox drawing or image processing function.
% Unmapping is neccessary for proper OpenGL operation, but costs a fraction of a
% millisecond of overhead on well working operating systems like Linux. Clever use
% of the 'keepmapped' flag and this manual unmapping method sometimes allows to
% save some redundant unmap calls.
%
% If cmd is == 3, then remove the OpenGL object from use by the GPU compute toolkit.
% This must be done before destroying/deleting the OpenGL object, e.g., before
% a call to Screen('Close', x); for a window or texture handle x. This operation
% can be very expensive -- on the order of multiple milliseconds, so use sparingly.
%
% If cmd is == 4, all OpenGL objects are removed. Usually used before closing (all)
% onscreen windows, e.g., via Screen('CloseAll') or sca. This cache flush is very
% expensive!
%
% If cmd is == 5, then the given OpenGL object 'inObj' of type glObjType is mapped
% and a CUDA memory pointer is returned, for use with external mex files, so these
% can directly access the mapped resource. The object is mapped read-only.
%
% If cmd is == 6, the same operation as cmd == 5 happens, but the object is mapped
% write-only.

% glObjType == 0 (default): Provided OpenGL object is a Psychtoolbox
% texture or offscreen window handle.
%
% glObjType == 1: Provided inObj is a struct which defines the low-level
% OpenGL object, which can be a texture or a renderbuffer. The struct must
% have the following fields:
%
%     texstruct.glhandle     == OpenGL object handle.
%     texstruct.gltarget     == Target: texture target or renderbuffer.
%     texstruct.width        == Width in texels/pixels.
%     texstruct.height       == Height in texels/pixels.
%     texstruct.bpp          == Bytes per texel/pixel/element.
%     texstruct.nrchannels   == Number of layers / color channels.
%
% -> glObjType 1 can be used in places where no calls to Screen() functions
%    are allowed or possible, e.g., inside the imaging pipeline, or 3rd
%    party low-level OpenGL code.
%
%
% Note: If you pass in a Psychtoolbox texture, it should be already in
% normalized orientation (upright and in row-major format). This is a given
% if the texture was created via Screen('SetOpenGLTexture') or
% Screen('SetOpenGLTextureFromMemPointer'); or if your texture is actually
% an offscreen window created via Screen('OpenOffscreenWindow'). If your
% texture is created via Screen('MakeTexture') you need to usually set the
% optional 'textureOrientation' flag to 1, unless you've pretransposed the
% Matlab/octave image matrix (setting of 2 is fine), or it is entirely
% isotropic (setting of 3 is fine). If you get your texture from a movie
% file, you need to pass the optional 'specialFlags1' parameter in
% Screen('OpenMovie') as 16.
%
% If you get the texture from the video capture engine, you need to pass
% the optional 'recordingflags' to 2048 in a call to
% Screen('OpenVideoCapture').
%
%
% glObjType == 2: Read or write from/to current virtual framebuffer for a
% given onscreen window handle passed as 'inObj', specifically the 1st
% color attachment.
%
%
% glObjType == 3: Read or write from/to currently bound FBO, specifically
% the 1st color attachment. 'inObj' or 'outObj' doesn't really have a
% meaning here, as we always query the current binding.
%
%
% Current Limitations:
%
% Currently only supports the GPUmat toolbox as hard-coded backend:
% (http://sourceforge.net/projects/gpumat/)
%
% In the future it should support more GPGPU backends and allow dynamic
% detection and/or runtime selection of backends. Possible candidates are,
% e.g., AccelerEyes "Jacket", low-level CUDA or OpenCL, and other toolkits
% based on CUDA or OpenCL, as well as our own to-be-done backed.
%
% Only really supports 32 bpc floating point precision textures and
% renderbuffers. This because this single precision float format is the
% only format common to both OpenGL and our one and only GPUmat backend.
% One can provide RGBA8 4-layer textures/renderbuffers, but these will be
% interpreted by the backend as single layer (luminance) single precision
% float matrix. Special CUDA kernels would be required in GPUmat to depack
% each apparent float pixel into a RGBA8 interleaved pixel for meaningful
% processing. Otherwise hilarious results will ensue.
%
% CUDA-5.0 interop as used by GPUmat currently only supports 1-layer,
% 2-layer and 4-layer textures and renderbuffers, ie., L, LA and RGBA, but
% not RGB format.
%

% History:
% 30.01.2013  mk  Written.
% 15.04.2013  mk  Require use of PsychImaging(..., 'UseGPGPUCompute', ...);
%

global GL;
persistent initialized;

% This global variable signals if a GPGPU compute api is enabled, and which
% one. It gets initialized by PsychImaging() if usercode requests GPGPU
% compute support: 0 = None, 1 = GPUmat.
global psych_gpgpuapi;

if isempty(initialized)
    % Make sure GPGPU computing got enabled by PsychImaging and GPU api
    % type 1, the GPUmat toolbox, is in use:
    if isempty(psych_gpgpuapi) || (psych_gpgpuapi ~= 1)
        error('GPGPU computing via GPUmat toolbox not enabled! Aborted.');
    end
    
    if isempty(GL)
        InitializeMatlabOpenGL([], [], 1);
    end
    
    % Ready to rock!
    initialized = 1;
end

if nargin < 1 || isempty(cmd) || ~isscalar(cmd) || ~isnumeric(cmd)
    error('Missing or invalid minimum required argument "cmd".');
end

switch cmd
    case 0,
        % Copy from OpenGL to GPU backend:
        direction = 0;
    
    case 1,
        % Copy from GPU backend to OpenGL:
        direction = 1;
        
    case 2,
        % Unmap object from cache:
        outObj = 1;
        direction = 0;

    case 3,
        % Purge object from cache:
        outObj = 1;
        direction = 0;
        
    case 4,
        % Cache invalidate:
        outObj = 1;
        memcpyCudaOpenGL(0);
        return;
    
    case 5
        % Retrieve mapped pointers for reading from OpenGL:
        direction = 0;
        
    case 6
        % Retrieve mapped pointers for writing to OpenGL:
        direction = 0;
        
    otherwise
        error('Invalid cmd specified.');
end

if nargin < 2
    error('Missing required 2nd argument "inObj".');
end

if nargin < 3 || isempty(glObjType)
    % Psychtoolbox "classic" texture handle or offscreen window handle:
    glObjType = 0;
end

% No outObj provided for recycling?
if nargin < 4
    outObj = [];
end

if nargin < 5 || isempty(keepmapped)
    keepmapped = 0;
end

if nargin < 6 || isempty(mapflags)
    mapflags = -1;
end

if direction == 0
    % OpenGL -> GPU => gpu is outObj, if any.
    gpu = outObj;
else
    % GPU -> OpenGL => gpu is inObj:
    gpu = inObj;
    if isempty(gpu)
        error('Empty GPUtype inObj variable provided, but update of OpenGL object requested. How is this supposed to work?!?');
    end
    
    % Impedance matching code. Try to massage input 'gpu' variable into a
    % format that is compatible with CUDA-OpenGL interop and OpenGL itself:
    switch ndims(gpu)
        case 1,
            % One dimensional vector: Turn into "row-vector"
            % style single texel row luminance texture:
            d1 = 1;
            d2 = numel(gpu);
            cc = 1;
            % Reshape into 3-D matrix with two singleton dimensions
            % to checks further down don't fail due to
            % size/dimension mismatch:
            gpu = reshape(gpu, 1, d1, d2);
        case 2,
            % Two dimensional matrix: Turn into luminance texture,
            % the 2D size of the two non-singleton dimensions:
            d1 = size(gpu, 2);
            d2 = size(gpu, 1);
            cc = 1;
            
            % Reshape gpu into a 3D matrix with the 1st dimension
            % being a singleton dimension which represents the
            % single luminance channel. We need this so that
            % further checks in the code below don't fail:
            gpu = reshape(gpu, 1, d2, d1);
        case 3,
            % 3D matrix, hopefully a width x height x channels
            % matrix with x = 1 to 4 channels.
            d1 = size(gpu, 3);
            d2 = size(gpu, 2);
            cc = size(gpu, 1);
            
            % Reject any zero-channel textures or more than 4
            % channel RGBA textures for now:
            if cc < 1 || cc > 4
                error('Provided 3D input GPU matrix "inObj" has less than 1 or more than 4 elements in 1st dimension, which would result in an unsupported color channel count of < 1 or > 4!');
            end
            
            % 3 channel input? This would translate into a 3
            % channel RGB texture, but at least CUDA-5.0 does not
            % support this. Extend it into a 4 channel format:
            if cc == 3
                % We add a value 1.0 to the fourth channel,
                % resulting in a alpha channel of 1 for fully
                % opaque:
                oldgpu = gpu;
                gpu = zeros(4, d2, d1, GPUsingle);
                gpu(1:3, :, :) = oldgpu(1:3, :, :);
                gpu(4  , :, :) = 1.0;
                cc = 4;
                
                % Give a performance warning:
                warning('GPUTYPEFROMTOGL:gputypeRGBtoRGBAautocast', 'Input GPU 3D matrix inObj has 3 elements in 1st dimension, which would result in a unsupported RGB texture! Extending to RGBA texture with A=1.');
            end
            
        otherwise,
            error('Input argument N-d matrix inObj has more than 3 dimensions. This is unsupported for conversion to OpenGL objects!');
    end
    
    % Hopefully 'gpu' is now safe to convert into a OpenGL object.
end

% GL object is a Psychtoolbox texture/offscreen window handle?
if glObjType == 0
    % Yes.
    if direction == 0
        % OpenGL -> GPUtype conversion:
        texid = inObj;
        
        if isempty(texid)
            error('No valid Psychtoolbox OpenGL input object provided!');
        end
    else
        % GPUtype -> OpenGL conversion:
        texid = outObj;
        if isempty(texid)
            % No existing texture object provided as output destination.
            % Create a 32 bpc float texture of matching format.
            for win = Screen('Windows')
                if Screen('WindowKind', win) == 1
                    break
                end
            end
            
            if isempty(win) || Screen('WindowKind', win) ~= 1
                error('No onscreen window opened. This does not work without at least one open onscreen window.');
            end
            
            % Create a 32 bpc float texture 'float = 2', with no need for
            % orientation swap (transpose) 'textureOrientation = 3). We
            % assume all buffers derived from the GPU backend are always in
            % upright row-major format, like Offscreen windows. Input code
            % therefore must do needed conversions.            
            texid = Screen('MakeTexture', win, zeros(d1, d2, cc), [], [], 2, 3);
        end
    end
    
    % Get OpenGL texture handle and texture target of underlying OpenGL
    % texture for given Psychtoolbox object handle:
    [gltexid, gltextarget] = Screen('GetOpenGLTexture', texid, texid);
    
    % Retrieve width and height of texture in texels:
    [width, height] = Screen('Windowsize', texid);

    glPushAttrib(GL.TEXTURE_BIT);
    glBindTexture(gltextarget, gltexid);
    
    % Query bits per pixel:
    bpc = glGetTexLevelParameteriv(gltextarget, 0, GL.TEXTURE_RED_SIZE);
    if bpc == 0
      bpc = glGetTexLevelParameteriv(gltextarget, 0, GL.TEXTURE_LUMINANCE_SIZE);
    end
    
    bpp = bpc;
    bpp = bpp + glGetTexLevelParameteriv(gltextarget, 0, GL.TEXTURE_GREEN_SIZE);
    bpp = bpp + glGetTexLevelParameteriv(gltextarget, 0, GL.TEXTURE_BLUE_SIZE);
    bpp = bpp + glGetTexLevelParameteriv(gltextarget, 0, GL.TEXTURE_ALPHA_SIZE);
     
    % Number of channels == Bits per pixel bpp / Bits per component, e.g., RED channel:
    nrchannels = bpp / bpc;
    
    % Translate to bytes per pixel:
    bpp = bpp / 8;
        
    glBindTexture(gltextarget, 0);
    glPopAttrib();
    
    % fprintf('nrchannels = %i  : Byteperpixel = %i  : bpc = %i\n', nrchannels, bpp, bpc);
    
    % Override a detected RGB32F format to become a 4-channel RGBA32F format. Why?
    % Because at least NVidia on Linux silently allocates storage for RGBA32F when asked
    % for RGB32F, ie., it allocates essentially a RGBX32F format with padding. The problem
    % is that the system lies about this and reports internal format as RGB32F and bpp
    % bits per pixel as 96 bpp instead of the real 128 bpp. This would cause us to misallocate
    % memory and copy the wrong amount, leading to incomplete damaged data transfers. We try
    % to work around this special case by faking the real format and just hope for the best...
    if (bpc == 32 && nrchannels == 3)
      nrchannels = 4;
      bpp = 16;
    end    
end

% GL object is a struct with OpenGL object handle, target, and other info?
if glObjType == 1
    if direction == 0
        % OpenGL -> GPUtype conversion:        
        texstruct = inObj;
        if isempty(texstruct)
            error('No valid Psychtoolbox OpenGL input object provided!');
        end
    else
        % GPUtype -> OpenGL conversion:
        texid = outObj;
        if isempty(texid)
            % No existing texture object provided as output destination.
            error('Creating an OpenGL object from a given GPU object type is not yet supported.');
        end
    end

    if ~isstruct(texstruct)
        error('No OpenGL info struct for inObj provided! Must be a struct!');
    end
    
    % Extract info:
    try
        gltexid = texstruct.glhandle;
        gltextarget = texstruct.gltarget;
        width = texstruct.width;
        height = texstruct.height;
        bpp = texstruct.bpp;
        nrchannels = texstruct.nrchannels;
    catch %#ok<CTCH>
        error('OpenGL info struct inObj is malformed or misses fields!');
    end
end

% Use currently bound drawBufferFBO if imaging pipeline is active --
% accessing the regular onscreen windows virtual framebuffer. inObj is a
% onscreen window handle:
if glObjType == 2
    if direction == 0
        % OpenGL -> GPUtype conversion:        
        win = inObj;
        if isempty(win)
            error('No valid Psychtoolbox onscreen window provided!');
        end
    else
        % GPUtype -> OpenGL conversion:
        win = outObj;
        if isempty(win)
            % No existing onscreen window provided as output destination.
            error('Creating a virtual framebuffer from a given GPU object type is not supported.');
        end
    end

    % Make sure inObj is a onscreen window, with imaging pipeline active
    % and in proper format:
    if Screen('WindowKind', win) ~= 1
        error('For glObjType 2, inObj must be a valid onscreen window handle. This is something else!');
    end
    
    % This queries window properties and binds the FBO for the onscreen
    % windows virtual framebuffer if it isn't already bound:
    winfo = Screen('GetWindowInfo', win);
    if ~bitand(winfo.ImagingMode, kPsychNeedFastBackingStore) || winfo.BitsPerColorComponent < 32
        error('For glObjType 2, onscreen window must have imaging pipeline enabled with a 32 bpc float framebuffer!');
    end
    
    % Proper FBO is bound. Query its color attachment zero, which is the
    % OpenGL handle of the attached texture or renderbuffer:
    gltexid = glGetFramebufferAttachmentParameterivEXT(GL.FRAMEBUFFER_EXT, GL.COLOR_ATTACHMENT0_EXT, GL.FRAMEBUFFER_ATTACHMENT_OBJECT_NAME_EXT);
    
    % Query type of attachment:
    gltextarget = glGetFramebufferAttachmentParameterivEXT(GL.FRAMEBUFFER_EXT, GL.COLOR_ATTACHMENT0_EXT, GL.FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE_EXT);
    
    % Texture?
    if gltextarget == GL.TEXTURE
        % Yes: We only support rectangle textures in the imaging pipeline,
        % so this is our final target:
        gltextarget = GL.TEXTURE_RECTANGLE_EXT;
    else
        % No: A renderbuffer:
        gltextarget = GL.RENDERBUFFER;
    end

    % Only 4 channel RGBA32F supported, aka 16 Bytes per pixel:
    bpp = 16;
    nrchannels = 4;
    [width, height] = Screen('Windowsize', win);
end

% Use currently bound OpenGL FBO, assuming the imaging pipeline is active
% and properly setup -- otherwise we'd crash or screw up.
if glObjType == 3
    % Proper FBO is hopefully bound. Query its color attachment zero, which
    % is the OpenGL handle of the attached texture or renderbuffer:
    gltexid = glGetFramebufferAttachmentParameterivEXT(GL.FRAMEBUFFER_EXT, GL.COLOR_ATTACHMENT0_EXT, GL.FRAMEBUFFER_ATTACHMENT_OBJECT_NAME_EXT);
    
    % Query type of attachment:
    gltextarget = glGetFramebufferAttachmentParameterivEXT(GL.FRAMEBUFFER_EXT, GL.COLOR_ATTACHMENT0_EXT, GL.FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE_EXT);
    if gltextarget == GL.FRAMEBUFFER_DEFAULT
        error('For glObjType 3, an OpenGL FBO must be bound, not the system default framebuffer, as here!');
    end
    
    % Query bits per pixel:
    bpc = glGetFramebufferAttachmentParameterivEXT(GL.FRAMEBUFFER_EXT, GL.COLOR_ATTACHMENT0_EXT, GL.FRAMEBUFFER_ATTACHMENT_RED_SIZE);
    bpp = 0   + bpc;
    bpp = bpp + glGetFramebufferAttachmentParameterivEXT(GL.FRAMEBUFFER_EXT, GL.COLOR_ATTACHMENT0_EXT, GL.FRAMEBUFFER_ATTACHMENT_GREEN_SIZE);
    bpp = bpp + glGetFramebufferAttachmentParameterivEXT(GL.FRAMEBUFFER_EXT, GL.COLOR_ATTACHMENT0_EXT, GL.FRAMEBUFFER_ATTACHMENT_BLUE_SIZE);
    bpp = bpp + glGetFramebufferAttachmentParameterivEXT(GL.FRAMEBUFFER_EXT, GL.COLOR_ATTACHMENT0_EXT, GL.FRAMEBUFFER_ATTACHMENT_ALPHA_SIZE);
     
    % Number of channels == Bits per pixel bpp / Bits per component, e.g.,
    % RED channel:
    nrchannels = bpp / bpc;
    
    % Translate to bytes per pixel:
    bpp = bpp / 8;

    % Texture?
    if gltextarget == GL.TEXTURE
        % Yes: We only support rectangle textures in the imaging pipeline,
        % so this is our final target:
        % TODO FIXME MK: Technically not quite correct, as at least
        % Screen('TransformTexture') could also use a GL_TEXTURE_2D target
        % instead of texture rectangle. However, this is a seldomly used
        % special case and i don't know at the moment how to find out which
        % target is actually used.
        gltextarget = GL.TEXTURE_RECTANGLE_EXT;
        
        % Query size width x height of texture image:
        glPushAttrib(GL.TEXTURE_BIT);
        glBindTexture(gltextarget, gltexid);
        width  = glGetTexLevelParameteriv(gltextarget, 0, GL.TEXTURE_WIDTH);
        height = glGetTexLevelParameteriv(gltextarget, 0, GL.TEXTURE_HEIGHT);
        glBindTexture(gltextarget, 0);
        glPopAttrib();
    else
        % No: A renderbuffer:
        gltextarget = GL.RENDERBUFFER;
        glBindRenderbuffer(gltextarget, gltexid);
        width  = glGetRenderbufferParameteriv(gltextarget, GL.RENDERBUFFER_WIDTH);
        height = glGetRenderbufferParameteriv(gltextarget, GL.RENDERBUFFER_HEIGHT);
        glBindRenderbuffer(gltextarget, 0);
    end
end

% Unmap or Unregister object from cache?
if cmd == 2 || cmd == 3
    memcpyCudaOpenGL(cmd - 1, gltexid, gltextarget);
    return;
end

% Map OpenGL resource, then return a memory pointer in a uint64 for it?
if cmd == 5 || cmd == 6
    % This maps the resource and returns a pointer to it in uint64 outObj:
    % cmd 5 and 6 are translated into direction values 0 and 1 via 'cmd - 5'. This
    % is important to get the correct mapping flags for resource mapping (readonly vs.
    % writeonly):
    outObj = memcpyCudaOpenGL(4, gltexid, gltextarget, 0, 0, cmd - 5, 1, mapflags);
    return;
end

if (nrchannels ~= 1) && (nrchannels ~= 2) && (nrchannels ~= 4)
    error('Tried to convert a 3 layer RGB texture or framebuffer. This is not supported.');
end

% Number of bytes to copy = w * h * bpp:
nrbytes = width * height * bpp;
if nrbytes < 1
    error('Tried to convert an empty texture. Forbidden!');
end

% Is an already existing 'gpu' variable provided for "refill" ?
if ~isempty(gpu)
    % Yes: Check for matching format. If it doesn't match, delete it, so it
    % can be recreated with matching format:
    if ~isa(gpu, 'GPUsingle') || (size(gpu, 1) ~= nrchannels) || (size(gpu, 2) ~= width) || (size(gpu, 3) ~= height) || (0 == getPtr(gpu))
        % No match, or not allocated. Destroy:
        clear gpu;
        % And create empty for recreation below:
        gpu = [];
        
        % This must not happen in GPU -> OpenGL mode:
        if direction == 1
            error('Incompatible GPUtype inObj variable provided for update of OpenGL object. How is this supposed to work?!?');
        end
    end
    
    % Is gpu a complex matrix - which we can't handle?
    if ~isreal(gpu)
        % Yes: Only extract and convert real part:
        gpu = real(gpu);
        
        % Give data-loss / performance warning:
        warning('GPUTYPEFROMTOGL:gputypeComplexToRealcast', 'Input GPU matrix inObj stores complex numbers, which we cannot store! Throwing away the imaginary component of each complex number!');
    end
end

% Need to create a new gpu variable?
if isempty(gpu)
    % Yes: Create a new GPUsingle GPU variable:
    gpu = GPUsingle;
    % Set it to real format:
    setReal(gpu);
    
    % Set its size: We *must* double-cast the size vector here, because the
    % gpuType == 3 path delivers int32's and GPUmat doesn't like this at
    % all, punishing us with GPUallocVector failure, if we don't cast to
    % double():
    setSize(gpu, double([nrchannels, width, height]));
    
    % Allocate its CUDA backing memory:
    GPUallocVector(gpu);
end

% Retrieve CUDA memory pointer to it:
gpuptr = getPtr(gpu);
if gpuptr == 0
    error('Memory allocation on GPU failed!');
end

% Perform copy of image content from OpenGL texture into CUDA backing store:
memcpyCudaOpenGL(3, gltexid, gltextarget, gpuptr, nrbytes, direction, keepmapped, mapflags);

if direction == 0
    % OpenGL -> GPU:
    outObj = gpu;
else
    % GPU -> OpenGL:
    if glObjType == 0
        outObj = texid;
    else    
        outObj = outObj; %#ok<ASGSL>
    end
end

return;

end