Introduction to Jupyter notebooks

Most of you will already know this, but in order to use a package or library in a Jupyter notebook, two requirements need to be fulfilled:

  1. You need to run a kernel with an environment that provides the package. The Formax-Mumott environment includes the packages needed to run the following cells. The cells run on the MAX IV JupyterHub server and, in general, also via port forwarding to a Jupyter server you start yourself; however, I have found the latter to be not very stable. (A quick way to check which environment your kernel is using is sketched right after the import cell below.)

  2. You need to import the packages in the notebook, for instance numpy or the math package.

[1]:
import numpy as np
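
If you are unsure which environment your kernel is actually running in, a quick check with the standard library can help. This is a minimal sketch that does not depend on anything beamline specific:

[ ]:
# Check which Python interpreter and version the current kernel uses.
# This helps to verify that you are running in the intended environment
# (e.g. the Formax-Mumott environment) before importing packages.
import sys
print(sys.executable)   # path to the interpreter backing this kernel
print(sys.version)      # Python version string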

What is really helpful is the inline help functionality: appending a question mark to a function name prints its documentation. Try this, for instance:

[ ]:
np.arange?
Docstring:
arange([start,] stop[, step,], dtype=None, *, like=None)

Return evenly spaced values within a given interval.

``arange`` can be called with a varying number of positional arguments:

* ``arange(stop)``: Values are generated within the half-open interval
  ``[0, stop)`` (in other words, the interval including `start` but
  excluding `stop`).
* ``arange(start, stop)``: Values are generated within the half-open
  interval ``[start, stop)``.
* ``arange(start, stop, step)`` Values are generated within the half-open
  interval ``[start, stop)``, with spacing between values given by
  ``step``.

For integer arguments the function is roughly equivalent to the Python
built-in :py:class:`range`, but returns an ndarray rather than a ``range``
instance.

When using a non-integer step, such as 0.1, it is often better to use
`numpy.linspace`.

See the Warning sections below for more information.

Parameters
----------
start : integer or real, optional
    Start of interval.  The interval includes this value.  The default
    start value is 0.
stop : integer or real
    End of interval.  The interval does not include this value, except
    in some cases where `step` is not an integer and floating point
    round-off affects the length of `out`.
step : integer or real, optional
    Spacing between values.  For any output `out`, this is the distance
    between two adjacent values, ``out[i+1] - out[i]``.  The default
    step size is 1.  If `step` is specified as a position argument,
    `start` must also be given.
dtype : dtype, optional
    The type of the output array.  If `dtype` is not given, infer the data
    type from the other input arguments.
like : array_like, optional
    Reference object to allow the creation of arrays which are not
    NumPy arrays. If an array-like passed in as ``like`` supports
    the ``__array_function__`` protocol, the result will be defined
    by it. In this case, it ensures the creation of an array object
    compatible with that passed in via this argument.

    .. versionadded:: 1.20.0

Returns
-------
arange : ndarray
    Array of evenly spaced values.

    For floating point arguments, the length of the result is
    ``ceil((stop - start)/step)``.  Because of floating point overflow,
    this rule may result in the last element of `out` being greater
    than `stop`.

Warnings
--------
The length of the output might not be numerically stable.

Another stability issue is due to the internal implementation of
`numpy.arange`.
The actual step value used to populate the array is
``dtype(start + step) - dtype(start)`` and not `step`. Precision loss
can occur here, due to casting or due to using floating points when
`start` is much larger than `step`. This can lead to unexpected
behaviour. For example::

  >>> np.arange(0, 5, 0.5, dtype=int)
  array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
  >>> np.arange(-3, 3, 0.5, dtype=int)
  array([-3, -2, -1,  0,  1,  2,  3,  4,  5,  6,  7,  8])

In such cases, the use of `numpy.linspace` should be preferred.

The built-in :py:class:`range` generates :std:doc:`Python built-in integers
that have arbitrary size <c-api/long>`, while `numpy.arange` produces
`numpy.int32` or `numpy.int64` numbers. This may result in incorrect
results for large integer values::

  >>> power = 40
  >>> modulo = 10000
  >>> x1 = [(n ** power) % modulo for n in range(8)]
  >>> x2 = [(n ** power) % modulo for n in np.arange(8)]
  >>> print(x1)
  [0, 1, 7776, 8801, 6176, 625, 6576, 4001]  # correct
  >>> print(x2)
  [0, 1, 7776, 7185, 0, 5969, 4816, 3361]  # incorrect

See Also
--------
numpy.linspace : Evenly spaced numbers with careful handling of endpoints.
numpy.ogrid: Arrays of evenly spaced numbers in N-dimensions.
numpy.mgrid: Grid-shaped arrays of evenly spaced numbers in N-dimensions.

Examples
--------
>>> np.arange(3)
array([0, 1, 2])
>>> np.arange(3.0)
array([ 0.,  1.,  2.])
>>> np.arange(3,7)
array([3, 4, 5, 6])
>>> np.arange(3,7,2)
array([3, 5])
Type:      builtin_function_or_method
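
The warning in the docstring above about non-integer steps is easy to demonstrate. The small cell below is purely illustrative and compares arange with the recommended linspace:

[ ]:
# With a non-integer step, the length and last element of arange are subject
# to floating point round-off; linspace lets you state the number of points
# explicitly and includes both endpoints by default.
print(np.arange(0, 1, 0.1))     # endpoint excluded, step affected by rounding
print(np.linspace(0, 1, 11))    # 11 evenly spaced points from 0 to 1 inclusive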

Furthermore, it is very useful to understand the concept of classes; NumPy arrays are one example.

Take a look at the cell below:

  • Classes have a number of built-in functions (methods) that are very handy; the cell shows just a few examples.

  • You can get a dropdown menu of available options by typing example_array. and then pressing Tab for suggestions. Getting accustomed to this can be very handy.

[18]:
example_array = np.arange(0,10,1)
print(example_array, example_array.shape, example_array.argmax(), example_array.mean(), example_array.reshape(5,2))

# Print all options available for example_array.
import re
# re.match?

# Print the functions and variables of the numpy array example_array,
# excluding the dunder ('__') names.
# dir(example_array)
[func for func in dir(example_array) if not re.match('__', func)]

[0 1 2 3 4 5 6 7 8 9] (10,) 9 4.5 [[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]
[18]:
['T',
 'all',
 'any',
 'argmax',
 'argmin',
 'argpartition',
 'argsort',
 'astype',
 'base',
 'byteswap',
 'choose',
 'clip',
 'compress',
 'conj',
 'conjugate',
 'copy',
 'ctypes',
 'cumprod',
 'cumsum',
 'data',
 'diagonal',
 'dot',
 'dtype',
 'dump',
 'dumps',
 'fill',
 'flags',
 'flat',
 'flatten',
 'getfield',
 'imag',
 'item',
 'itemset',
 'itemsize',
 'max',
 'mean',
 'min',
 'nbytes',
 'ndim',
 'newbyteorder',
 'nonzero',
 'partition',
 'prod',
 'ptp',
 'put',
 'ravel',
 'real',
 'repeat',
 'reshape',
 'resize',
 'round',
 'searchsorted',
 'setfield',
 'setflags',
 'shape',
 'size',
 'sort',
 'squeeze',
 'std',
 'strides',
 'sum',
 'swapaxes',
 'take',
 'tobytes',
 'tofile',
 'tolist',
 'tostring',
 'trace',
 'transpose',
 'var',
 'view']
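
To see a few of the methods listed above in action, here is a small illustrative cell that reuses example_array from before:

[ ]:
# A few of the ndarray methods from the list above, applied to example_array.
print(example_array.sum())      # sum of all elements (0 + 1 + ... + 9 = 45)
print(example_array.std())      # standard deviation
print(example_array.cumsum())   # cumulative (running) sum
print(example_array.reshape(2, 5).transpose())  # reshape, then transpose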

Mumott has similar capabilities. Let’s take a look:

[20]:
from mumott.data_handling import DataContainer

DataContainer?
Init signature: DataContainer(data_path: str, data_filename: str, data_type: str = 'h5')
Docstring:
Loads SAXSTT data from files, is used to apply transforms and corrections if needed,
and creates a :class:`ReconstructionParameters
<mumott.data_handling.ReconstructionParameters>` object.

Parameters
----------
data_path : str
    Path of the data file relative to the directory of execution.
data_filename : str
    File name of the data, including the extension.
data_type : str, optional
    The type of data file. Supported values are ``h5`` (default, for hdf5 format)
    and ``mat`` (for cSAXS Matlab format).
File:           /data/visitors/formax/sw/envs/mumott-env/lib/python3.9/site-packages/mumott-0.2-py3.9.egg/mumott/data_handling/data_container.py
Type:           type
Subclasses:

Note that it can be risky to access or change things inside a class, especially the "private" attributes starting with an underscore. However, inspecting an object in this way lets you find out which functions and attributes are available for the object you are dealing with.

[24]:
# Create a datacontainer based on some data
path = '/data/visitors/formax/20220566/2023031408/process/reconstructions/implant_9425L_4w_XHP_NT/'
input_type = 'h5'
name = f'dataset_q_0.027_0.031.{input_type}'

try:
    data_container = DataContainer(data_path=path,
                                   data_filename=name,
                                   data_type=input_type)
except FileNotFoundError:
    print('No data file found!')

# To show everything:
# print(dir(data_container))
# Or rather, to hide the dunder ('__') functions/variables:
[func for func in dir(data_container) if not re.match('__', func)]
[24]:
['_angles_in_radians',
 '_check_degrees_or_radians',
 '_correct_for_transmission_called',
 '_generate_parameter_representation',
 '_generated_parameters',
 '_h5_to_stack',
 '_matlab_to_stack',
 '_repr_html_',
 '_stack',
 '_transform_applied',
 'angles_in_radians',
 'correct_for_transmission',
 'degrees_to_radians',
 'reconstruction_parameters',
 'stack',
 'transform']
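
A slightly safer way to explore a class than touching its attributes directly is to inspect the class object itself. The sketch below uses only the standard inspect module and assumes the data_container instance from the cell above; it lists the public properties and methods without evaluating them and leaves the leading-underscore ("private") names alone:

[ ]:
import inspect

# Inspect the class rather than the instance, so that properties are listed
# without being evaluated.
cls = type(data_container)

properties = [name for name, attr in inspect.getmembers(cls, lambda a: isinstance(a, property))
              if not name.startswith('_')]
methods = [name for name, attr in inspect.getmembers(cls, inspect.isfunction)
           if not name.startswith('_')]

print('Properties:', properties)
print('Methods:   ', methods)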

We also have a data structure with minimal dependence on where the data was recorded, thanks to Mads' implementation of this.

  • I will try to make a class diagram to provide a structured overview of this.

  • At this point, there are also duplicated functions that do almost the same thing. This is related to modifications I started within Mads' existing structure, where we tried not to break each other's functionality. In the future, these might be merged to simplify the code a bit.

  • This class is not yet fully documented, but hopefully everybody can take a closer look at the code and contribute here. This exercise can be very useful to (1) understand the code and (2) find potential bugs that might be hidden. Reading something written by someone else and understanding the idea behind it can also improve the quality of your own code.

[26]:
# Structure
from data_processing.dataset import Dataset, Projection

# Beamline specific imports
from data_processing.ForMAX_utils import metadata_reader # from data_processing.cSAXS_utils import metadata_reader_cSAXS as metadata_reader
from data_processing.ForMAX_utils import transmission_loader
from data_processing.ForMAX_utils import scattering_loader_eiger
from data_processing.ForMAX_utils import create_args

create_args?
Signature:
create_args(
    scan_id,
    proposal,
    visit,
    sample_name=<class 'str'>,
    air_id=None,
    tomodata=True,
    bkg_scan=None,
    **kwargs,
)
Docstring:
Function to load arguments:
There are a couple of predefined arguments that are required, however, they can be overwritten
if given as **kwargs. This documentation should improve to allow people to read them more prominently.
For the moment, please check the args in the function definition within data_processing/ForMAX_utils.py
File:      /gpfs/offline1/visitors/formax/20220566/2023031408/process/Software_repo_final/data_processing/ForMAX_utils.py
Type:      function
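
As a purely hypothetical example of how create_args could be called: the scan number below is made up, proposal, visit and sample_name are taken from the data path used earlier, and any of the predefined arguments can be overridden by passing them as keyword arguments. Please check the function definition in data_processing/ForMAX_utils.py for the actual expected values.

[ ]:
# Hypothetical call following the signature shown above; the values are
# placeholders and must be adapted to your own measurement.
args = create_args(scan_id=123,                           # made-up scan number
                   proposal='20220566',                   # from the data path above
                   visit='2023031408',                    # from the data path above
                   sample_name='implant_9425L_4w_XHP_NT',
                   tomodata=True)
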
[29]:
Dataset?
Init signature:
Dataset(
    frameid_list,
    metadata_sources,
    transmission_loader,
    scattering_loader,
    **kwargs,
)
Docstring:      The full dataset.
File:           /gpfs/offline1/visitors/formax/20220566/2023031408/process/Software_repo_final/data_processing/dataset.py
Type:           type
Subclasses:
[ ]: