Introduction to Jupyter notebooks
Most of you will already know this, but in order to use a package or library in a Jupyter notebook, two requirements need to be fulfilled:
First, you need to run a kernel with an environment that allows you to load this package. Formax-Mumott includes various packages that will allow you to run the following cells. The cells run on the JupyterHub server at MAX IV and, in general, also when you start your own Jupyter server and connect to it via port forwarding. However, in my experience the latter is not very stable.
Second, you need to import the packages in the notebook itself, for instance the math package numpy:
[1]:
import numpy as np
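To see which Python environment the current kernel uses, and whether a package can be found from it, a small check with the standard library can help. This is a minimal sketch; the helper name `is_importable` is made up for illustration, and the package names in the loop are only examples.

```python
import importlib.util
import sys


def is_importable(package_name):
    """Return True if the running kernel can locate the given top-level package."""
    return importlib.util.find_spec(package_name) is not None


# The interpreter path tells you which environment the kernel runs in.
print(sys.executable)

# Check a few packages before trying to import them.
for name in ("numpy", "mumott"):
    status = "available" if is_importable(name) else "missing"
    print(f"{name}: {status}")
```

If a package is reported as missing, the notebook is most likely running with a different kernel than the one you intended.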
It is really helpful that you can display the documentation of a function inline, directly in the notebook. Try this, for instance:
[ ]:
np.arange?
Docstring:
arange([start,] stop[, step,], dtype=None, *, like=None)
Return evenly spaced values within a given interval.
``arange`` can be called with a varying number of positional arguments:
* ``arange(stop)``: Values are generated within the half-open interval
``[0, stop)`` (in other words, the interval including `start` but
excluding `stop`).
* ``arange(start, stop)``: Values are generated within the half-open
interval ``[start, stop)``.
* ``arange(start, stop, step)`` Values are generated within the half-open
interval ``[start, stop)``, with spacing between values given by
``step``.
For integer arguments the function is roughly equivalent to the Python
built-in :py:class:`range`, but returns an ndarray rather than a ``range``
instance.
When using a non-integer step, such as 0.1, it is often better to use
`numpy.linspace`.
See the Warning sections below for more information.
Parameters
----------
start : integer or real, optional
Start of interval. The interval includes this value. The default
start value is 0.
stop : integer or real
End of interval. The interval does not include this value, except
in some cases where `step` is not an integer and floating point
round-off affects the length of `out`.
step : integer or real, optional
Spacing between values. For any output `out`, this is the distance
between two adjacent values, ``out[i+1] - out[i]``. The default
step size is 1. If `step` is specified as a position argument,
`start` must also be given.
dtype : dtype, optional
The type of the output array. If `dtype` is not given, infer the data
type from the other input arguments.
like : array_like, optional
Reference object to allow the creation of arrays which are not
NumPy arrays. If an array-like passed in as ``like`` supports
the ``__array_function__`` protocol, the result will be defined
by it. In this case, it ensures the creation of an array object
compatible with that passed in via this argument.
.. versionadded:: 1.20.0
Returns
-------
arange : ndarray
Array of evenly spaced values.
For floating point arguments, the length of the result is
``ceil((stop - start)/step)``. Because of floating point overflow,
this rule may result in the last element of `out` being greater
than `stop`.
Warnings
--------
The length of the output might not be numerically stable.
Another stability issue is due to the internal implementation of
`numpy.arange`.
The actual step value used to populate the array is
``dtype(start + step) - dtype(start)`` and not `step`. Precision loss
can occur here, due to casting or due to using floating points when
`start` is much larger than `step`. This can lead to unexpected
behaviour. For example::
>>> np.arange(0, 5, 0.5, dtype=int)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
>>> np.arange(-3, 3, 0.5, dtype=int)
array([-3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8])
In such cases, the use of `numpy.linspace` should be preferred.
The built-in :py:class:`range` generates :std:doc:`Python built-in integers
that have arbitrary size <c-api/long>`, while `numpy.arange` produces
`numpy.int32` or `numpy.int64` numbers. This may result in incorrect
results for large integer values::
>>> power = 40
>>> modulo = 10000
>>> x1 = [(n ** power) % modulo for n in range(8)]
>>> x2 = [(n ** power) % modulo for n in np.arange(8)]
>>> print(x1)
[0, 1, 7776, 8801, 6176, 625, 6576, 4001] # correct
>>> print(x2)
[0, 1, 7776, 7185, 0, 5969, 4816, 3361] # incorrect
See Also
--------
numpy.linspace : Evenly spaced numbers with careful handling of endpoints.
numpy.ogrid: Arrays of evenly spaced numbers in N-dimensions.
numpy.mgrid: Grid-shaped arrays of evenly spaced numbers in N-dimensions.
Examples
--------
>>> np.arange(3)
array([0, 1, 2])
>>> np.arange(3.0)
array([ 0., 1., 2.])
>>> np.arange(3,7)
array([3, 4, 5, 6])
>>> np.arange(3,7,2)
array([3, 5])
Type: builtin_function_or_method
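The `?` syntax only works in IPython and Jupyter. In a plain Python script the same text is available through the built-in `help` function or through the docstring itself. A short sketch, assuming NumPy is installed:

```python
import inspect

import numpy as np

# inspect.getdoc returns the cleaned-up docstring as a string.
doc = inspect.getdoc(np.arange)

# Print only the first few lines instead of the full manual page.
summary = doc.splitlines()[:3]
print("\n".join(summary))

# help(np.arange) would print the full text, similar to `np.arange?` in a notebook.
```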
Furthermore, it helps to understand the concept of classes. NumPy arrays are one example.
Take a look at the cell below.
Classes have a number of built-in functions (methods) and attributes that are very handy; here are just a few examples.
You can get a drop-down menu of the available options by typing example_array. and then pressing Tab for suggestions. Getting accustomed to this can be very handy.
[18]:
example_array = np.arange(0,10,1)
print(example_array, example_array.shape, example_array.argmax(), example_array.mean(), example_array.reshape(5,2))
# Prints out all options to use with example_array.
import re
#re.match?
#Print functions and variables from numpy array example array
#dir(example_array)
[func for func in dir(example_array) if not re.match('__',func)]
[0 1 2 3 4 5 6 7 8 9] (10,) 9 4.5 [[0 1]
[2 3]
[4 5]
[6 7]
[8 9]]
[18]:
['T',
'all',
'any',
'argmax',
'argmin',
'argpartition',
'argsort',
'astype',
'base',
'byteswap',
'choose',
'clip',
'compress',
'conj',
'conjugate',
'copy',
'ctypes',
'cumprod',
'cumsum',
'data',
'diagonal',
'dot',
'dtype',
'dump',
'dumps',
'fill',
'flags',
'flat',
'flatten',
'getfield',
'imag',
'item',
'itemset',
'itemsize',
'max',
'mean',
'min',
'nbytes',
'ndim',
'newbyteorder',
'nonzero',
'partition',
'prod',
'ptp',
'put',
'ravel',
'real',
'repeat',
'reshape',
'resize',
'round',
'searchsorted',
'setfield',
'setflags',
'shape',
'size',
'sort',
'squeeze',
'std',
'strides',
'sum',
'swapaxes',
'take',
'tobytes',
'tofile',
'tolist',
'tostring',
'trace',
'transpose',
'var',
'view']
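The list above mixes methods, which you call (such as `reshape`), and plain attributes, which you read (such as `shape`). One way to separate them is to test each name with `callable`. The sketch below looks at the class of the array rather than at the array itself, on the assumption that this is sufficient for getting an overview.

```python
import numpy as np

example_array = np.arange(0, 10, 1)
array_class = type(example_array)

# Public names only: skip everything that starts with an underscore.
public = [name for name in dir(array_class) if not name.startswith('_')]

# Methods are callable on the class; attributes such as `shape` are not.
methods = [name for name in public if callable(getattr(array_class, name, None))]
attributes = [name for name in public if name not in methods]

print("methods:", methods[:5], "...")
print("attributes:", attributes[:5], "...")
```

Inspecting `type(example_array)` instead of the instance means that no attribute is actually evaluated while building the lists.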
Mumott has similar capabilities. Let’s take a look:
[20]:
from mumott.data_handling import DataContainer
DataContainer?
Init signature: DataContainer(data_path: str, data_filename: str, data_type: str = 'h5')
Docstring:
Loads SAXSTT data from files, is used to apply transforms and corrections if needed,
and creates a :class:`ReconstructionParameters
<mumott.data_handling.ReconstructionParameters>` object.
Parameters
----------
data_path : str
Path of the data file relative to the directory of execution.
data_filename : str
File name of the data, including the extension.
data_type : str, optional
The type of data file. Supported values are ``h5`` (default, for hdf5 format)
and ``mat`` (for cSAXS Matlab format).
File: /data/visitors/formax/sw/envs/mumott-env/lib/python3.9/site-packages/mumott-0.2-py3.9.egg/mumott/data_handling/data_container.py
Type: type
Subclasses:
Note that it can be dangerous to access or change something inside a class instance directly. However, listing its contents lets you find out which functions and variables are available on the object you are dealing with.
[24]:
# Create a DataContainer based on some data
path = '/data/visitors/formax/20220566/2023031408/process/reconstructions/implant_9425L_4w_XHP_NT/'
input_type = 'h5'
name = f'dataset_q_0.027_0.031.{input_type}'
try:
    data_container = DataContainer(data_path=path,
                                   data_filename=name,
                                   data_type=input_type)
except FileNotFoundError:
    print('No data file found!')
# To show everything, including special (double-underscore) names:
# print(dir(data_container))
# Or rather, to hide the special names:
[func for func in dir(data_container) if not re.match('__', func)]
[24]:
['_angles_in_radians',
'_check_degrees_or_radians',
'_correct_for_transmission_called',
'_generate_parameter_representation',
'_generated_parameters',
'_h5_to_stack',
'_matlab_to_stack',
'_repr_html_',
'_stack',
'_transform_applied',
'angles_in_radians',
'correct_for_transmission',
'degrees_to_radians',
'reconstruction_parameters',
'stack',
'transform']
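The output contains pairs such as `_angles_in_radians` and `angles_in_radians`. A leading underscore is the Python convention for internal state that should not be modified from outside, and a common pattern is to expose such a value through a read-only property. The sketch below illustrates that convention with a hypothetical `AngleStore` class. It is not Mumott's implementation, and whether `DataContainer` uses exactly this pattern is an assumption.

```python
import math


class AngleStore:
    """Hypothetical container illustrating the underscore convention."""

    def __init__(self, angles_in_degrees):
        # Leading underscore: internal state, not meant to be changed from outside.
        self._angles = [math.radians(a) for a in angles_in_degrees]

    @property
    def angles(self):
        """Read-only access: return a copy so callers cannot alter the internal list."""
        return list(self._angles)


store = AngleStore([0, 90, 180])
print(store.angles)

try:
    store.angles = [1.0, 2.0]  # no setter is defined, so this fails
except AttributeError as err:
    print("Cannot overwrite:", err)

# Names with a single leading underscore still show up in dir().
print([name for name in dir(store) if not name.startswith('__')])
```

Nothing prevents you from writing to `store._angles` directly, which is exactly why changing underscore-prefixed members of someone else's class can be dangerous.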
We also have a data structure that depends as little as possible on where the data was recorded, thanks to Mads' implementation.
I will try to make a class diagram to give a more structured overview of it.
At this point, there are also duplicated functions that do almost the same thing. This is because I started my modifications within Mads' existing structure, and we tried not to break each other's functionality. In the future, these functions might be merged, which would simplify the code a bit.
This class is not yet fully documented, but hopefully everybody can take a closer look at the code and contribute here. This exercise can be very useful to (1) understand the code and (2) find potential bugs that might be hidden. Reading and understanding something written by someone else can also improve the quality of your own code.
[26]:
# Structure
from data_processing.dataset import Dataset, Projection
# Beamline-specific imports
# For cSAXS data, use instead: from data_processing.cSAXS_utils import metadata_reader_cSAXS as metadata_reader
from data_processing.ForMAX_utils import metadata_reader
from data_processing.ForMAX_utils import transmission_loader
from data_processing.ForMAX_utils import scattering_loader_eiger
from data_processing.ForMAX_utils import create_args
create_args?
Signature:
create_args(
scan_id,
proposal,
visit,
sample_name=<class 'str'>,
air_id=None,
tomodata=True,
bkg_scan=None,
**kwargs,
)
Docstring:
Function to load arguments:
There are a couple of predefined arguments that are required, however, they can be overwritten
if given as **kwargs. This documentation should improve to allow people to read them more prominently.
For the moment, please check the args in the function definition within data_processing/ForMAX_utils.py
File: /gpfs/offline1/visitors/formax/20220566/2023031408/process/Software_repo_final/data_processing/ForMAX_utils.py
Type: function
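The docstring describes predefined arguments that can be overwritten through `**kwargs`. A common way to get that behaviour is to start from a dictionary of defaults and update it with the keyword arguments. The sketch below uses a hypothetical `build_args` function with made-up default keys; the real defaults live in `data_processing/ForMAX_utils.py` and are not reproduced here.

```python
def build_args(scan_id, **kwargs):
    """Hypothetical stand-in for an argument builder; not the real create_args."""
    # Made-up defaults, for illustration only.
    args = {
        "scan_id": scan_id,
        "detector": "eiger",
        "tomodata": True,
    }
    # Any keyword argument replaces the default of the same name
    # or adds a new entry.
    args.update(kwargs)
    return args


defaults = build_args(42)
custom = build_args(42, tomodata=False, bkg_scan=7)
print(defaults)
print(custom)
```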
[29]:
Dataset?
Init signature:
Dataset(
frameid_list,
metadata_sources,
transmission_loader,
scattering_loader,
**kwargs,
)
Docstring: The full dataset.
File: /gpfs/offline1/visitors/formax/20220566/2023031408/process/Software_repo_final/data_processing/dataset.py
Type: type
Subclasses:
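The `Dataset` signature takes `transmission_loader` and `scattering_loader` as arguments. Passing the loaders in from outside is one way to keep a data structure largely independent of the beamline, because only the loader functions need to know the file layout. The following sketch shows the idea with a hypothetical `MiniDataset` class and dummy loaders; it does not reproduce the actual `Dataset` code, and the assumption that a loader takes a single frame id is mine.

```python
class MiniDataset:
    """Hypothetical, simplified dataset that delegates file access to loaders."""

    def __init__(self, frameid_list, transmission_loader, scattering_loader):
        self.frameid_list = list(frameid_list)
        # The loaders are plain callables that take a frame id (an assumption).
        self._load_transmission = transmission_loader
        self._load_scattering = scattering_loader

    def load(self):
        """Return one record per frame, using whichever loaders were supplied."""
        return [
            {
                "frame": frame_id,
                "transmission": self._load_transmission(frame_id),
                "scattering": self._load_scattering(frame_id),
            }
            for frame_id in self.frameid_list
        ]


# Dummy loaders standing in for beamline-specific readers.
def fake_transmission_loader(frame_id):
    return 1.0


def fake_scattering_loader(frame_id):
    return [frame_id, frame_id + 1]


dataset = MiniDataset([1, 2], fake_transmission_loader, fake_scattering_loader)
records = dataset.load()
print(records)
```

Switching to another beamline would then only require a different pair of loader functions, while the dataset class stays unchanged.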