XD blog



2014-12-15 Convert a notebook into HTML from a notebook and other tricks

The notebook Convert a notebook into a document explains how to convert a notebook into an HTML page from the notebook itself. It also gives two other tricks: how to get the notebook filename and how to create a link to a local file. The second trick is quite useful for those who use a remote notebook. It is easy to write code in the notebook which downloads data, but the data is stored on the remote machine. Once you've played with it, you usually get results or other data. If your only access to this machine is through a notebook, you can add a link to this local data instead of uploading it to the cloud or to a website. See also FileLink.
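The link trick can be sketched with the standard library only. The helper name local_link, its prefix parameter and the path results/data.csv are hypothetical and only serve the illustration; IPython's own FileLink does a similar job. Depending on the notebook version, the URL may need a prefix such as files/.

```python
import html
import os
from urllib.parse import quote


def local_link(path, label=None, prefix=""):
    """Return an HTML anchor to a file, relative to the current folder.

    `prefix` is an assumption: some notebook versions serve local
    files under a prefix such as "files/".
    """
    # Normalise the path and use forward slashes, as expected in a URL.
    rel = os.path.relpath(path).replace(os.sep, "/")
    text = label if label is not None else rel
    return '<a href="{0}{1}" target="_blank">{2}</a>'.format(
        prefix, quote(rel), html.escape(text))


link = local_link("results/data.csv", "download results")
print(link)
```

In a notebook, the string can be rendered with IPython.display.HTML(link), so the file stored on the remote machine can be downloaded from the browser.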

A few other unrelated tricks. In the following notebook, I tried the new version of matplotlib, which allows zoomable graphs in a notebook: Graphe zoomable avec Matplotlib (French). I still prefer mpld3.

2014-12-07 A few tricks with matplotlib

On Windows, matplotlib sometimes crashes. I found this explanation and it worked for me: matplotlib crashing Python.

import matplotlib.pyplot as plt
plt.close('all')  # close every open figure

The latest version, 1.4.2, enables zooming in a notebook: see The nbagg backend. See also ipython notebook on linux VM running matplotlib interactive with nbagg.

Last trick: ggplot style with matplotlib.

import matplotlib.pyplot as plt
plt.style.use('ggplot')

2014-12-06 Build automation with Jenkins

Over the past two months, I updated my website many times. I automated many things but it still takes a while, and my laptop needs to remain open until the process completes. I finally decided to spend some time on Jenkins. I use it to compile the documentation of my Python modules, to run the unit tests, and maybe to publish the result to my website. Basically, you can run a batch command line with a single click and receive an email when it is over or if it has failed. One interesting feature is the possibility to chain commands: if command n succeeds, Jenkins moves on to command n+1. You can connect it to GitHub or any source repository. It will update your clone just before building.

It takes less than an hour to install Jenkins, install the necessary plugins and create the first job. I had difficulties because I was running my Python script from a batch command file. The batch script did not stop even after Python raised an exception. To make it stop, the following check must be inserted right after the Python command:

%pythonexe% -u setup.py unittests
if %errorlevel% neq 0 exit /b %errorlevel%

Jenkins is more than a scheduler: it handles security and it can run a script on a different machine.
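The check on %errorlevel% works because Python returns a non-zero exit code when an exception is not caught. The following sketch reproduces the same chaining logic in Python: it runs commands in order and stops at the first failure. The function run_chain and the two toy commands are hypothetical, they only illustrate the mechanism.

```python
import subprocess
import sys


def run_chain(commands):
    """Run commands in order and stop at the first non-zero exit code.

    Returns (index of the failing command, its exit code), or
    (number of commands, 0) when every command succeeded.
    """
    for index, command in enumerate(commands):
        # subprocess.call returns the exit code, the value a batch
        # file reads in %errorlevel%.
        code = subprocess.call(command)
        if code != 0:
            return index, code
    return len(commands), 0


# Toy commands: the second one raises an exception which is not caught.
succeeds = [sys.executable, "-c", "print('unit tests pass')"]
fails = [sys.executable, "-c", "raise RuntimeError('unit test failed')"]

stopped_at, exit_code = run_chain([succeeds, fails, succeeds])
print(stopped_at, exit_code)
```

The third command never runs, which is the behaviour the batch line enforces.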

2014-12-02 A few subtleties with regular expressions in Python

I rarely remember the syntax of regular expressions. I use the function findall a lot. Wrongly, I think.

import re
text = """ a ab ab ab ab c a ab ab ab"""
exp = re.compile("a( ((ab)|(c)))+")
found = exp.findall(text)
for el in found:
    print(el)
(' c', 'c', 'ab', 'c')
(' ab', 'ab', 'ab', '')

The first element of each line corresponds to the group enclosed in the first parentheses. This group matches several sub-parts of the string, but only the last one is kept.

text = """ a ab ab ab ab c a ab ab ab"""
exp = re.compile("a(( ((ab)|(c)))+)")
found = exp.findall(text)
for el in found:
    print(el)
(' ab ab ab ab c', ' c', 'c', 'ab', 'c')
(' ab ab ab', ' ab', 'ab', 'ab', '')

If we add parentheses around the repeated expression (thus including the + sign), we get all the sub-parts matching the pattern repeated by +. Naively, I thought I would get each of them as a separate element. But if the regular expression contains n groups of parentheses, we get tuples of n elements. Another piece of code retrieves the positions.

exp = re.compile("a( ((ab)|(c)))+")  # pattern of the first example (4 groups)
for m in exp.finditer(text):
    print('%02d-%02d: %s' % (m.start(), m.end(), m.groups()))
01-16: (' c', 'c', 'ab', 'c')
17-27: (' ab', 'ab', 'ab', None)

This way, we notice more quickly that something is wrong.
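One possible way to get every repeated sub-part as a separate element is to work in two passes: capture the whole repeated span with a single group, then split it with a second expression. This is only a sketch; the names outer and inner are chosen for the illustration.

```python
import re

text = " a ab ab ab ab c a ab ab ab"

# (?: ... ) groups without capturing, so the only captured group
# is the whole repeated span.
outer = re.compile(r"a((?: (?:ab|c))+)")
# The second pass extracts each repeated item from that span.
inner = re.compile(r"ab|c")

parts = [inner.findall(m.group(1)) for m in outer.finditer(text)]
print(parts)
# [['ab', 'ab', 'ab', 'ab', 'c'], ['ab', 'ab', 'ab']]
```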

2014-11-27 Some really annoying things with Hadoop

When you look for a bug, it could be anywhere, so you make assumptions about what is working and what could be wrong. I lost a couple of hours because I made the wrong one. I was preparing my teachings and I stored my data using Python serialization instead of json. That way, I could just use eval( string_serialized ). It already worked locally and on a first Hadoop cluster, where this instruction was embedded in a Python script that a PIG script called through streaming. I then tried the same instruction again and the job failed many times before I checked that this line was involved. And why? The error message told me nothing about what happened. The script was just crashing. I suspect the line was just too long, so I'll do that another way. My first assumption was that the schema I used in my Jython script was wrong. I finally chose to avoid that issue by considering only strings. I commented out line after line until removing this one finally made my job work. That's the part of computer science I don't like: long, full of guesses, impossible to accomplish without an internet connection to dig into the vast amount of recent and outdated examples. And maybe in a couple of months, this issue will be solved by a simple update.
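As an illustration of the json alternative to eval mentioned above, here is a minimal sketch of a line parser which writes a message to stderr instead of crashing silently. The record layout and the function names are hypothetical, and nothing here is specific to Hadoop or PIG.

```python
import json
import sys


def parse_line(line):
    """Decode one json line; return None and log to stderr on failure."""
    try:
        return json.loads(line)
    except ValueError as exc:
        # Logging the length helps to check the "line too long" guess.
        sys.stderr.write(
            "cannot parse line of length %d: %s\n" % (len(line), exc))
        return None


def process(lines):
    """Keep only the lines which were decoded successfully."""
    return [rec for rec in map(parse_line, lines) if rec is not None]


# The second line uses Python syntax (single quotes), not json.
records = process(['{"id": 1, "tags": ["a", "b"]}', "{'id': 2}"])
print(records)
```

Unlike eval, json.loads never executes the content of the string, and a failure produces an explicit message.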

This night never happened. That's what I'll keep in mind. This night never happened.

2014-11-23 Install cvxopt on Ubuntu

The module cvxopt is not part of the Anaconda distribution. Installing it from a terminal may fail:

pip install cvxopt

If it fails, it is probably because Lapack and Blas are not installed. In that case, I suggest following the instructions in How to install Blas and Lapack. Once this is done, the following instruction should run:

pip install cvxopt

On Windows, I suggest going to Unofficial Windows Binaries for Python Extension Packages.
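Before or after running pip, a quick way to check whether the module can be found is to ask the import system. This is a small sketch; the helper is_installed is a name chosen for the illustration.

```python
import importlib.util


def is_installed(module_name):
    """Tell whether a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


for name in ("cvxopt", "json"):
    print(name, "installed" if is_installed(name) else "missing")
```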

2014-11-16 Form in notebooks (IPython)

Sometimes I need to type some credentials to access a remote machine from my notebook. I could write them in a cell but then I would need to remove them from the notebook to avoid sharing them by mistake. I was using a simple function showing a tkinter window, a pop-up. But this solution only works if the notebook server is local. When it is remote, the pop-up window appears on the remote machine and I cannot see it.

IPython allows javascript functions to execute Python instructions which modify the workspace. All I had to do was to print a form in the page with some javascript. The result can be found here: Having a form in a notebook.
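The idea can be sketched as a function which builds the HTML of such a form. This is not the code of the notebook mentioned above: the function form_html and the variable name are hypothetical, and the javascript assumes the classic notebook, where the page exposes IPython.notebook.kernel.execute.

```python
def form_html(varname="password"):
    """Build an HTML form which stores the typed value in a Python variable.

    Assumption: the page is a classic notebook exposing the javascript
    object IPython.notebook.kernel.
    """
    if not varname.isidentifier():
        raise ValueError("varname must be a valid Python identifier")
    # JSON.stringify quotes and escapes the value, so the command sent
    # to the kernel is a plain assignment of a string literal.
    template = """<div>
<input type="password" id="{name}_input" />
<button onclick="
  var value = document.getElementById('{name}_input').value;
  IPython.notebook.kernel.execute('{name} = ' + JSON.stringify(value));
">Set {name}</button>
</div>"""
    return template.format(name=varname)


page = form_html("password")
print(page)
```

In a notebook, IPython.display.HTML(page) displays the form. After a click on the button, the variable password exists in the workspace and the value is never written in a cell.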

2014-11-14 Documentation for the module Azure in Python

I did not find any documentation for the module azure-sdk-for-python so I generated it with Sphinx: azure-sdk-for-python (documentation).

2014-11-12 Notebook on iPad

After installing a notebook server on a remote machine (see Remote Notebook with Azure), I wanted to try it from an iPad. It does not work because of a WebSocket error. It does not work any better with Chrome (the solution suggested on this page does not work). There is, however, the Computable app for iPad. It is free to try, but you have to pay to create your own notebooks. It uses Python 2.7.

2014-11-11 SQLite in a Notebook with Magic Commands

When I need to use SQLite, most of the time I don't use Python because I don't do it often enough to remember the syntax. I usually use SQLiteSpy. However, converting a result into a dataframe is impossible unless I copy and paste it somewhere. So I implemented some magic commands in pyensae, which you can see in the notebook SQL Magic Commands with SQLite in a Notebook.
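As a reminder of the plain Python syntax those magic commands save me from, here is a minimal sketch with the standard module sqlite3. It does not show the pyensae magic commands themselves; the table scores and its content are made up for the example.

```python
import sqlite3

# An in-memory database avoids creating a file for the example.
con = sqlite3.connect(":memory:")
# sqlite3.Row gives access to the column names of each row.
con.row_factory = sqlite3.Row

con.execute("CREATE TABLE scores (name TEXT, score REAL)")
con.executemany("INSERT INTO scores VALUES (?, ?)",
                [("a", 1.5), ("b", 2.5)])

cursor = con.execute("SELECT name, score FROM scores ORDER BY name")
rows = [dict(row) for row in cursor]
con.close()

print(rows)
# [{'name': 'a', 'score': 1.5}, {'name': 'b', 'score': 2.5}]
```

A list of dictionaries like rows can be given directly to the pandas.DataFrame constructor if pandas is installed.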

2014-11-09 Remote Notebook with Azure

For my teachings, I installed a notebook server on a virtual machine on Azure. All the students will be able to connect with the same login (the multi-user configuration is part of the roadmap). The students will not have to install the notebooks by themselves. They will be able to see what other students do. Here are the steps I followed.

Step 1: create the virtual machine with Azure

I won't detail that; it is pretty straightforward. Just follow the tutorial Create a Virtual Machine Running Windows. I chose a Windows Server. The number of cores should depend on the number of users. I assume that not all students will access the notebook at the same time, except during the lectures. I chose eight cores. I might update this post if that is not enough.

Step 2: install Python (latest version - 3.4 today)



2014-10-30 Issue with some Sphinx themes and Internet Explorer

The theme Bootstrap does not work well on Internet Explorer. In that case, the file [python_path]/Lib/site-packages/sphinx/themes/basic/layout.html must be modified to include the following line:

<meta http-equiv="X-UA-Compatible" content="IE=edge" />

2014-10-23 Fix table of contents in a Notebook

When I have a long notebook, I find it difficult to navigate through it. Maybe I should not write such long notebooks. However, some outputs are sometimes quite long and the page gets longer. I usually put a table of contents at the beginning, but it means I need to go back to the top of the page any time I want to jump to a specific section. That's why I tried a table of contents in a div section with an absolute position (the notebook is here). It works nicely even if the mix of markdown and HTML sometimes gives weird results. The only drawback appears when the notebook is converted into HTML or rst: the table of contents shows up at the expected place at the top of the page but disappears once the page is scrolled down.

2014-10-21 Pycrypto on Windows

I was looking for a way to build pycrypto on Windows. So I started to download Visual Studio Express 2010. But the build also requires MPIR and GMP. I gave up. Many prebuilt versions are available at The Voidspace Python Modules but, unfortunately, nothing for Python 3.4. Fortunately, the missing binaries can be found through a link mentioned at pycrypto 3.4 binaries for windows x86.

2014-10-17 Python Just In Time Compilation (JIT)

I discovered a new package for just-in-time compilation in Python: HOPE. The following paper gives a promising benchmark compared to other alternatives: HOPE: A Python Just-In-Time compiler for astrophysical computations. It was not tested on Windows.



Xavier Dupré