I'm following the Statistical Methods for Machine Learning (StatML) course, where we are free to choose between R, Matlab, Python, C, etc.

I'm currently using R, and I'm not liking it, so I considered switching to Python/numpy or maybe even Haskell. Does anyone have any experience with using Python or Haskell for this course, and if so, are there any downsides?

asked 16 Feb '12, 08:50

dolle

edited 17 Feb '12, 10:24

Sebastian Pa... ♦♦

I suggest sticking with R. If you have a CUDA-enabled machine, switch to Matlab; they've recently added GPU acceleration.

The reason is that the algorithms in R and Matlab seem to be very well implemented. I used Python for some time last year, but I ran into limitations with respect to matrix sizes, multiplications, and certain higher-level methods. The problem is that at some point you ARE going to get slapped with A LOT of data, and you have to be able to handle it. There'll be a more lightweight data set for the exam this year, but it's still fairly large.
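
To get a feel for where matrix-size limits like these come from, here is a back-of-the-envelope sketch. It assumes dense matrices of 8-byte floats (NumPy's default `float64`); the function name `dense_matrix_bytes` and the example sizes are hypothetical, chosen only for illustration.

```python
def dense_matrix_bytes(rows, cols, bytes_per_element=8):
    """Memory needed for a dense rows x cols matrix (8 bytes = float64)."""
    return rows * cols * bytes_per_element


# Hypothetical sizes, for illustration only.
for n in (1_000, 10_000, 20_000):
    gib = dense_matrix_bytes(n, n) / 2**30
    print(f"{n} x {n} float64 matrix: {gib:.2f} GiB")
```

Multiplying two such square matrices needs room for both operands plus the result, so the footprint is roughly three times that of a single matrix.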

The problem with NumPy appears to be partly an artifact of the way Python estimates memory usage [0] and partly something the user can tune [1]. Of course, these limits can also be circumvented with smarter calculation methods, but that forces you to be more creative than you should have to be. Matlab just thrashes your memory, giving your matrix multiplications highest priority. In short, I tried Python last year and gave in to Matlab in the end. Another benefit is that Matlab is easier to get going with than the sometimes cryptic NumPy libraries (imo).
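
As one example of what a "smarter calculation method" could look like, here is a minimal sketch that computes a mean and sample variance in a single pass, so the full data set never has to sit in memory at once. It uses Welford's online algorithm, which is my choice for illustration and not something taken from the course; the function name `running_mean_variance` is hypothetical, and only the Python standard library is assumed.

```python
def running_mean_variance(values):
    """Welford's online algorithm: one pass over the data, constant memory."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    # Sample variance (divides by n - 1); defined as 0.0 for fewer than 2 values.
    variance = m2 / (count - 1) if count > 1 else 0.0
    return count, mean, variance


# Works on any iterable, e.g. a generator that reads a file line by line.
count, mean, variance = running_mean_variance(float(i) for i in range(1, 6))
print(count, mean, variance)
```

The same idea extends to per-column statistics by keeping one `mean` and `m2` per column, at the cost of writing the loop yourself, which is the kind of extra creativity meant above.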

Christian Igel is otherwise very eager to promote C++/Shark [2]. You are certain to be able to do all the exercises (on your typical 3 GHz/4 GB machine), since Christian does everything in Shark himself and seems to have a good habit of solving all the StatML exercises that are handed out. However, if you have little experience with C++, I don't imagine it being a particularly good idea.

[0] http://mail.scipy.org/pipermail/numpy-discussion/2007-May/027735.html

[1] http://mail.scipy.org/pipermail/numpy-discussion/2008-March/031987.html

[2] http://image.diku.dk/shark

answered 17 Feb '12, 22:47

oleks ♦♦

edited 17 Feb '12, 22:49

I second the idea of sticking with R. It is, for stats, by far the most powerful open tool out there.

(18 Feb '12, 20:26) jlouis
Last updated: 26 Feb '12, 16:23
