Thursday, December 1, 2011

Please don't get this the wrong way, but I found a bug.

The Friendly Bug Report. Like watching a gorilla knit:

Hi The test case scenario is as follows:

Posted via email from brainstorming

Wednesday, November 30, 2011

Reversing little puzzles

This little puzzle was part of a CTF I didn't participate in. I attempted to solve it a couple days after it ended.

The 'flag' had to be forced out of a binary (on a Linux box), no clues given at all. First naive attempt? strings ./binary, which revealed only two strings, "you have to try harder to get the secret message!" and "THE SECRET MESSAGE IS:". Not much. After running the binary a few times, with various arguments, I decided to fire up gdb, while brute-forcing a numeric first argument in another shell..

Disassembling main() led to the following:

# 0x000000000040069c main:     push   rbp     mov    rbp,rsp     push   rbx     sub    rsp,0x68     mov    DWORD PTR [rbp-0x64],edi     mov    QWORD PTR [rbp-0x70],rsi     mov    BYTE PTR [rbp-0x60],0xee     mov    BYTE PTR [rbp-0x5f],0x67     mov    BYTE PTR [rbp-0x5e],0x76     mov    BYTE PTR [rbp-0x5d],0x67     mov    BYTE PTR [rbp-0x5c],0xdf     mov    BYTE PTR [rbp-0x5b],0x69     mov    BYTE PTR [rbp-0x5a],0x6c     mov    BYTE PTR [rbp-0x59],0x75     mov    BYTE PTR [rbp-0x58],0xc1     mov    BYTE PTR [rbp-0x57],0x69     mov    BYTE PTR [rbp-0x56],0x6c     mov    BYTE PTR [rbp-0x55],0x69     mov    BYTE PTR [rbp-0x54],0xc2     mov    BYTE PTR [rbp-0x53],0x66     mov    BYTE PTR [rbp-0x52],0x6b     mov    BYTE PTR [rbp-0x51],0x21     mov    BYTE PTR [rbp-0x50],0x8d     mov    BYTE PTR [rbp-0x4f],0x51     mov    BYTE PTR [rbp-0x4e],0x77     mov    BYTE PTR [rbp-0x4d],0x75     mov    BYTE PTR [rbp-0x4c],0xdf     mov    BYTE PTR [rbp-0x4b],0x28     mov    BYTE PTR [rbp-0x4a],0x6c     mov    BYTE PTR [rbp-0x49],0x65     mov    BYTE PTR [rbp-0x48],0xcc     mov    BYTE PTR [rbp-0x47],0x65     mov    BYTE PTR [rbp-0x46],0x38     mov    BYTE PTR [rbp-0x45],0x68     mov    BYTE PTR [rbp-0x44],0xcc     mov    BYTE PTR [rbp-0x43],0x7b     mov    BYTE PTR [rbp-0x42],0x38     mov    BYTE PTR [rbp-0x41],0x77     mov    BYTE PTR [rbp-0x40],0xc2     mov    BYTE PTR [rbp-0x3f],0x66     mov    BYTE PTR [rbp-0x3e],0x38     mov    BYTE PTR [rbp-0x3d],0x74     mov    BYTE PTR [rbp-0x3c],0xc5     mov    BYTE PTR [rbp-0x3b],0x6d     mov    BYTE PTR [rbp-0x3a],0x38     mov    BYTE PTR [rbp-0x39],0x50     mov    BYTE PTR [rbp-0x38],0xcc     mov    BYTE PTR [rbp-0x37],0x66     mov    BYTE PTR [rbp-0x36],0x77     mov    BYTE PTR [rbp-0x35],0x70     mov    BYTE PTR [rbp-0x34],0xd9     mov    BYTE PTR [rbp-0x33],0x61     mov    BYTE PTR [rbp-0x32],0x6b     mov    BYTE PTR [rbp-0x31],0x20     mov    BYTE PTR [rbp-0x30],0x9f     mov    BYTE PTR [rbp-0x2f],0x38     mov    BYTE PTR [rbp-0x2e],0x29     mov    BYTE PTR [rbp-0x2d],0x31     mov    BYTE PTR [rbp-0x2c],0x8d     mov    BYTE PTR [rbp-0x2b],0x4b     mov    BYTE PTR [rbp-0x2a],0x4c     mov    BYTE PTR [rbp-0x29],0x46     mov    BYTE PTR [rbp-0x28],0x8c     mov    BYTE PTR [rbp-0x27],0x28     mov    BYTE PTR [rbp-0x26],0x38     mov    BYTE PTR [rbp-0x25],0xa     mov    BYTE PTR [rbp-0x24],0x0     cmp    DWORD PTR [rbp-0x64],0x2     je     4007b0 <main+0x114>     mov    eax,0x0     call   400614 <gracefull_exit>     mov    rax,QWORD PTR [rbp-0x70]     add    rax,0x8     mov    rax,QWORD PTR [rax]     mov    rdi,rax     call   400508 <atoi@plt>     mov    DWORD PTR [rbp-0x18],eax     cmp    DWORD PTR [rbp-0x18],0x0     jg     4007d6 <main+0x13a>     mov    eax,0x0     call   400614 <gracefull_exit>     mov    ecx,DWORD PTR [rbp-0x18]     mov    edx,0x29665e1f     mov    eax,ecx     imul   edx     sar    edx,0x8     mov    eax,ecx     sar    eax,0x1f     mov    ebx,edx     sub    ebx,eax     mov    eax,ebx     imul   eax,eax,0x62f     mov    edx,ecx     sub    edx,eax     mov    eax,edx     test   eax,eax     je     40080a <main+0x16e>     mov    eax,0x0     call   400614 <gracefull_exit>     mov    ecx,DWORD PTR [rbp-0x18]     mov    edx,0x10776183     mov    eax,ecx     imul   edx     sar    edx,0x6     mov    eax,ecx     sar    eax,0x1f     mov    ebx,edx     sub    ebx,eax     mov    eax,ebx     imul   eax,eax,0x3e3     mov    edx,ecx     sub    edx,eax     mov    eax,edx     test   eax,eax     je     40083e <main+0x1a2>     mov    eax,0x0     call   400614 <gracefull_exit>     mov    edx,DWORD PTR [rbp-0x18]     lea    rax,[rbp-0x60]     mov    esi,0x3c     mov    rdi,rax     call   400646 <a52>     mov    DWORD PTR [rbp-0x14],eax     cmp    DWORD PTR [rbp-0x14],0xf8b83a03     je     400868 <main+0x1cc>     mov    eax,0x0     call   400614 <gracefull_exit>     mov    eax,0x4009b3     lea    rdx,[rbp-0x60]     mov    rsi,rdx     mov    rdi,rax     mov    eax,0x0     call   4004d8 <printf@plt>     mov    eax,0x0     add    rsp,0x68     pop    rbx     leave      ret        nop     nop     nop

So, what do we see? Two functions, gracefull_exit (full of grace, I assume) and a52; we can probably already guess what the former does, and the latter will have to be more closely examined. We also see a large buffer filled byte-by-byte with mostly unprintable characters, apparently to evade strings. To satisfy our curiosity, we disassemble those functions too:

# 0x0000000000400614 gracefull_exit:     push   rbp     mov    rbp,rsp     mov    rax,QWORD PTR [rip+0x2006c9]     mov    rdx,rax     mov    eax,0x400980     mov    rcx,rdx     mov    edx,0x32     mov    esi,0x1     mov    rdi,rax     call   400518 <fwrite@plt>     mov    edi,0x1     call   4004e8 <exit@plt>   # 0x0000000000400646 a52:     push   rbp     mov    rbp,rsp     mov    QWORD PTR [rbp-0x18],rdi     mov    DWORD PTR [rbp-0x1c],esi     mov    DWORD PTR [rbp-0x20],edx     mov    DWORD PTR [rbp-0x4],0x0     mov    rax,QWORD PTR [rbp-0x18]     mov    QWORD PTR [rbp-0x10],rax     jmp    400684 <a52+0x3e>     mov    rax,QWORD PTR [rbp-0x10]     mov    eax,DWORD PTR [rax]     mov    edx,eax     xor    edx,DWORD PTR [rbp-0x20]     mov    rax,QWORD PTR [rbp-0x10]     mov    DWORD PTR [rax],edx     mov    rax,QWORD PTR [rbp-0x10]     mov    eax,DWORD PTR [rax]     add    DWORD PTR [rbp-0x4],eax     add    QWORD PTR [rbp-0x10],0x4     mov    rdx,QWORD PTR [rbp-0x18]     mov    eax,DWORD PTR [rbp-0x1c]     cdqe       lea    rax,[rdx+rax*1]     cmp    rax,QWORD PTR [rbp-0x10]     ja     400665 <a52+0x1f>     mov    eax,DWORD PTR [rbp-0x4]     leave      ret   

gracefull_exit calls fwrite to print a message and exits. a52 seems to loop over a buffer and performing a xor with a value stored in [$rbp-0x20], so it's probably some decryption routine. Let's not deal with that just yet, and figure out the flow and how to manipulate it.

We start with a buffer in 0x7fffffffe0e0; it's being filled with what seems like garbage, and there's a good chance it will be decoded later by a52. Then, atoi() is called on the first command-line parameter and the result is checked for being positive. If so, we continue into this first (and only) interesting part:

    mov    edx,0x29665e1f     mov    eax,ecx     imul   edx     sar    edx,0x8     mov    eax,ecx     sar    eax,0x1f     mov    ebx,edx     sub    ebx,eax     mov    eax,ebx     imul   eax,eax,0x62f     mov    edx,ecx     sub    edx,eax     mov    eax,edx     test   eax,eax     je     40080a <main+0x16e>

What is happening here? If you try and symbolically map the operations made on the result of atoi() (some people call this "math"), you'll end up with an identity function. Weird, right? This 5-instruction part:

    mov    eax,ecx     sar    eax,0x1f     mov    ebx,edx     sub    ebx,eax     mov    eax,ebx

which is nothing more of a fancy way of saying mov eax, edx, is followed by a multiplication with 0x62f (decimal 1583) and it must equal the initial result from atoi(). The only magic, if you will, is contained in the first 5 instructions of our snippet which must seem pretty weird if you're not familiar with them. To better understand it, I suggest reading an excellent series of posts by ridiculousfish on the subject of division, starting here.

Putting all those together, this snippet is the equivalent of (in C):

if(!((param / 0x62f) * 0x62f == param)) { gracefull_exit(); }

which seems kind of crazy all by itself. The next snippet, respectively, performs the exact same operation for the constant 0x3e3 (decimal 995). Now it makes a little sense; the supplied parameter to the binary must fulfil those two conditions. Then, if successful, the code calls a52() with 3 arguments: the atoi() result, the address of the buffer and the buffer's length (0x3c). The return value from a52() is then compared for equality to 0xf8b83a03 and then the flag (I assume) is printed.

Alright, so we have a good starting point. Unfortunately, it's near the end. If we start calculating the parameter value we need to direct flow into a52(), we get 0x1808ad (the product of 1583 and 995). By running the executable with that parameter, though, we get to the end of the line. The flag is printed, and says: "Congratulations! Your team has won the Panoptis 2011 CTF!". Not too difficult. Turns out the return value of a52(), in case you care, is the result of a 32-bit accumulation of the xored words of the message; you can see it happening in add DWORD PTR [rbp-0x4],eax. The key for XOR in a52() is also the result from atoi().

A fun hour nonetheless. :-)

Posted via email from brainstorming

Monday, November 21, 2011

Scrum and other buzzwords

Alright. The time has come when I've been involved enough in project that are/were using Scrum.

I don't know about how you guys operate, but I don't have velocity all the time. In fact, most of the time my process goes like: think long and hard about how you're going to do the what, while fiddling around with some prototype code, rewriting and redesigning stuff if and when the real problems become clear (or as your "architect" shifts his perspective about intended system behaviour). In short, I usually don't appear to make any progress until, well, I do.

I could also patch up a prototype, ready for others and testing, but that's shit I only pull off when we're 5 days before a deadline. Also, when I'm pissed off.

So, regardless of the way you work, you get stuck in a scrum every morning. What have you done in the last 24 hours? "I have been thinking about the implementation of the flux regulator for the reporting subsystem". That usually means I've spent my day staring into space, taking notes and drawing boxes on paper while toyinh with some code. Looks of disapproval everywhere. "Err, I prototyped the Traffiker report distribution code and enhanced it with some logging". Now the looks are neutral or passive. Meanwhile, I'm drawing cartoons. I'm bad at drawing. My stickmen could easily be mistaken for flow charts. Stick that in your UML, I fucking dare you.

But I've never felt good about the humiliation of scrum, every single stupid morning.

Even when I have done actual work, like fixed a dozen NPDs I spotted and that annoying bug that's kept Network Verification on our tails for two months, I don't get the time to convey any useful information. "And you know that bug, the reason you get 50 mails a day? Yeah, I found the cause for that one and fixed it. It manifests in..". Mike, we don't have time for that, you can explain it after the scrum. Yet "after the scrum" is magically and constantly 20 minutes away..

Things are different the second or third time you do something. By then you know what you're doing; you're on a mission from god and whoever gets in your way is gonna lose. By that time you can also lock yourself in a basement and delete all the fucking meetings from your calendar. Get things done.

Then again, we allow burndown charts to be used against us, so..

I haven't seen Scrum "done right" so far. I'm told others have. Here's a caricature of how it usually goes for us:

  • The team will meet for at least 15 minutes, occupying a urinal if possible or otherwise disrupting traffic. The meeting will be held at a different time each day, to accomodate the scrum master's schedule.
  • We have goats and we have cows. The goats are generally uninvolved in the process, unless they have to get blamed for something. Cows must stand and talk (moo) about inconsequential nonsense, or about deeply boring esoteric technical details that mean nothing without an open vim window.
  • The Barn Manager must conduct the meeting wearing a hat, holding a baton but never in a tuxedo. Her job is to assemble the team, go around the room asking what has everybody had for breakfast, whether they've had their coffee (and if not, provide it) and after a long polite exhange of pleasantries, inquire as to how things are faring on our beloved project, if it's not too much trouble.
  • Cows' status reports are one of:
    • "Things are going great, Moo."
    • "That bug has been fixed. I committed the fix in the right svn branch. Moo."
    • "Our fix didn't work for all use cases. Moo."
    • "I'm trying to fix the mess that goat (point finger) created. Moo."
    • "Who made me a cow on this team anyway? I was way happier in XXX until I got dragged to help you sort your shit out. Moo."
  • At this phase, all goats stand around looking at each other, rolling their eyes at frequently at cows' utterances (like using SOAP for RPC, monkey-patching database access code instead of using the API, performance tests the results of which nobody will ever see, etc).
  • Cows also report what will they be working on, using a keystroke-mouseclick granularity. The assumption is no cow is ever blocked by anything (horn usage is mandatory, alternatively dual wield axes WoW-fury-warrior-style).
  • After the scrum is concluded everyone heads for the cafeteria for a good hour of rest.

Posted via email from brainstorming

Monday, October 31, 2011

No one's unhappy in Greece

You know, most of the things you'll ever hear about Greece are the usual stereotypical bullshit that arise from the human need to categorize everything. Most of it is also mostly true, which is why stereotypes exist in the first place.

My non-Greek friends have, or have had, those preconceptions about me and the place I live in. They tend to think we're happy people, living in a beautiful place where it's mostly parties and big fat weddings. We are tend to be thought of as morally loose. Or at least, if we do have a stick up our rear, it's not as far up as the French one. Basically if you stick the word "Mediterranean" before anything, it'll make it seem exotic and amazing. Countries, beaches, women, food..

and, you know, it's a stereotype because it used to be mostly true.

I don't know when it all started going to hell but I'll tell you this. I haven't heard a good joke for more than 9 months. Sure, I went through a breakup during those first two months, but the last thing I remember dropping on the floor from laughter (literally) was Sheldon Cooper knocking at Penny's door in a Flash costume. And that doesn't count as a joke.

I've constantly been saying that for anything to change in this god forsaken land, our mentality must change first. The people's representatives are acting in complete constrast to this otherwise obvious fact; they recently practically all but celebrated a debt reduction of 50% (not all debt). Meanwhile, poverty and unemployment are on the rise, and the cost of living here is heading in the exact opposite direction.

I really wish I could make a joke out of that.

Posted via email from brainstorming

Sunday, October 23, 2011

Monday, October 10, 2011

Reference counting and the probe effect

Recently I've had the pleasure of writing a Python extension in C. After putting it to use (it's used on our automated tests with Robot), people reported it consumed all available memory. I had only conducted some small scale testing, so I can't say I was too surprised to hear nightly runs failed spectacularly.

My tests were of the "let's verify this is happening as intended". For instance, "let's create this ssh connection and give this couple of commands, then verify the expected results". However, my extension was used in various very different ways, like "let's create these two ssh connections and run this bunch of commands on the first - those that succeed must also be given on the second, the others should be reported. Repeat multiple times in a single test suite".

Unexpected use of your code can lead to the most enlightening revelations about the system you work with; in this case, Python's reference counting. As you may or may not know, Python has two ways of managing memory, a decision which many see as a major design flaw:

  • reference counting - releasing memory when the reference counter reaches zero, ie. no references to an object exist
  • garbage collection - optional, on-by-default, a supplement to take care of cyclic references

In my case, the likely candidates for this ill behaviour were rather few. While many interesting program events can cause stuff to happen internally (function calls, returning from a function, the first execution in a file, raising an exception, deep recursion, etc.), I was given hints: deep recursion was a no-no, and the tests involved many yet short-lived iterations - two candidates of which my test cases had neither.

My extension also has this thing; it uses callbacks to refer to Python functions that are called on (for the moment) two occassions:

  1. A file is opened - its contents are inspected to decide if it's valid input and/or should be logged
  2. The results from certain commands are filtered and/or used by other Python functions

Shortly after giving up trying to make sense of valgrind's output, I thought I'd just take a look into the extension:

  • Putting debug statements in the dealloc functions of Python Type objects. This tells me when are they getting called, if at all. They will be called when a Py_DECREF() call on my Python object decrements its refcount to zero.
  • Next, I had to identify the objects that were suspect to be leaking (i.e. objects of which the reference counts were not reaching zero when I expected them to) and write a simple routine that dumps their refcounts. This routine can be  inside the C module or in Python - depending upon how the code is organized. In C you can find a PyObject's refcount by its member ob_refcnt. In Python you pass the object to sys.getrefcount() function. Calling this refcount dumping routine from various places in my code and monitoring how the reference count of these objects varies can help identify the "hotspots".
  • RTFM. Understanding the meaning of borrowed reference and new reference was critical. Soon, I would find the places where I was supposed to call Py_DECREF(), but didn't..

Turns out the memory leak was the combined effect of two things. The first one has to do with Python's C API, which applies the concept of ownership on a reference. Depending on the API function you call, you may or may not own the reference to the object you get. When you own an object, you must not forget to Py_DECREF() it or give it to something that will steal it. Likewise, when you do not own an object, you must Py_INCREF() it to keep it and possibly to use it during your function. The reference counts on some of my objects were astronomical - this is a strong indicator your error lies in a loop structure. This was rather easy to track and Py_DECREF() objects correctly.

The second one had to do with the way I structured my objects: some Python objects I was tracking (mostly functions) were kept as values in a dictionary, registered as callback handlers. I also registered all such objects upon instantiation in a pure Python class that was to be shared among several processes. This had two side-effects. One, it increases the reference count of the parent object - turns out that this object, immediately after allocation had a high refcount:

>>> object = MyClass() >>> print(sys.getrefcount(object)) 15

This came as a bit of a surprise, as I wasn't expecting an object with no more than ten callbacks to have so many references. The fact of the matter was, that I was not inspecting my objects like I did above. Instead, I was running small tests while my extension was instrumented like I mentioned above; in many cases, the code actually looked more like this:

for index in range(all):     sys.getrefcount(dict.values()[index])

Ding! The probing code creates a list of values, incrementing the reference count of the objects I'm tracking. Additionally, centralized registering of all the objects mysteriously increased the reference count - without bearing in mind the references Python keeps in a frame's .f_locals or .f_globals. It might not be necessary to change this, but taking those extra references into account helps.

This was my first real world experience with a reference counting memory manager. Everything I'd done before involved either manually managed or garbage collected languages/implementations, and I got a really good deal out of this experience. I suggest everyone to try and write their own Python extensions, however simple, as it can be so very revealing to a lot of interesting concepts.

Posted via email from brainstorming

Tuesday, October 4, 2011

Code reviews: turn that frown upside down

About a year ago, I started working in an environment where code reviews or other forms of feedback weren't considered important and weren't a regular practice.

After several months of infrequent code reviews, what are the biggest issues with introducing, maintaining and improving a code review process?

Aggressive Reviewers

This is perhaps the most common thing that comes up during code reviews. Of course, review is there for everyone's benefit; the codebase, the team and each one individually, and not to attack people personally or make them feel inadequate. If they stop putting their code up for review, either bad code will make its way into your product, or only the most trivial of code will be submitted, leading everyone to think that everything's good when in fact it isn't.

The fact of the matter is that we're all human, and we make mistakes. Often those are the result of accumulated pressure and requirements, incomplete problem knowledge or conflicting specifications, etc. We've all been there, and we've all looked at code we've written years ago and thought "Gawd, this sucks!".

Dealing with over-aggressive reviewers is easy. Taking one aside and criticizing their approach will resolve the issue quickly - most of the time people don't even realize how they're coming across when in fact it isn't their intention to appear aggressive.

"Victim" mentality

On the other hand is the code "owner" who considers any kind of critique to be an attack either on their skills or their person. Their attitude is usually defensive against any comments, instead of seeing a review as a learning opportunity. That isn't to say that all review comments are positive, of course, but one should not outright dismiss them.

"Protecting" your code doesn't help in any way; the resulting situation is bad code making its way into the codebase, faults remaining hidden, etc. No good can come out of no review.

Negativity

There is nothing wrong with looking at code and not having a bad thing to say! Nothing wrong with just saying "That looks good"!

Lack of an outcome

Code reviews are supposed to incite arguments and disagreements, because that's the best way of reaching useful conclusions. But to reach a conclusion, you need to reach a decision.

At some point there needs to be recognized who has the final say in things; a manager, a senior, someone explicitly responsible for code quality, whatever. Remember that you can't please everyone, and unfortunately that's true of code reviews as well.

Review is a constant process

Not all review comments have to be acted upon. Not all "Hey, this is cool! We should use this more often!" warranties a policy of reusing Good Code™, and not all negative comments have to be satisfied. However, if devs are seeing their comments, review after review, being ignored or not acknowledged, it can be terribly demoralising and poisonous for your team.

Being ignored leads to two things: less participation, less reaction. It devolves into a chain reaction that will eventually get you nothing done, and harm the dynamic of your group more than code review can ever help you. If somebody takes the initiative of "bumping" review comments for action or re-opening review issues for discussion a problem might get resolved soon enough.

Lack of focus

It's a common sight that people don't know what to review. You can recognize these occassions when the discussion starts to devolve into conformance to coding principles or indentation levels.

I've found a gem that works wonders here. I ask people what do they want to focus their time on during the review. Suggesting areas to focus gives people the initiative they might lack; if this is able to shift the conversation from "tabs vs spaces" to seven people looking hard over a critical piece of your application logic, you have succeeded. Next time, people will know what to look for and be more alert. System knowledge increases and they have more of a clue on what's important for your code, and what isn't and should be left alone.

Unrealistic expectations

Development has no silver bullets. No process, technique, or practice will take you to communication and software nirvana, and there is no way to stop all bugs at code review or prevent your code from "rotting".

If you start believing code review (or pair programming, testing, verification) will magically cause bugs to reveal themselves before shipping, or that everyone on your team will suddenly rise to the level of your highest achievers, YOU WILL FAIL.

Accept that code review has an overall positive effect on your work. Your team will get better and the quality of your product will improve, if you save your developers the disappointment that wishful thinking like above entails.

I'd really like it if I could put some ways we've reviewed code up here and describe where we went wrong and what benefit we had, but time's running out. Maybe some other time. :-)

Posted via email from brainstorming

Saturday, October 1, 2011

Setting up CUDA and Visual Studio 2010

I've never used Visual Studio before 2010, but I hear Microsoft has given it a complete facelift. A whole bunch of customizations in modules, custom build goodness, etc. Since I tried and use it together with CUDA 4, the information about setting it up is scattered all over the place, and I thought I'd explain the steps necessary to set it up to help other first-timers.

Here's what's involved here:

  • Visual Studio 2010 on Windows 7 x64
  • NVIDIA CUDA Toolkit 4
  • Windows SDK 7.1
  • NVIDIA GPU Computing SDK 4
  • NVIDIA Parallel Nsight 2.0

First off, I doubt the order of installation makes any difference, but for reference it was: VS 2010, CUDA toolkit, Windows SDK 7.1 (Visual Studio installs the SDK but it's only a tiny bit "outdated"). Now, you may not need the Windows SDK, but I installed it prior to working with CUDA in VS, so I'm not at all sure of this. However, it's all full of goodies, so why the heck not.

After all the installations are done, enter VS 2010 and

Step #1 - Create an empty project

Create a new Visual C++ Empty Project, give it a name, click OK. This step isn't so hard, is it? :-)

Step #2 - Specify build customizations

In Solution Explorer, right click your project and choose 'Build Customizations'. If all is well, you should see a screen that kind of looks like this:

Vs2010_cuda_build_custom
Select the CUDA 4.0 customization files and click OK.

Step #3 - Hello, VS 2010 World!

We're ready to write some code! Right click Source Files under Solution Explorer and add a new item. Make it a C++ file but give it the .cu suffix.

#include <cuda.h> #include <stdio.h> __global__ void helloWorld(char* str) {     // determine where in the thread grid we are     int idx = blockIdx.x * blockDim.x + threadIdx.x;     // unmangle output     str[idx] += idx; } int main(int argc, char** argv) {     int i;     char str[] = "Hello World!";     for(i = 0; i < 12; i++)         str[i] -= i;     // allocate memory on the device     char *d_str;     size_t size = sizeof(str);     cudaMalloc((void**)&d_str, size);     cudaMemcpy(d_str, str, size, cudaMemcpyHostToDevice);     dim3 dimGrid(2);   // one block per word      dim3 dimBlock(6); // one thread per character     helloWorld<<< dimGrid, dimBlock >>>(d_str);     cudaMemcpy(str, d_str, size, cudaMemcpyDeviceToHost);     cudaFree(d_str);     printf("%s\n", str);     return 0; }

Our "Hello, World!" program for the evening adds up its "rank" (if you're familiar with MPI terms) to its assigned character in the string, and then prints it. Nothing too fancy. Save it.

Step #4 - Building and linking

This is the only real trick to the whole process. The CUDA toolkit + Windows SDK installations must have already set your paths and environment correctly. The only thing missing from the story is the cudart.lib library - if you try building at this point, VS will complain about _cudaFree, _cudaMalloc, etc. and not being able to resolve them. The cudart.lib library is in $(CUDA_LIB_PATH), and here's how we tell the VS linker: In Solution Explorer, right click your project, select Properties, then Linker, then General. Verify that the field Additional Library Directories contains either $(CUDA_LIB_PATH) or $(CudaToolkitLibDir); if it does, you're golden, if not add them yourself. Then proceed to the Input tab, and add cudart.lib in the Additional Dependencies field.

Step #5 - Go forth and compute!

You are all set. Execute your "Hello, World!" program and have fun working with CUDA 4!

I will update this post with details about the CUDA SDK and Nsight in the coming days.

Posted via email from brainstorming