C (programming language): Difference between revisions

From Citizendium
imported>Pat Palmer
mNo edit summary
 
(42 intermediate revisions by 8 users not shown)
Line 1: Line 1:
{{subpages}}
{{subpages}}
{{dambigbox|text=For other uses, see [[C (disambiguation)]].}}
{{dambigbox|text=For other uses, see [[C (disambiguation)]].}}
'''C''' is a general-purpose, procedural [[computer]] [[programming language]] which is still in use more than thirty years after its creation.  '''C''' was developed in 1972 by [[Dennis Ritchie]] and [[Brian Kernighan]] (then of [[Bell Laboratories]]) for use with the [[Unix]] operating system, and, before it entered the standards process, in a reference book.<ref name="K&R">{{citation
{{TOC|right}}
 
'''C''' is a general-purpose, procedural [[computer]] [[programming language]] which is still in regular use over four decades after its creation.  It was developed in 1972 by [[Dennis Ritchie]] and [[Brian Kernighan]] (then of [[Bell Laboratories]]) for use with the [[Unix]] operating system, and was documented, before it entered the standards process, in a 1978 reference book commonly referred to as the ''Kernighan and Ritchie'' book, or just K&R.<ref name="K&R">{{citation
  |last1=Kernighan | first1=B. |last2=Ritchie |first2= D.
  |last1=Kernighan | first1=B. |last2=Ritchie |first2= D.
  | title = The C Programming Language
  | title = The C Programming Language
  | publisher = Prentice Hall | year = 1978}}</ref>
  | publisher = Prentice Hall | year = 1978}}</ref> The language has been implemented for many different computer platforms and eventually became standardized by [[ANSI]] and [[ISO]]--there is a Second Edition of K&R using the ANSI standard syntax. <ref name=KR2>{{citation
. The language has been implemented for many different computer platforms and eventually became standardized by ANSI and ISO.
| title = C Programming Language (2nd Edition)
|last1=Kernighan | first1=B. |last2=Ritchie |first2= D.
| publisher = Prentice Hall
| year = 1988}}</ref>


Although superceded by more modern languages for general application programming, as of 2009 versions of C are still used, primarily for writing [[operating system]] software and [[embedded]] programs (for gadgets such as smart phones).  C and its closely related sister language, C++, are also used for games development and other graphics- or media-intensive programming.  Although once used for web programming, it has been superceded for web programming by newer languages that provide more security and which enforce safer programming practices.
Although superseded by more modern languages for general application programming, since 2010 versions of C are still being used, primarily for writing [[operating system]] software and [[embedded]] programs (for gadgets such as smart phones).  C and its closely related sister language, [[C%2B%2B|C++]], are also used for games development and other graphics- or media-intensive programming.  Although once used for web programming, it has been superseded for web programming by newer languages that provide more security and which enforce safer programming practices.


==Historical influence of C==
==Historical influence of C==


The C programming language set several important precedents which were adopted by many subsequent programming languages.  At the time of its initial implementation, most or all programming tools did not allow recursive function calls.  C, in conjunction with UNIX, takes advantage of two run-time memory-management strategies: the "call stack" (used by the compiler to keep track of local variables declared in functions and subroutines) and the "heap" (used explicitly by the programmer to pack arbitrary data structures into memory).   
The C programming language set several important precedents which were adopted by many subsequent programming languages.  At the time of its initial implementation, most or all programming tools did not allow recursive function calls.  C, in conjunction with UNIX, takes advantage of two run-time memory-management strategies: the [[Stack_frame|call stack]] (used by the compiler to keep track of local variables declared in functions and subroutines) and the [[heap]] (used explicitly by the programmer to pack arbitrary [[Data_structure|data structures]] into [[memory]]).  The use of the stack meant that C allows functions to call themselves (so-called [[recursion]]); certain simple problems may be elegantly solved using a [[recursive function]].


The use of the stack meant that C allows functions to call themselves (so-called "recursion").  The use of the heap means that programmers can define arbitrarily complex data types.  The stack and heap resulted in a powerful programming paradigm which is still in widespread use today. Today, most computer architectures, operating systems, and compilers provide special capabilities facilitating the use of stacks and heaps by programming languages such as C.
The use of the heap means that programmers can define arbitrarily complex [[data types]].  However, the C heap also needs to have memory reclaimed explicitly by the programmer after data structures are no longer needed.   In longer programs, reclaiming heap memory, also called [[garbage collection]], is one of the aspects of C programming that is difficult and error prone.  Failure to correctly reclaim all no-longer-needed memory from the heap eventually results in a program running out of memory, a situation called a [[memory leak]].  More modern languages such as Java and C# reclaim no-longer-used heap space automatically, resulting generally in more reliable programs.


Although powerful, the heap as used by C also needs to have memory reclaimed explicitly by the programmer after data structures are no longer needed; this error-prone process, called "garbage collection", is one of the aspects of C programming that is difficult.  Failure to correctly reclaim all no-longer-needed memory from the heap eventually results in a program running out of memory, a situation called a "memory leak".  More modern languages such as Java and C# handled reclamation of no-longer-used heap space automatically, resulting generally in more reliable programs.
The stack and heap resulted in a powerful programming paradigm which is still in widespread use today.  Today, most [[computer architecture]]s, [[operating systems]], and [[compiler]]s provide special capabilities facilitating the use of stacks and heaps by programming languages such as C.  For example, operating systems typically contain calls to "allocate an initial heap" and then, later, to "grow the heap" using vacant memory if the current heap has filled up; however, such calls will eventually fail if heap space is not being reclaimed properly.


C's name derives from those of two of its predecessors (also worked on by the same authors at Bell Laboratories), '''Basic Computer Programming Language (BCPL)'''<ref name=BCPL>{{citation  
C got its name by being the successor to B. '''B'''<ref name=B>{{citation
| title =Users' Reference to B
| first = Ken | last = Thompson
| date = 7 January 1972
| url = http://cm.bell-labs.com/cm/cs/who/dmr/kbman.html}}</ref> was a language developed at Bell Labs, a derivative of '''Basic Combined Programming Language (BCPL)''' <ref name=BCPL>{{citation  
  |first= Martin | last = Richards
  |first= Martin | last = Richards
  | title = Richards's BCPL Reference Manual
  | title = Richards's BCPL Reference Manual
  | date = 21 July 1967
  | date = 21 July 1967
  | id = Memorandum M-352 of MIT Project MAC
  | id = Memorandum M-352 of MIT Project MAC
  | url = http://www.cs.bell-labs.com/who/dmr/bcpl.html}}</ref>, and a BCPL variant,'''B'''<ref name=B>{{citation
  | url = http://www.cs.bell-labs.com/who/dmr/bcpl.html}}</ref>, developed by a Cambridge professor while visiting MIT. BCPL in turn was based on CPL (''Combined Programming Language'' or "Cambridge Plus London"), but removed some features to allow faster and smaller compilers.<ref>{{citation
| title =Users' Reference to B
| author = Dennis M Ritchie
| first = Ken | last = Thompson
| title = The Development of the C Language
| date = 7 January 1972
| url = http://cm.bell-labs.com/cm/cs/who/dmr/chist.html
| url = http://cm.bell-labs.com/cm/cs/who/dmr/kbman.html}}</ref>.
| date = 1993
}}</ref>
 
BCPL and B are rather minimal languages intended for tasks like implementing compilers; they had only one data type, the machine word. The main things C adds to B are a variety of basic types (char, short, int, long, float, double, pointers) and methods of combining them (arrays and structs).
 
C adopted syntax from various older programming languages. Semi-colons, recursive function calls and the block structure are from [[Algol]]. The  ''+='' and similar notation are from Algol 68. Curly brackets {} to delimit blocks were copied from BCPL.  BCPL also provided one-line comments beginning with //, which were not used in the original C, but are in C++ and standardized C.


==Syntax of the C family of programming languages==
==Syntax of the C family of programming languages==


The syntax and stack-based scope behavior of C were also adopted by several later programming languages, including [[C++]], [[Java]], and [[C sharp|C#]].  The adopted syntax characteristics tend to include case sensitivity, ending of statements with a semi-colon (;), use of {  } to enclose blocks of code, enclosing procedure parameters inside (  ) pairs, and allowing temporary variables to be declared inside blocks or procedures (which are semantically destroyed subsequent to that procedure's execution, and reinitialized from scratch during subsequent procedure executions).   
Although C's creator, Dennis Ritchie, didn’t invent the curly-bracket syntax himself (that came from Martin Richards’ BCPL), C's syntax had enormous influence on later languages.
Both the C language syntax and stack-based scope behavior of C were also adopted by several later programming languages, including [[C++]], [[Java]], [[JavaScript]] and [[C sharp|C#]].  The adopted syntax characteristics tend to include case sensitivity, ending of statements with a semi-colon (;), use of {  } to enclose blocks of code, enclosing procedure parameters inside (  ) pairs, and allowing temporary variables to be declared inside blocks or procedures (which are semantically destroyed subsequent to that procedure's execution, and reinitialized from scratch during subsequent procedure executions).   


[[Javascript]] uses similar syntax but has different scope rules.
[[JavaScript]] uses similar syntax but has different scope rules. Quite a few other programming languages adopt many of the same syntax rules (such as semi-colon after statements, and curly braces to enclose code blocks) but have significant semantic differences. 


===Hello World===
===Hello World===
Line 38: Line 53:
  #include <stdio.h>
  #include <stdio.h>
   
   
  int main(void) {
  int main() {
     printf("Hello, world!\n");
     printf("Hello, world!\n");
     return 0;
     return 0;
Line 49: Line 64:
<code>#include <stdio.h></code> tells the [[precompiler]] to include the contents of the header file stdio.h, which declares '''st'''andar'''d''' '''i'''nput and '''o'''utput functions into the program before compiling.
<code>#include <stdio.h></code> tells the [[precompiler]] to include the contents of the header file stdio.h, which declares '''st'''andar'''d''' '''i'''nput and '''o'''utput functions into the program before compiling.


<code>int main(void) {</code> tells the compiler that there is a [[function]] named <code>main</code> which expects no [[parameter]]s (<code>void</code>) and will return an integer number to the [[caller]] (<code>int</code>). Due to a standard convention of the language, <code>main</code> is the first function called after the [[execution environment]] of the program has been set up. The opening curly brace following <code>int main(void)</code> denotes the beginning of the function.
<code>int main() {</code> tells the compiler that there is a [[function]] named <code>main</code> which expects no [[parameter]]s and will return an integer number to the [[caller]] (<code>int</code>). Due to a standard convention of the language, <code>main</code> is the first function called after the [[execution environment]] of the program has been set up. The opening curly brace following <code>int main()</code> denotes the beginning of the function.


<code>printf("Hello, world!\n");</code> will make the program output <code>Hello, world!</code> and a new line (<code>\n</code>) on the screen. <code>printf</code> is itself a function similar to <code>main</code> but predefined in a library (libc) and [[linker|linked]] into the program at compile time or runtime. The trailing semicolon is the end of statement marker in C.
<code>printf("Hello, world!\n");</code> will make the program output <code>Hello, world!</code> and a new line (<code>\n</code>) on the screen. <code>printf</code> is itself a function similar to <code>main</code> but predefined in a library (libc) and [[linker|linked]] into the program at compile time or runtime. The trailing semicolon is the end of statement marker in C.
Line 57: Line 72:
<code>}</code> signals the end of the function definition to the compiler.
<code>}</code> signals the end of the function definition to the compiler.


==Pros & Cons of C==
==Advantages and disadvantages of C==
 
===Pros===
 
*Although C programs can either be compiled or run through an interpreter, most are compiled and run as native code.  Compiled C programs tend to start running faster than languages requiring a runtime or interpreter; however, modern runtime-based languages such as Java, C# or VB.NET now run about as fast as native code once they have "warmed up" (i.e., once they have been just-in-time compiled and loaded into memory).
 
*C is low-level enough to take advantage of hardware-specific capabilities, and thus any low=level software packages that have close interaction with hardware are written in C. For example, most of [[Unix]] is written in C. Most [[compiler]]s and [[interpreter]]s are written in C. C gives programmers access to hardware, enabling them to manipulate [[memory]]. This feature is advantageous but can prove problematic if misused.
 
*C has been standardized, which minimizes differences across compilers, but it is still hardware-dependent.  For example, the storage width of integers depends on the underlying hardware and may differ across two different hardware architectures.  C programs can be made portable across multiples systems, but only with some diligence on the part of the programmer. C [[compiler]]s are available for most systems.
 
===Cons===
 
* C programmers must explicitly managed how their program uses heap memory. Although allowing great flexibility, pointers (addresses of data structures in the heap) can make C code very hard to understand and debug.  And forgetting to deallocated used memory after it is no longer needed will cause programs to crash after they have been running for awhile (because the heap fills up).


* Strings in C are implemented as special data structures.  Parts of the extensive string-handling libraries in C do not always enforce correct memory management practices. Programmer errors may go undetected at compile-time and cause crashes or other problems at runtime. Programmer errors may also make a program vulnerable to malicious exploitation of the entire computer system.  See the [[buffer overflow]] article for some examples.
C is low-level in its approach, meaning that the programmer has control over memory management, hardware interactions and that the hardware is visible from inside the program to a larger degree than in many more high-level languages. This has the advantage that well-written C programs can be very effective with machine resources since the programmer strictly decides what is necessary and what is not, making extensive profiling possible. C is one of the preferred languages for memory- and processor-intensive programs such as video editors, 3d games and operating system [[kernel|kernels]]. The downside is that programmers must be more aware of the hardware, leading to potentially longer development and debugging time and reduced portability.  Because C hides so little of the hardware from programmers, using C for code is sometimes referred to as ''programming to the metal'' or ''close to the metal''.


* The low level nature of C, which does not hide specific hardware architectural details, makes it relatively harder to learn than other programming languages.
Some programmers have the habit of prototyping functions and features in more high-level languages before coding them in C, since that will allow them to see if an idea is workable before taking the time to write a (presumably) more effective program in C.


* Even with programmer diligence, it is still difficult to guarantee that a C program which runs on one type of hardware will also run identically on different hardware without modification.
Many think of the C standard library (the set of functions which is guaranteed to be available on all C platforms) as being compact in its design. C and its standard library have been standardized, which minimizes differences across compilers. Some parts of a system still differ, and programmers must be careful not to assume knowledge about the target system if they desire portable software. For example, the storage width of integers depends on the underlying hardware and may differ across two different hardware architectures. As such, a program which assumes 64-bit integers might have problems running unmodified on a 32-bit processor. C programs can be made portable across multiple systems, but only with some diligence on the part of the programmer. C [[compiler]]s are available for most systems.


==See also==
==See also==
Line 86: Line 89:


==References==
==References==
<references />
<references />[[Category:Suggestion Bot Tag]]

Latest revision as of 16:00, 23 July 2024

This article is developing and not approved.
For other uses, see C (disambiguation).

C is a general-purpose, procedural computer programming language which is still in regular use over four decades after its creation. It was developed around 1972 by Dennis Ritchie at Bell Laboratories for use with the Unix operating system, and was documented, before it entered the standards process, in a 1978 reference book by Brian Kernighan and Ritchie, commonly referred to as the Kernighan and Ritchie book, or just K&R.[1] The language has been implemented for many different computer platforms and was eventually standardized by ANSI and ISO; a second edition of K&R uses the ANSI standard syntax.[2]

Although superseded by more modern languages for general application programming, C remains in use, primarily for writing operating system software and embedded programs (for devices such as smartphones). C and its closely related sister language, C++, are also used for games development and other graphics- or media-intensive programming. C was once also used for web programming, but has been superseded there by newer languages that provide more security and enforce safer programming practices.

Historical influence of C

The C programming language set several important precedents which were adopted by many subsequent programming languages. At the time of its initial implementation, several widely used programming languages (early versions of FORTRAN, for example) did not allow recursive function calls. C, in conjunction with UNIX, takes advantage of two run-time memory-management strategies: the call stack (used by the compiled program to keep track of local variables declared in functions and subroutines) and the heap (used explicitly by the programmer to pack arbitrary data structures into memory). Because every call receives its own area of the stack, C functions can call themselves (so-called recursion); certain simple problems may be elegantly solved using a recursive function.
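As a minimal sketch of recursion on the call stack, the function below computes a factorial by calling itself. The name factorial and its signature are chosen here purely for illustration and do not come from the sources cited in this article.

```c
/* Illustrative only: computes n! recursively.
   Every call gets its own stack frame with its own copy of n,
   which is what makes it safe for the function to call itself. */
unsigned long factorial(unsigned int n)
{
    if (n <= 1) {
        return 1UL;               /* base case: stops the recursion */
    }
    return n * factorial(n - 1);  /* recursive case: the function calls itself */
}
```

Calling factorial(5) yields 120. The result overflows for larger arguments (the limit depends on the width of unsigned long on the platform), so this is a teaching example rather than production code.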

The use of the heap means that programmers can create arbitrarily complex data structures. However, heap memory in C must be released explicitly by the programmer once data structures are no longer needed. In longer programs, this manual memory management is one of the aspects of C programming that is difficult and error prone. Failure to release no-longer-needed memory is called a memory leak; a program that keeps leaking eventually runs out of memory. More modern languages such as Java and C# reclaim no-longer-used heap space automatically (a technique known as garbage collection), resulting generally in more reliable programs.
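The following sketch illustrates explicit heap management using the standard library functions malloc and free. The struct node type and the helper functions push and free_list are hypothetical names invented for this example.

```c
#include <stdlib.h>

/* Illustrative singly linked list node, stored on the heap. */
struct node {
    int value;
    struct node *next;
};

/* Adds a value at the front of the list.
   Returns 0 on success, or -1 if malloc could not supply memory
   (in which case the list is left unchanged). */
int push(struct node **head, int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL) {
        return -1;
    }
    n->value = value;
    n->next = *head;
    *head = n;
    return 0;
}

/* Releases every node and returns how many were freed.
   Omitting this call once the list is no longer needed
   would leak all of its nodes. */
size_t free_list(struct node *head)
{
    size_t freed = 0;
    while (head != NULL) {
        struct node *next = head->next;  /* save the link before freeing */
        free(head);
        head = next;
        freed++;
    }
    return freed;
}
```

Each successful malloc must eventually be matched by exactly one free. Nothing in the language checks this, which is why such code is easy to get wrong in larger programs.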

The stack and heap resulted in a powerful programming paradigm which is still in widespread use. Most computer architectures, operating systems, and compilers now provide special capabilities facilitating the use of stacks and heaps by programming languages such as C. For example, operating systems typically provide calls to "allocate an initial heap" and then, later, to "grow the heap" using vacant memory if the current heap has filled up; however, such calls will eventually fail if heap space is not being reclaimed properly.

C got its name by being the successor to B. B[3] was a language developed at Bell Labs as a derivative of the Basic Combined Programming Language (BCPL)[4], which had been developed by a Cambridge professor while visiting MIT. BCPL in turn was based on CPL (Combined Programming Language or "Cambridge Plus London"), but removed some features to allow faster and smaller compilers.[5]

BCPL and B are rather minimal languages intended for tasks like implementing compilers; they had only one data type, the machine word. The main things C adds to B are a variety of basic types (char, short, int, long, float, double, pointers) and methods of combining them (arrays and structs).
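To illustrate how C's basic types combine, the sketch below groups two double values into a struct and processes an array of such structs through a pointer. The names struct point and sum_x are assumptions made for this example only.

```c
#include <stddef.h>

/* A struct combines several basic-typed members into one new type. */
struct point {
    double x;
    double y;
};

/* Takes a pointer to the first element of an array of structs
   and returns the sum of the x members. */
double sum_x(const struct point *pts, size_t count)
{
    double total = 0.0;
    for (size_t i = 0; i < count; i++) {
        total += pts[i].x;   /* array indexing plus member access */
    }
    return total;
}
```

Given an array holding the points (1.0, 2.0), (2.5, 0.0) and (0.5, 1.0), sum_x returns 4.0. In B, by contrast, all of these values would have been untyped machine words.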

C adopted syntax from various older programming languages. Semi-colons, recursive function calls and the block structure are from Algol. The += and similar compound assignment operators derive from Algol 68. Curly brackets {} to delimit blocks were copied from BCPL. BCPL also provided one-line comments beginning with //, which were not used in the original C, but are in C++ and, since the 1999 standard (C99), in standardized C.

Syntax of the C family of programming languages

Although C's creator, Dennis Ritchie, didn’t invent the curly-bracket syntax himself (that came from Martin Richards’ BCPL), C's syntax had enormous influence on later languages. Both the syntax and the stack-based scope behavior of C were adopted by several later programming languages, including C++, Java and C#. The adopted syntax characteristics tend to include case sensitivity, ending of statements with a semi-colon (;), use of { } to enclose blocks of code, enclosing procedure parameters inside ( ) pairs, and allowing temporary variables to be declared inside blocks or procedures (such variables are semantically destroyed when the block or procedure finishes executing, and are reinitialized from scratch on each subsequent execution).
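The scope behavior described above can be sketched as follows. Both function names are hypothetical, and declaring the loop variable inside the for statement assumes a compiler that supports C99 or later.

```c
/* A local (automatic) variable is created afresh on every call,
   so no value survives from one call to the next. */
int fresh_counter(void)
{
    int counter = 0;   /* reinitialized each time the function runs */
    counter++;
    return counter;    /* therefore always 1 */
}

/* A variable declared inside a block exists only within that block. */
int sum_with_block_temp(void)
{
    int total = 0;
    for (int i = 0; i < 3; i++) {
        int temp = 10;     /* created and initialized on each iteration */
        temp += i;
        total += temp;
    }
    /* neither i nor temp is visible here */
    return total;          /* 10 + 11 + 12 */
}
```

However often fresh_counter is called, it returns 1, because counter does not persist between calls. sum_with_block_temp returns 33.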

JavaScript uses similar syntax but has different scope rules. Quite a few other programming languages adopt many of the same syntax rules (such as semi-colon after statements, and curly braces to enclose code blocks) but have significant semantic differences.

Hello World

#include <stdio.h>

int main() {
   printf("Hello, world!\n");
   return 0;
}

Analysis of the example

The Hello World program (see above) appears in many programming language books and articles as a cursory introduction to a language's syntax. It was introduced in the book The C Programming Language[1].

#include <stdio.h> tells the preprocessor to insert the contents of the header file stdio.h, which declares the standard input and output functions, into the program before compiling.

int main() { tells the compiler that there is a function named main which expects no parameters and will return an integer number to the caller (int). Due to a standard convention of the language, main is the first function called after the execution environment of the program has been set up. The opening curly brace following int main() denotes the beginning of the function.

printf("Hello, world!\n"); will make the program output Hello, world! and a new line (\n) on the screen. printf is itself a function similar to main but predefined in the C standard library (libc) and linked into the program either when it is built (static linking) or when it is loaded and run (dynamic linking). The trailing semicolon is the end of statement marker in C.

return 0; defines the value to be returned from main and returns control to its caller, the standard C startup code. After some additional cleanup, that code passes the 0 on to the operating system, to which it means 'success'.

} signals the end of the function definition to the compiler.

Advantages and disadvantages of C

C is low-level in its approach, meaning that the programmer has control over memory management and hardware interactions, and that the hardware is visible from inside the program to a larger degree than in many higher-level languages. This has the advantage that well-written C programs can make very efficient use of machine resources, since the programmer decides exactly what is necessary and what is not and can tune the program extensively after profiling it. C is one of the preferred languages for memory- and processor-intensive programs such as video editors, 3D games and operating system kernels. The downside is that programmers must be more aware of the hardware, leading to potentially longer development and debugging time and reduced portability. Because C hides so little of the hardware from programmers, using C for code is sometimes referred to as programming to the metal or close to the metal.

Some programmers have the habit of prototyping functions and features in higher-level languages before coding them in C, since that allows them to see whether an idea is workable before taking the time to write a (presumably) more efficient program in C.

Many think of the C standard library (the set of functions which is guaranteed to be available on all C platforms) as being compact in its design. C and its standard library have been standardized, which minimizes differences across compilers. Some parts of a system still differ, and programmers must be careful not to assume knowledge about the target system if they desire portable software. For example, the storage width of integers depends on the underlying hardware and may differ across two different hardware architectures. As such, a program which assumes 64-bit integers might have problems running unmodified on a 32-bit processor. C programs can be made portable across multiple systems, but only with some diligence on the part of the programmer. C compilers are available for most systems.
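The portability concern about integer widths can be sketched as below. The helper names int_width_bits and square64 are invented for this example; CHAR_BIT comes from the standard header limits.h, and the fixed-width types int32_t and int64_t come from stdint.h, which assumes a C99 or later implementation that provides them.

```c
#include <limits.h>   /* CHAR_BIT: number of bits in a byte */
#include <stdint.h>   /* int32_t, int64_t: exact-width integer types */

/* Reports how many bits an int occupies on the current platform.
   The standard guarantees only that int can hold at least 16-bit values. */
int int_width_bits(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}

/* Uses exact-width types so the arithmetic is carried out in 64 bits
   on every platform, instead of relying on the width of int or long. */
int64_t square64(int32_t x)
{
    return (int64_t)x * x;   /* widen before multiplying to avoid overflow */
}
```

Code that needs a particular width is generally more portable when it states that width through types such as int64_t than when it assumes that int or long happens to be large enough on the target system.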

See also

References

  1. Kernighan, B. & D. Ritchie (1978), The C Programming Language, Prentice Hall
  2. Kernighan, B. & D. Ritchie (1988), C Programming Language (2nd Edition), Prentice Hall
  3. Thompson, Ken (7 January 1972), Users' Reference to B
  4. Richards, Martin (21 July 1967), Richards's BCPL Reference Manual, Memorandum M-352 of MIT Project MAC
  5. Ritchie, Dennis M. (1993), The Development of the C Language